Deploy to Thinkube
Enter your Thinkube instance domain to deploy this template
vLLM Inference Server
High-performance text generation with the vLLM engine
vLLM Engine
Gradio UI
RTX 3090+ Only
PagedAttention
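For orientation, here is a minimal sketch of the kind of stack these pieces imply: a vLLM server exposing an OpenAI-compatible API, fronted by a small Gradio chat UI. The model name, port, and client wiring below are illustrative assumptions, not the template's actual defaults.

```python
# Sketch only: a Gradio chat UI talking to a vLLM OpenAI-compatible server.
# Assumes a server was started separately, for example:
#   vllm serve mistralai/Mistral-7B-Instruct-v0.3 --port 8000
# The model name and port are placeholders, not this template's actual configuration.
import gradio as gr
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def chat(message, history):
    # Convert Gradio's message history into OpenAI chat format, then append the new turn.
    messages = [{"role": m["role"], "content": m["content"]} for m in history]
    messages.append({"role": "user", "content": message})
    response = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.3",
        messages=messages,
        max_tokens=512,
    )
    return response.choices[0].message.content

gr.ChatInterface(chat, type="messages").launch()
```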
⚠️ GPU Required
This template requires an Ampere or newer GPU (RTX 3090, RTX 4090, A100, etc.). It will NOT work on pre-Ampere cards such as the GTX 1080 or RTX 2080.
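A quick way to check whether a node meets this requirement, assuming PyTorch is installed: Ampere and newer GPUs report CUDA compute capability 8.0 or higher.

```python
# Check whether the local GPU meets the Ampere+ (compute capability 8.0+) requirement.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected.")

major, minor = torch.cuda.get_device_capability(0)
name = torch.cuda.get_device_name(0)
if major >= 8:
    print(f"{name} (compute capability {major}.{minor}) meets the Ampere+ requirement.")
else:
    print(f"{name} (compute capability {major}.{minor}) is pre-Ampere; this template will not run on it.")
```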
← Back to template repository