Deploy to Thinkube

Enter your Thinkube instance domain to deploy this template

vLLM Inference Server

High-performance text generation with vLLM engine

vLLM Engine · Gradio UI · RTX 3090+ Only · PagedAttention
⚠️ GPU Required: This template requires an Ampere or newer GPU (RTX 3090, RTX 4090, A100, etc.). It will NOT work on a GTX 1080 or RTX 2080.
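
If you are unsure which architecture a card uses, its CUDA compute capability is a quick proxy: Ampere and newer GPUs report major version 8 or higher (A100 = 8.0, RTX 3090 = 8.6, RTX 4090 = 8.9), while the GTX 1080 (6.1) and RTX 2080 (7.5) fall short. Here is a minimal check, assuming PyTorch with CUDA support happens to be installed on the node:

```python
# Quick check of whether the local GPU meets the Ampere+ requirement.
# Assumes PyTorch with CUDA support is installed on the target node.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA-capable GPU detected.")

major, minor = torch.cuda.get_device_capability(0)
name = torch.cuda.get_device_name(0)

# Ampere and newer architectures report compute capability 8.0 or higher,
# e.g. A100 = 8.0, RTX 3090 = 8.6, RTX 4090 = 8.9.
# GTX 1080 = 6.1 and RTX 2080 = 7.5, which is why they are unsupported here.
if major >= 8:
    print(f"{name} (sm_{major}{minor}) meets the Ampere+ requirement.")
else:
    print(f"{name} (sm_{major}{minor}) does NOT meet the Ampere+ requirement.")
```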
Enter your domain without the https:// prefix or the /control path (for example, thinkube.example.com rather than https://thinkube.example.com/control)
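
Once deployed, a vLLM engine is commonly reachable through its OpenAI-compatible HTTP API alongside the Gradio UI. Whether and where this template exposes that API is not stated here, so the sketch below is only an illustration: the hostname, route, and model name are placeholders, not values defined by the template.

```python
# Hypothetical example of querying the deployed service once it is running.
# The URL, route, and model name below are placeholders -- check the
# template's own documentation for the endpoint it actually exposes.
import requests

BASE_URL = "https://vllm.thinkube.example.com"  # placeholder domain

payload = {
    "model": "placeholder-model",  # whichever model the template serves
    "prompt": "Explain PagedAttention in one sentence.",
    "max_tokens": 64,
    "temperature": 0.7,
}

# vLLM's OpenAI-compatible server exposes /v1/completions
# (and /v1/chat/completions for chat-style models).
resp = requests.post(f"{BASE_URL}/v1/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

For chat-tuned models, the equivalent /v1/chat/completions route accepts a list of messages instead of a raw prompt.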