vLLM fork for Tesla V100 (SM70) with AWQ 4-bit support, CUDA 12.8 build flow, and validated Qwen3.5 27B/35B deployment on multi-GPU V100.
A specialized vLLM fork enabling AWQ 4-bit quantized large language model inference on Tesla V100 GPUs.
How It Works
You have capable but aging Tesla V100 GPUs (SM70) and want to run current AI language models on them, which mainstream inference stacks no longer support out of the box.
1. Set up the required software environment, including the CUDA 12.8 toolkit.
2. Compile this fork from source with SM70 support enabled; it takes only a few commands.
3. Run a quick verification step to confirm the build detects your GPUs correctly.
4. Launch the inference server on your V100 machines, spanning multiple GPUs with tensor parallelism if needed.
5. Send requests to AWQ 4-bit quantized models such as Qwen and get fast responses.
Your V100 hardware now serves modern quantized language models efficiently, cutting costs and extending its useful life.
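The steps above can be sketched as a shell session. This is an illustrative outline, not the fork's documented commands: the clone URL is a placeholder, the Qwen AWQ checkpoint name is an assumption (the exact Qwen model validated by this fork is not specified here), and flag values such as the tensor-parallel size depend on your machine.

```shell
# Sketch of the build-and-serve flow, assuming the CUDA 12.8 toolkit
# and NVIDIA drivers are already installed on the host.

# 1. Prepare an isolated Python environment.
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip

# 2. Build the fork from source, compiling only for SM70 (V100).
git clone <this-fork-url> vllm-v100 && cd vllm-v100
export TORCH_CUDA_ARCH_LIST="7.0"
pip install -e . --no-build-isolation

# 3. Sanity check: the first GPU should report compute capability (7, 0).
python -c "import torch; print(torch.cuda.get_device_capability(0))"

# 4. Serve an AWQ 4-bit model across four V100s with tensor parallelism.
#    (Model name is an assumption; V100 lacks bfloat16, so use float16.)
vllm serve Qwen/Qwen2.5-32B-Instruct-AWQ \
    --quantization awq \
    --tensor-parallel-size 4 \
    --dtype float16

# 5. Query the OpenAI-compatible endpoint.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen/Qwen2.5-32B-Instruct-AWQ",
         "messages": [{"role": "user", "content": "Hello"}]}'
```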