hec-ovi / vllm-awq4-qwen
vLLM Qwen 3.6-27B (AWQ-INT4) + DFlash speculative decoding on AMD Strix Halo (gfx1151 iGPU, 128 GB UMA, ROCm 7.13). 24.8 t/s single-stream, vision, tool calling, 256K context, OpenAI-compatible, Docker. Matches DGX Spark FP8+DFlash+MTP at a third of the cost. No CUDA.
A setup for serving a quantized Qwen language model with speculative-decoding speed-ups on AMD Strix Halo integrated GPUs.
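The target hardware is the gfx1151 iGPU under ROCm, so a quick sanity check before anything else is to confirm the GPU is visible to the ROCm runtime. The snippet below is a minimal sketch, not part of the repo; it assumes rocminfo (installed with ROCm) is on PATH.

```python
import subprocess

def has_gfx1151() -> bool:
    """Return True if rocminfo reports a gfx1151 agent (Strix Halo iGPU)."""
    try:
        out = subprocess.run(
            ["rocminfo"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False  # ROCm not installed, or the runtime cannot enumerate agents
    return "gfx1151" in out

if __name__ == "__main__":
    print("gfx1151 iGPU visible:", has_gfx1151())
```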
How It Works
This setup serves a large, quantized Qwen model at interactive speeds on supported AMD hardware; the steps below walk through getting it running.
Check that your machine has the supported AMD iGPU (gfx1151, Strix Halo) and enough unified memory to hold the model.
Make a couple of changes in your machine's firmware/boot settings to unlock full performance.
Download the quantized model weights.
Build and start the server with Docker, following the steps in the guide.
Send prompts from the command-line client or through the OpenAI-compatible API (see the sketch after this list) and get answers back in seconds.
Generate code, describe images, or call tools at high speed, entirely on your own machine.
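Because the server exposes an OpenAI-compatible API, any standard OpenAI client can talk to it once it is running. The sketch below uses the official openai Python package; the base URL (vLLM's default of localhost:8000) and the model name are assumptions, so check the repo's Docker instructions for the actual values.

```python
# Minimal sketch of querying the local OpenAI-compatible endpoint.
# Assumptions (not from the repo): the server listens on localhost:8000
# (vLLM's default) and the served model name below is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen-awq4",  # placeholder; use the model name the server reports
    messages=[{"role": "user", "content": "Write a haiku about unified memory."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```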