TurboQuant llama.cpp fork with optimized turbo4 kernels for Gemma 4 D=256/512 heads — lazy K/V, batch decode, warp-cooperative write. 120 t/s with 3.8x KV compression on RTX 3090.
Performance-optimized fork of llama.cpp for running the Gemma 4 26B model at high speed with turbo4 KV compression on consumer GPUs like the RTX 3090.
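The 3.8x figure is consistent with 4-bit group quantization of the KV cache. Below is a back-of-the-envelope sketch, assuming turbo4 stores 4-bit values plus one fp16 scale per 64-element group; the actual turbo4 layout (block size, zero-points, etc.) is not documented here and may differ.

```python
# Back-of-the-envelope KV-cache compression for 4-bit group quantization.
# Assumption: 4-bit values share one fp16 scale per 64-element group;
# the real turbo4 format may use a different block size or add zero-points.

FP16_BITS = 16   # baseline: fp16 K/V entries
QUANT_BITS = 4   # quantized value width
GROUP_SIZE = 64  # elements sharing one scale
SCALE_BITS = 16  # per-group fp16 scale overhead

bits_per_element = QUANT_BITS + SCALE_BITS / GROUP_SIZE  # 4.25 bits
ratio = FP16_BITS / bits_per_element                     # ~3.76x

print(f"{bits_per_element} bits/element -> compression {ratio:.2f}x")
```

With these assumptions the ratio works out to about 3.76x, which lines up with the advertised 3.8x.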
How It Works
Run a large language model locally at high speed on a consumer GPU.
Download the quantized Gemma 4 26B weights and the prebuilt TurboQuant binaries.
Start the local inference server with a single command; the model loads quickly and waits for requests.
Send prompts and get low-latency responses, even across very long conversations.
Throughput reaches about 120 tokens per second, and roughly 3.8x KV-cache compression keeps long contexts within GPU memory.
The result is a fast, memory-efficient local assistant; a minimal client sketch follows below.
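A minimal usage sketch, assuming the fork keeps upstream llama.cpp's llama-server binary and its OpenAI-compatible HTTP endpoint; the model filename and port here are illustrative, not taken from this repo.

```python
# Query a locally running llama-server via its OpenAI-compatible API.
# Assumes the server was started with something like:
#   ./llama-server -m gemma-4-26b-turbo4.gguf --port 8080
# (binary name inherited from upstream llama.cpp; filename hypothetical)
import json
import urllib.request

payload = {
    "messages": [
        {"role": "user",
         "content": "Summarize KV-cache compression in one sentence."}
    ],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Print the assistant's reply from the first completion choice.
print(body["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client library would work the same way; only the standard library is used here to keep the example dependency-free.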