OnlyTerp / turboquant
First open-source implementation of Google TurboQuant (ICLR 2026) -- near-optimal KV cache compression for LLM inference. 5x compression with near-zero quality loss.
TurboQuant is an open-source tool that compresses the memory footprint of large language model key-value caches by 5-7 times with minimal accuracy loss, enabling longer contexts and more efficient serving.
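To see why a 5x reduction matters, here is a back-of-envelope sketch of KV cache sizing. The model dimensions below (32 layers, 32 heads, head dimension 128, 32K context) are illustrative assumptions for a typical 7B-class model, not figures from the TurboQuant paper.

```python
def kv_cache_bytes(layers, heads, head_dim, seq_len, batch, bytes_per_elem):
    # Keys and values each store one head_dim vector per token, head, and layer,
    # hence the factor of 2.
    return 2 * layers * heads * head_dim * seq_len * batch * bytes_per_elem

# Assumed example configuration (not from the paper):
fp16 = kv_cache_bytes(layers=32, heads=32, head_dim=128,
                      seq_len=32_768, batch=1, bytes_per_elem=2)
print(f"FP16 KV cache:     {fp16 / 2**30:.1f} GiB")   # 16.0 GiB
print(f"At 5x compression: {fp16 / 5 / 2**30:.1f} GiB")  # 3.2 GiB
```

At these sizes the uncompressed cache alone can dominate GPU memory, which is what caps context length and batch size in practice.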
How It Works
TurboQuant compresses the key-value (KV) cache that an LLM accumulates during inference, cutting memory use substantially while preserving output quality.
Clone the repository from GitHub and install its dependencies.
Run the included example to watch the KV cache shrink by roughly 5x.
With the smaller cache, the model handles much longer conversations without exhausting GPU memory or slowing down.
Integrate it with your model of choice and benchmark accuracy and speed.
Deploy the compressed-cache model so users can chat with it in production.
The result: longer contexts, more concurrent users, and lower serving costs.
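The compression step above can be sketched with plain per-token symmetric quantization of a key/value tensor. This is a minimal NumPy illustration of the general idea of KV-cache quantization, not TurboQuant's actual (near-optimal) scheme; the 4-bit width and tensor shapes are assumptions.

```python
import numpy as np

def quantize(x, bits=4):
    # One scale per row (e.g. per cached token); symmetric signed range.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on empty rows
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Fake cached keys: 8 tokens with head dimension 128 (illustrative shapes).
rng = np.random.default_rng(0)
k = rng.standard_normal((8, 128)).astype(np.float32)
q, s = quantize(k, bits=4)
err = np.abs(dequantize(q, s) - k).max()  # worst-case per-element error
```

Storing 4-bit codes plus one scale per row is roughly a 4x size reduction over FP16; the paper's contribution is doing this kind of thing with provably near-optimal distortion, which this toy sketch does not attempt.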