OmarHory / turboquant
Open-source implementation of Google's TurboQuant (ICLR 2026) — KV cache compression to 2.5–4 bits with near-zero quality loss. 3.8–5.7x memory reduction on Mistral-7B, no training required.
TurboQuant is an open-source implementation of a research technique that compresses the key-value (KV) caches used by large language models during inference, achieving 3.8–5.7x memory reduction with near-identical output quality.
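To see why storing the cache in a few bits shrinks memory, here is a minimal sketch of low-bit quantization applied to a KV-cache-shaped tensor. This is a generic per-channel uniform quantizer in numpy, not TurboQuant's actual scheme (the paper's 2.5–4 bit method differs in detail); the shapes and bit widths below are illustrative assumptions.

```python
import numpy as np

def quantize_dequantize(x, bits):
    """Per-channel asymmetric uniform quantization of a cache slice.

    Generic sketch only; TurboQuant's real algorithm is more sophisticated.
    """
    levels = 2 ** bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)   # guard constant channels
    q = np.round((x - lo) / scale)             # integer codes in [0, levels]
    return q * scale + lo                      # dequantized approximation

# Toy KV cache: heads x tokens x head_dim (sizes are arbitrary for the demo)
rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 128, 64)).astype(np.float32)

for bits in (4, 3, 2):
    approx = quantize_dequantize(kv, bits)
    err = np.abs(kv - approx).mean()
    print(f"{bits}-bit: mean abs error {err:.4f}, "
          f"memory ratio {16 / bits:.1f}x vs fp16")
```

Fewer bits per value means a smaller cache (a 4-bit code is 4x smaller than fp16) at the cost of reconstruction error; the point of the paper's technique is keeping that error low enough that model outputs are effectively unchanged.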
How It Works
TurboQuant comes from a recent research paper: a technique for cutting the memory an LLM's KV cache consumes without degrading the model's outputs.
Clone the repository and install its dependencies in a clean environment, such as a fresh virtualenv.
Run a quick benchmark on your own machine to measure how much KV-cache memory is saved during inference.
The benchmark reports the memory reduction alongside quality metrics, so you can confirm the compressed cache still produces near-identical answers.
Small-scale tests run fine on a laptop, at no cost.
For larger models such as Mistral-7B, rent GPU instances from a cloud provider.
Evaluate long-context retention with a needle-in-a-haystack test: bury a fact deep in a long document and check that the model still retrieves it.
The result: the cache fits in a fraction of the memory, inference runs faster, and long conversations no longer overflow the context.
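The needle-in-a-haystack step above can be sketched as a tiny harness. Everything here is hypothetical scaffolding, not the repo's test code: `generate` stands in for whatever model call you use, and `toy_generate` is a stand-in "model" that just searches the prompt so the harness can be exercised without an LLM.

```python
def needle_in_haystack(generate, needle, question, filler_sentences, depth=0.5):
    """Bury `needle` at a relative `depth` in filler text, then ask about it."""
    k = int(len(filler_sentences) * depth)
    haystack = " ".join(filler_sentences[:k] + [needle] + filler_sentences[k:])
    prompt = f"{haystack}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

# Toy "model" for demonstration: answers correctly iff the fact is in context.
def toy_generate(prompt):
    return "7421" if "7421" in prompt else "unknown"

filler = [f"Sentence {i} is routine filler." for i in range(200)]
answer = needle_in_haystack(
    toy_generate,
    needle="The secret passcode is 7421.",
    question="What is the secret passcode?",
    filler_sentences=filler,
    depth=0.3,
)
print(answer)  # → 7421
```

With a real model behind `generate`, you would sweep `depth` across the context and compare retrieval accuracy with the compressed cache against the fp16 baseline.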