onur-gokyildiz-bhi / tq-kv
Pure Rust implementation of Google's TurboQuant (ICLR 2026) KV cache compression for LLMs
tq-kv is a Rust library for TurboQuant KV cache compression that enables efficient local inference of large language models by drastically reducing memory usage while preserving quality.
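To make the memory savings concrete, here is a rough back-of-the-envelope sizing sketch. The model dimensions (32 layers, 32 KV heads, head dimension 128) are illustrative, loosely Llama-7B-shaped, and are not measured from tq-kv itself; the 7× ratio is the low end of the reduction range the repository reports.

```rust
// Rough KV cache sizing for a 7B-class model (illustrative dimensions,
// not taken from tq-kv itself).
fn kv_cache_bytes(n_layers: u64, n_kv_heads: u64, head_dim: u64,
                  seq_len: u64, bytes_per_elem: f64) -> f64 {
    // Two tensors (K and V) per layer, one vector per token per head.
    2.0 * n_layers as f64 * n_kv_heads as f64 * head_dim as f64
        * seq_len as f64 * bytes_per_elem
}

fn main() {
    let gib = 1024f64 * 1024.0 * 1024.0;
    // fp16 baseline: 2 bytes per element, 4096-token context.
    let fp16 = kv_cache_bytes(32, 32, 128, 4096, 2.0);
    // 7x compression (low end of the reported 7-15x range).
    let compressed = fp16 / 7.0;
    println!("fp16 KV cache:   {:.2} GiB", fp16 / gib); // 2.00 GiB
    println!("compressed (7x): {:.2} GiB", compressed / gib); // 0.29 GiB
}
```

At the 15× end of the reported range, the same 2 GiB cache would shrink to roughly 0.13 GiB.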
How It Works
tq-kv is aimed at running large language models locally on machines where memory, not compute, is the bottleneck.
Benchmark charts in the repository report 7-15× reductions in KV cache memory while preserving output quality and response speed.
Download the library together with the provided patches for popular inference runtimes such as llama.cpp.
Follow the integration steps, selecting the CPU or GPU backend that matches your hardware.
Run your models immediately with a much smaller memory footprint.
Tune the quantization settings for the best throughput on your machine.
Because the cache is smaller, models can sustain much longer conversations without slowing down.
The result is local inference of large models within ordinary desktop memory budgets.
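The core idea behind KV cache quantization can be sketched in a few lines. The snippet below is a minimal illustration of symmetric 4-bit per-block quantization; it is not tq-kv's actual API nor TurboQuant's exact algorithm, just the general mechanism such libraries build on. The function names are hypothetical.

```rust
/// Symmetric 4-bit quantization of one block of KV values.
/// Illustrative only; tq-kv's real format and algorithm may differ.
fn quantize_block(block: &[f32]) -> (f32, Vec<i8>) {
    // Scale so the largest magnitude in the block maps to the int4 limit (7).
    let max_abs = block.iter().fold(0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };
    let q = block
        .iter()
        .map(|v| (v / scale).round().clamp(-7.0, 7.0) as i8)
        .collect();
    (scale, q)
}

/// Recover approximate values from the stored scale and 4-bit codes.
fn dequantize_block(scale: f32, q: &[i8]) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let block = [0.8f32, -0.31, 0.05, 1.6];
    let (scale, q) = quantize_block(&block);
    let restored = dequantize_block(scale, &q);
    // Each restored value lies within half a quantization step of the input.
    for (a, b) in block.iter().zip(&restored) {
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale:.4}, quantized = {q:?}");
}
```

Storing one `i8` code (packable to 4 bits) plus a shared per-block scale in place of each `f32` is where the memory reduction comes from; the runtime dequantizes blocks on the fly during attention.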