0xSero / turboquant
TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
TurboQuant compresses the KV cache, the memory an LLM uses to store attention context during inference, enabling roughly double the context length on the same hardware.
How It Works
Keys and values in the attention cache are quantized to 3 and 2 bits respectively, cutting KV cache memory without retraining the model.
Integrate it into your existing inference stack; Triton kernels and a vLLM integration are provided.
Benchmark side-by-side against your unquantized baseline to measure the difference.
The freed memory fits roughly twice the context length on the same GPU.
With more of the conversation retained in cache, responses stay grounded in earlier turns instead of losing track of what was said.
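TurboQuant's actual quantizers are near-optimal and implemented in Triton; the details below are not from this repo. As a minimal sketch of the core idea, here is plain uniform per-token asymmetric quantization at the stated bit-widths (3-bit keys, 2-bit values), in NumPy for clarity:

```python
import numpy as np

def quantize(x: np.ndarray, bits: int):
    """Uniform per-row (per-token) asymmetric quantization to `bits` bits.

    Returns integer codes plus the per-row scale and zero-point needed
    to reconstruct an approximation of x.
    """
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    levels = (1 << bits) - 1                      # 7 for 3-bit, 3 for 2-bit
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.clip(np.round((x - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo

def dequantize(q: np.ndarray, scale: np.ndarray, lo: np.ndarray) -> np.ndarray:
    """Reconstruct the approximate float values from the integer codes."""
    return q.astype(np.float32) * scale + lo

# Toy cache: 4 cached tokens, head dimension 128 (shapes are illustrative).
rng = np.random.default_rng(0)
keys = rng.standard_normal((4, 128)).astype(np.float32)
vals = rng.standard_normal((4, 128)).astype(np.float32)

qk, sk, lk = quantize(keys, bits=3)   # 3-bit keys: codes in [0, 7]
qv, sv, lv = quantize(vals, bits=2)   # 2-bit values: codes in [0, 3]

err_k = np.abs(dequantize(qk, sk, lk) - keys).max()
err_v = np.abs(dequantize(qv, sv, lv) - vals).max()
```

At these bit-widths the codes for a key and a value together occupy 5 bits instead of two 16-bit floats, which is where the memory savings come from; the per-row scale and zero-point add a small constant overhead per token.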