codepawl / turboquant-torch
PyTorch implementation of TurboQuant: near-optimal vector quantization for KV cache compression and vector search. 3-bit quantization with zero accuracy loss.
A library that compresses the key-value (KV) caches inside AI language models to dramatically reduce memory usage while preserving output quality.
How It Works
You add TurboQuant to an existing PyTorch inference setup; it compresses the KV cache, the working memory a transformer accumulates during generation, so models can sustain longer conversations on everyday hardware.
Integration takes only a few lines of code on top of your existing model.
You enable cache quantization to shrink the KV cache by up to 10x, for example storing 3-bit codes instead of 32-bit floats, while keeping model accuracy intact.
You run your favorite model with longer chats or bigger prompts and watch it use far less memory.
Benchmarks show the memory footprint dropping with no measurable loss in output quality.
The result is memory-light inference that handles large contexts on a regular setup.
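To make the compression idea concrete, here is a minimal sketch of uniform 3-bit (8-level) per-vector quantization of a toy KV cache, in NumPy. This is an illustration of the general technique, not the repo's actual API: the function names `quantize_3bit` and `dequantize_3bit` are hypothetical, and TurboQuant's real quantizer is more sophisticated than a plain uniform grid.

```python
import numpy as np

def quantize_3bit(x, axis=-1):
    """Uniform 3-bit quantization per vector (hypothetical helper,
    a simplified stand-in for TurboQuant's near-optimal quantizer)."""
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = (hi - lo) / 7.0              # 2**3 - 1 = 7 steps between 8 levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on flat rows
    codes = np.clip(np.round((x - lo) / scale), 0, 7).astype(np.uint8)
    return codes, scale, lo

def dequantize_3bit(codes, scale, lo):
    """Map 3-bit codes back to approximate float values."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 128)).astype(np.float32)  # toy "KV cache" rows
codes, scale, lo = quantize_3bit(kv)
recon = dequantize_3bit(codes, scale, lo)
# Rounding error is bounded by half a quantization step per element.
max_err = np.abs(kv - recon).max()
```

Storing 3-bit codes instead of 32-bit floats cuts the payload by roughly 10x (plus a small per-vector scale/offset overhead), which matches the "up to 10 times" figure above; the point of TurboQuant is to achieve this bit rate without the accuracy loss a naive uniform grid would incur.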