AbdelStark

Rust implementation of Google's TurboQuant algorithm for vector quantization

Found Mar 28, 2026 at 10 stars
AI Summary

TurboQuant is a Rust library for compressing the memory caches of AI models, with benchmarks on synthetic data, real captured traces, and a tiny full model to measure speed and quality gains.

How It Works

1. πŸ” Discover TurboQuant

You find TurboQuant, a way to squeeze AI memory usage while keeping answer quality high across longer chats.

2. πŸ“¦ Add to your project

Add it to your Rust project with one line, and it's ready to use.
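Assuming the crate is published under the name `turboquant` (an assumption; check the repo's README for the actual crate name and version), that one line could look like:

```shell
# Hypothetical: adds the latest published `turboquant` crate to Cargo.toml
cargo add turboquant
```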

3. Pick your test

⚑ Quick fake test

Benchmark speed on synthetic data shaped like real AI workloads.

πŸ“Š Real traces

Use captured data from big AI models for true-to-life checks.

πŸ€– Full tiny AI

Run a complete small AI conversation loop to see real results.

4. πŸš€ Run and watch

Hit go, and see memory shrink while quality stays high – charts show the wins instantly.

5. πŸ“ˆ Review savings

Check reports on speed boosts, memory cuts, and how close outputs match the original.

πŸŽ‰ AI flies faster

Your AI now handles way longer talks with less memory, feeling snappier and smarter.
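To see why the memory cut matters, here is a back-of-envelope sketch in Rust. The model shape (6 layers, 768-dim hidden state, roughly distilgpt2-sized), the 2048-token context, and the 2-bit setting are illustrative assumptions, not numbers from the repo:

```rust
// Rough KV-cache size estimate: K and V caches each store one
// `hidden`-dim vector per layer per token.
fn kv_cache_bytes(layers: u64, hidden: u64, seq_len: u64, bits_per_coord: u64) -> u64 {
    2 * layers * hidden * seq_len * bits_per_coord / 8
}

fn main() {
    let (layers, hidden, seq) = (6, 768, 2048); // distilgpt2-like shape (assumed)
    let fp16 = kv_cache_bytes(layers, hidden, seq, 16);
    let q2 = kv_cache_bytes(layers, hidden, seq, 2);
    println!("fp16:  {:.1} MiB", fp16 as f64 / (1u64 << 20) as f64);
    println!("2-bit: {:.1} MiB ({}x smaller)", q2 as f64 / (1u64 << 20) as f64, fp16 / q2);
}
```

Ignoring the small per-vector scale/offset overhead, going from 16-bit floats to 2-bit codes is an 8x reduction, which is what lets the same memory budget hold a much longer conversation.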

AI-Generated Review

What is turboquant?

TurboQuant is a Rust crate for compressing LLM KV cache vectors using Google's TurboQuant algorithm, down to 1-8 bits per coordinate in MSE or inner-product modes. It packs quantized indices for storage efficiency and provides batch quantization, dequantization, and attention scoring APIs. Developers get CLI benchmarks on synthetic data, safetensors traces, or real ONNX-exported decoders like distilgpt2, plus Python scripts for Hugging Face model export and one-command evals reporting perplexity, latency, and compression.
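As an illustration of the low-bit idea (not TurboQuant's actual API; the function names and the uniform scalar scheme here are assumptions), a minimal quantize/dequantize round trip looks like:

```rust
/// Quantize each coordinate of `v` to `bits` bits (uniform, MSE-style),
/// returning one code per coordinate plus the (min, step) pair needed
/// to dequantize. A real packer would bit-pack the codes (e.g. four
/// 2-bit codes per byte) for the storage win.
fn quantize(v: &[f32], bits: u32) -> (Vec<u8>, f32, f32) {
    let levels = (1u32 << bits) - 1;
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let step = if max > min { (max - min) / levels as f32 } else { 1.0 };
    let codes = v
        .iter()
        .map(|&x| (((x - min) / step).round() as u32).min(levels) as u8)
        .collect();
    (codes, min, step)
}

/// Reconstruct approximate values from codes and the (min, step) pair.
fn dequantize(codes: &[u8], min: f32, step: f32) -> Vec<f32> {
    codes.iter().map(|&c| min + c as f32 * step).collect()
}

fn main() {
    let v = [0.0f32, 1.0, 2.0, 3.0];
    let (codes, min, step) = quantize(&v, 2); // 2 bits -> 4 levels
    let recon = dequantize(&codes, min, step);
    // Each coordinate is recovered to within half a quantization step.
    for (a, b) in v.iter().zip(&recon) {
        assert!((a - b).abs() <= step / 2.0 + 1e-6);
    }
    println!("codes = {:?}", codes);
}
```

The crate's actual codebooks and packing are more sophisticated, but the shape of the API (quantize a batch, store small codes, dequantize or score later) follows this pattern.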

Why is it gaining traction?

Unlike synthetic-only quantizers, it runs true end-to-end decoder loops with quantized cache feedback, measuring next-token quality drift on lightweight models. A GitHub Actions CI workflow with caching keeps builds fast, and the real-model eval script emits JSON/Markdown summaries comparing exact vs. quantized runs. Inner-product mode shines for computing attention scores without full reconstruction.
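A sketch of how inner-product scoring can skip reconstruction, assuming each key coordinate is stored as a small code `c` whose value is `min + c * step` (this affine scheme and the function name are assumptions, not TurboQuant's actual code):

```rust
/// Score a query against a quantized key without materializing the
/// dequantized key vector. Since each key coordinate is min + c * step:
///   q . k = sum_d q[d] * (min + codes[d] * step)
///         = min * sum(q) + step * sum_d q[d] * codes[d]
fn score_quantized(query: &[f32], codes: &[u8], min: f32, step: f32) -> f32 {
    let q_sum: f32 = query.iter().sum();
    let dot: f32 = query.iter().zip(codes).map(|(&q, &c)| q * c as f32).sum();
    min * q_sum + step * dot
}

fn main() {
    let query = [1.0f32, 2.0];
    let codes = [1u8, 3];
    let (min, step) = (0.0f32, 0.5f32);
    let fast = score_quantized(&query, &codes, min, step);
    // Same result as reconstructing the key [0.5, 1.5] and dotting it.
    let slow: f32 = query
        .iter()
        .zip(codes.iter().map(|&c| min + c as f32 * step))
        .map(|(q, k)| q * k)
        .sum();
    assert!((fast - slow).abs() < 1e-6);
    println!("score = {}", fast); // 3.5
}
```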

Who should use this?

LLM inference engineers benchmarking KV quantization in Rust backends, especially for CPU-limited serving of small models like SmolLM. Researchers tweaking bit widths to explore compression-quality tradeoffs in custom engines. Rust developers adding low-bit KV storage to prototypes.

Verdict

Promising alpha for research with excellent eval tooling, but 10 stars and a 1.0% credibility score mean it's early; test locally via the CLI before depending on it. Strong docs offset Rust implementation maturity gaps.


