RecursiveIntell

Rust implementation of TurboQuant, PolarQuant, and QJL — zero-overhead vector quantization for semantic search and KV cache compression (ICLR 2026)

AI Analysis
Rust
AI Summary

Rust library for compressing high-dimensional vectors used in semantic search and AI model key-value caches with zero training required.

How It Works

1
🔍 Discover turbo-quant

You hear about a library that compresses the huge embedding vectors behind AI search, saving memory and speeding things up without any training step.

2
📦 Add to your project

You add the crate to your project with a single Cargo dependency entry, nothing more.

3
⚙️ Set your preferences

You pick a few simple settings: your vectors' dimension, how many bits of detail to keep, and a random seed that makes the quantizer reproducible.

4
Compress your data

You feed in your big lists of numbers and they shrink to a few bits per dimension while still supporting similarity comparisons.

5
🔍 Run fast searches

Now you quickly compare new queries against the compressed collection and get fast, accurate approximate matches.

6
📉 Enjoy the savings

Your app uses far less memory and runs smoother, handling bigger workloads with the same hardware.

🎉 AI supercharged

Your search tool or AI model is now faster and cheaper to run, and ready for bigger workloads.
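The steps above can be sketched end to end in plain Rust. This is a from-scratch illustration of the general technique behind seed-deterministic, training-free quantizers (random-projection sign quantization, SimHash/QJL-style), not turbo-quant's actual API; every name below is made up for the demo:

```rust
// From-scratch sketch: project vectors onto seeded random hyperplanes,
// keep one sign bit per projection, and search by Hamming distance.
// Illustrative only -- not the turbo-quant API.

/// Tiny deterministic PRNG (xorshift64*) so the example has no dependencies.
struct Rng(u64);
impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12; x ^= x << 25; x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545F4914F6CDD1D)
    }
    fn uniform(&mut self) -> f64 { (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64 }
    /// Standard normal via Box-Muller.
    fn gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.uniform().max(1e-12), self.uniform());
        (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos()
    }
}

/// Build `n_proj` random hyperplanes for `dim`-dimensional input from a seed:
/// anyone holding the same seed reconstructs the identical quantizer.
fn hyperplanes(dim: usize, n_proj: usize, seed: u64) -> Vec<Vec<f64>> {
    let mut rng = Rng(seed | 1);
    (0..n_proj).map(|_| (0..dim).map(|_| rng.gaussian()).collect()).collect()
}

/// Compress a vector to one sign bit per projection.
fn quantize(v: &[f64], planes: &[Vec<f64>]) -> Vec<bool> {
    planes.iter()
        .map(|p| p.iter().zip(v).map(|(a, b)| a * b).sum::<f64>() >= 0.0)
        .collect()
}

fn hamming(a: &[bool], b: &[bool]) -> usize {
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

/// Quantize a tiny corpus, then find the nearest code to a query.
fn search_demo() -> usize {
    let (dim, n_proj, seed) = (8usize, 256usize, 42u64);
    let planes = hyperplanes(dim, n_proj, seed);
    let corpus = [
        vec![1.0, 0.9, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3],
        vec![-1.0, 0.1, 0.9, 0.8, -0.2, 0.0, 0.5, -0.3],
    ];
    let codes: Vec<Vec<bool>> = corpus.iter().map(|v| quantize(v, &planes)).collect();
    // A query almost parallel to corpus[0] should land nearest to code 0.
    let query = [0.9, 1.0, 0.0, 0.1, 0.2, 0.0, 0.1, 0.2];
    let qcode = quantize(&query, &planes);
    (0..codes.len()).min_by_key(|&i| hamming(&qcode, &codes[i])).unwrap()
}

fn main() {
    println!("nearest index = {}", search_demo());
}
```

Because similar vectors agree on most sign bits, comparing compressed codes approximates comparing the originals, which is the whole trick behind steps 4 and 5.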

AI-Generated Review

What is turbo-quant?

turbo-quant is a Rust crate delivering TurboQuant, PolarQuant, and QJL: zero-overhead vector quantization that compresses embeddings and KV cache entries to 3-8 bits per dimension. It sidesteps the pain of traditional methods like Product Quantization, which demand dataset-specific training and fail on streaming data, by enabling instant compression with no calibration or codebooks. Quantizers are fully determined by four parameters (dim, bits, projections, seed), provide direct inner-product and L2 estimates on compressed data, and include a KV cache compressor for transformer attention, all installable as a simple Cargo.toml git dependency.
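To see what bit-level compression buys at this scale, here is the back-of-envelope arithmetic for a hypothetical index of one million 1536-dimensional f32 embeddings quantized at 4 bits per dimension (an illustrative point inside the 3-8 bit range, not a measurement of the crate):

```rust
/// Raw vs quantized index size for n f32 vectors of the given dimension,
/// at `bits` bits per dimension. Illustrative arithmetic only.
fn index_sizes(n_vectors: u64, dim: u64, bits: u64) -> (u64, u64, u64) {
    let raw_bytes = n_vectors * dim * 4;          // 32-bit floats
    let quant_bytes = n_vectors * dim * bits / 8; // packed low-bit codes
    (raw_bytes, quant_bytes, raw_bytes / quant_bytes)
}

fn main() {
    let (raw, quant, ratio) = index_sizes(1_000_000, 1536, 4);
    println!("raw       = {} MiB", raw / (1024 * 1024));   // 5859 MiB
    println!("quantized = {} MiB", quant / (1024 * 1024)); // 732 MiB
    println!("ratio     = {}x", ratio);                    // 8x
}
```

The same arithmetic at 3 bits gives roughly 10.7x, and at 8 bits exactly 4x, which is why even modest bit budgets translate into multi-gigabyte savings on million-scale indexes.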

Why is it gaining traction?

Unlike PQ or OPQ, it is fully data-oblivious and deterministic: no model files ship with it, because the exact quantizer can be recreated anywhere from its seed, including in CI pipelines. The hook is provably unbiased similarity estimates at low bit widths, 1.2-1.3x compression on 1536-dim vectors, and KV cache operations like compress_token and attention_scores that run without decompressing the cache.
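As a rough illustration of that KV-cache idea (quantize each token's key as it arrives, then score attention directly on the compressed form), here is a from-scratch sketch using simple per-token 8-bit scalar quantization. The function names echo the summary's compress_token and attention_scores, but the signatures and the quantization scheme are assumptions for the demo, not the crate's API:

```rust
// Illustration of scoring attention on a compressed KV cache without
// decompression: each key is stored as i8 codes plus one f32 scale, and
// the scale is folded into the dot product. Not the turbo-quant API.

struct CompressedToken { codes: Vec<i8>, scale: f32 }

/// Per-token symmetric scalar quantization of a key vector to 8 bits.
fn compress_token(key: &[f32]) -> CompressedToken {
    let max = key.iter().fold(0.0f32, |m, &x| m.max(x.abs())).max(1e-12);
    let scale = max / 127.0;
    let codes = key.iter().map(|&x| (x / scale).round() as i8).collect();
    CompressedToken { codes, scale }
}

/// Query-key dot products computed directly on the compressed cache:
/// the i8 codes are used as-is and only the scalar scale is applied.
fn attention_scores(query: &[f32], cache: &[CompressedToken]) -> Vec<f32> {
    cache.iter()
        .map(|t| {
            let dot: f32 = t.codes.iter().zip(query).map(|(&c, &q)| c as f32 * q).sum();
            dot * t.scale
        })
        .collect()
}

fn main() {
    let keys = [vec![0.5f32, -1.0, 0.25, 2.0], vec![0.1, 0.1, -0.3, 0.0]];
    let cache: Vec<_> = keys.iter().map(|k| compress_token(k)).collect();
    // Scores land close to the exact f32 dot products (1.25 and 0.4 here),
    // yet the cache holds ~4x less data than f32 keys.
    println!("{:?}", attention_scores(&[1.0, 0.0, -1.0, 0.5], &cache));
}
```

The point of the pattern is that the cache never needs to be expanded back to f32: compressed entries stay compressed for the lifetime of the sequence, and scoring touches only the small codes plus one scale per token.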

Who should use this?

ML engineers indexing embeddings for semantic search in vector databases, especially in streaming or privacy-sensitive setups. LLM inference developers optimizing long-context transformers via KV cache compression at 4-6 bits per head. Rust developers building tools or crates that need lightweight similarity search.

Verdict

Grab it for prototypes if you work on semantic search or LLM serving: excellent docs, tests, and serde support make evaluation fast, though at 10 stars the project is still early in its maturity. Integration via a Cargo git dependency is quick, but watch how it scales in production on large dimensions.

