Firmamento-Technologies

Near-optimal vector quantization from Google's ICLR 2026 paper — 95% recall, 5x compression, zero preprocessing, pure Python FAISS replacement

Found Mar 30, 2026 at 11 stars.
AI Summary

TurboQuant is a pure Python library implementing a research-backed method to compress high-dimensional vectors for efficient AI similarity search with high accuracy and no preprocessing.

How It Works

1. 🔍 Discover TurboQuant

You learn about a library that shrinks your AI vector collections to a fraction of their size while keeping searches accurate.

2. 📦 Get it ready

You install the library with pip so it's ready to use in your projects.

3. 📄 Turn data into patterns

You transform your documents, images, or other info into embeddings: number patterns that capture their meaning.

4. 🗄️ Create your compact collection

You pack all those patterns into a compact quantized index that finds matches nearly as well as the full-size version.

5. 🔎 Find similar items

You enter a question or example, and it quickly pulls up the closest matches from your collection.

🎉 Searches fly with less space

Now your AI searches handle huge amounts of data in a fraction of the memory while staying fast and highly accurate.
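The steps above can be sketched end to end with a toy stand-in. The uniform scalar quantizer below is not TurboQuant's actual algorithm, just a minimal illustration of the compress-then-search idea; everything in it is made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, bits = 1000, 64, 6
levels = 2 ** bits

# Step 3: stand-ins for document embeddings (unit-normalized)
vecs = rng.standard_normal((n, d)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

# Step 4: compress each coordinate to 6 bits on a fixed [-1, 1] grid
# (a real implementation would bit-pack; even uint8 storage is 4x smaller)
codes = np.clip(
    np.round((vecs + 1) / 2 * (levels - 1)), 0, levels - 1
).astype(np.uint8)

# Step 5: decode the codes and rank by inner product against a query
query = vecs[42] + 0.01 * rng.standard_normal(d).astype(np.float32)
approx = codes.astype(np.float32) / (levels - 1) * 2 - 1
top5 = np.argsort(approx @ query)[::-1][:5]
print(int(top5[0]))  # 42: the perturbed source vector ranks first
```

Even this crude quantizer recovers the right nearest neighbor; the point of the real algorithm is to do so with guarantees at much tighter bit budgets.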


AI-Generated Review

What is TurboQuant?

TurboQuant is a pure Python library that compresses embedding vectors 5x while keeping 95% recall in similarity search, acting as a drop-in FAISS replacement with zero preprocessing. It implements Google's TurboQuant algorithm from the ICLR 2026 paper, delivering near-optimal quantization via simple NumPy ops—no GPU or C++ needed. Developers pip install it, create an index with dimension and bits (like TurboQuantIndex(d=384, num_bits=6)), add vectors, and search for top-k matches instantly.
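The quoted 5x figure checks out as simple arithmetic for the d=384, num_bits=6 configuration mentioned above (ignoring any per-index metadata overhead):

```python
d, num_bits = 384, 6

raw_bytes = d * 4                   # float32: 4 bytes per dimension
quantized_bytes = d * num_bits / 8  # 6 bits per dimension, bit-packed
ratio = raw_bytes / quantized_bytes

print(raw_bytes, quantized_bytes, ratio)  # 1536 288.0 5.333...
```

Fewer bits per dimension pushes the ratio toward the 8x end of the range at some cost in recall.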

Why is it gaining traction?

Unlike FAISS product quantization, which needs minutes of k-means training, TurboQuant indexes in microseconds with no data-dependent steps, hitting 95% recall@10 at 5-8x compression. It's the only TurboQuant implementation tuned for vector search (not KV cache like vLLM or Llama.cpp forks), with provable guarantees within 2.7x of Shannon limits. Pure Python makes it dead simple for CPU prototyping, beating ScaNN on recall without TensorFlow deps.
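The "no data-dependent steps" property is easy to illustrate. In the hypothetical encoder below, the quantizer's parameters come only from (d, bits, seed), unlike product quantization, whose codebooks come from running k-means over your data; "building the index" is then just encoding vectors, with nothing to train. This is a made-up construction for illustration, not the paper's actual transform.

```python
import numpy as np

def make_encoder(d, bits, seed=0):
    # Parameters depend only on (d, bits, seed), never on the dataset:
    # a fixed random rotation followed by a uniform grid. (Illustrative
    # only; TurboQuant's real transform is defined in the paper.)
    q, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((d, d)))
    levels = 2 ** bits

    def encode(x):
        y = np.clip(x @ q, -1.0, 1.0)
        return np.round((y + 1) / 2 * (levels - 1)).astype(np.uint8)

    return encode

encode = make_encoder(d=16, bits=6)
v = np.ones(16) / 4.0
# A vector's code never changes as the dataset grows, so there is
# no retraining step when you add data.
print(bool(np.array_equal(encode(v), encode(v))))  # True
```

By contrast, a k-means codebook shifts whenever the data distribution does, which is what forces FAISS-style retraining.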

Who should use this?

ML engineers building RAG pipelines or semantic search on embedding stores like Pinecone alternatives, needing cheap storage without retraining. Python devs in resource-constrained setups (IoT, edge) compressing MiniLM or OpenAI vectors. Teams evaluating TurboQuant paper claims for production ANN indexes under 100K items.
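For teams in that last group, recall@k against exact brute-force neighbors is the standard way to check the paper's accuracy claims. A minimal harness, here exercised with a deliberately lossy rounding "compressor" as a stand-in for the real index:

```python
import numpy as np

def recall_at_k(exact_ids, approx_ids, k=10):
    # Fraction of the true top-k that the approximate index also returned
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k

rng = np.random.default_rng(1)
n, d = 500, 32
base = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

exact = np.argsort(base @ query)[::-1]  # ground truth by inner product
lossy = np.round(base, 1)               # stand-in "compressed" vectors
approx = np.argsort(lossy @ query)[::-1]

print(recall_at_k(exact, approx, k=10))
```

The printed value depends on how aggressive the stand-in compression is; swap `lossy` for the index under test to evaluate a real quantizer.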

Verdict

Worth trying for small-to-medium vector search prototypes: solid docs, 3,823 tests verifying the paper's claims, and an API that feels FAISS-like. But with only 11 stars, it's alpha-stage; watch how it matures and scales before swapping out FAISS in production.
