mindtro / semafold

Vector compression with TurboQuant codecs for embeddings, retrieval, and KV-cache. 10x compression, pure NumPy core — optional GPU acceleration via PyTorch (CUDA/MPS) or MLX (Metal).

100% credibility
Found Apr 10, 2026 at 18 stars
AI Analysis
Python
AI Summary

Semafold is a compression toolkit for shrinking AI embeddings, vectors, and key-value caches with precise size tracking and quality guarantees.

How It Works

1
🔍 Discover Semafold

You hear about a handy tool that shrinks your AI data files to save space without losing important details.

2
📥 Set it up quickly

You add the tool to your project in moments, no hassle needed.

3
📊 Load your data

You bring in your list of numbers or AI memory blocks that take up too much room.

4
⚡ Shrink with a click

You choose a shrink level and press go—watch your data get 5-10 times smaller right away!

5
📈 Check the savings

You see a clear report on space saved and quality kept, so you know it's working great.

6
🔄 Use the smaller data

You swap it back into your AI setup, everything runs smoother and cheaper.

🎉 Mission accomplished!

Your AI project now uses way less storage, loads faster, and costs less to run.
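The walkthrough above can be sketched in plain NumPy. This is not semafold's actual API (the function names below are hypothetical stand-ins); it just illustrates the shrink-then-verify loop with a simple per-row int8 quantizer:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Per-row symmetric int8 quantization: codes are ~4x smaller than float32."""
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale).astype(np.float32)
    codes = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return codes, scale

def dequantize_int8(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return codes.astype(np.float32) * scale

# Synthetic 128x1536 float32 embeddings, matching the benchmark shape above.
emb = np.random.default_rng(0).standard_normal((128, 1536)).astype(np.float32)

codes, scale = quantize_int8(emb)
recon = dequantize_int8(codes, scale)

# Exact byte accounting: the codes plus the per-row scales we must store.
ratio = emb.nbytes / (codes.nbytes + scale.nbytes)
err = np.abs(emb - recon).max()
print(f"{ratio:.2f}x smaller, max abs error {err:.4f}")
```

A plain int8 codec like this tops out near 4x; the ~10x ratios the project claims require more aggressive sub-byte codecs.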

AI-Generated Review

What is semafold?

Semafold compresses embeddings, retrieval vectors, and KV caches to roughly a tenth of their original size using TurboQuant vector-quantization codecs, with explicit quality guarantees and exact byte accounting. Built in Python with a pure NumPy core, it runs anywhere with no GPU required, and auto-accelerates on CUDA, MPS, or Metal via optional PyTorch or MLX installs. Developers get typed encode/decode APIs, validation evidence, and benchmarks showing real-world ratios such as 10.52x on 128x1536 embeddings and 9.47x on KV tensors.
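Reaching ratios beyond int8's 4x means dropping to sub-byte codes. TurboQuant's internals aren't shown here; as a hedged illustration of the kind of exact byte accounting the review describes, here is a 4-bit uniform quantizer with precise size reporting:

```python
import numpy as np

def pack_uint4(codes: np.ndarray) -> np.ndarray:
    """Pack pairs of 4-bit codes (values 0..15) into one byte each."""
    assert codes.shape[-1] % 2 == 0
    return (codes[..., 0::2] | (codes[..., 1::2] << 4)).astype(np.uint8)

emb = np.random.default_rng(1).standard_normal((128, 1536)).astype(np.float32)

# Per-row 4-bit uniform quantization (assumes no constant rows).
mn = emb.min(axis=1, keepdims=True)
step = (emb.max(axis=1, keepdims=True) - mn) / 15.0
codes = np.round((emb - mn) / step).astype(np.uint8)
packed = pack_uint4(codes)

# Exact byte accounting: packed codes plus per-row min/step in float32.
total = packed.nbytes + mn.nbytes + step.nbytes
ratio = emb.nbytes / total
print(f"{emb.nbytes} -> {total} bytes ({ratio:.2f}x)")
```

Even this simple scheme lands near 8x; hitting 10x with acceptable distortion is exactly where purpose-built codecs earn their keep.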

Why is it gaining traction?

It stands out with zero-config GPU acceleration and deterministic benchmarks that validate distortion rates against published research, unlike the black-box compression layers inside many vector database tools. The hook is measurable footprints and auditable guarantees for vector database compression: teams can verify storage savings without surprises, and the pure NumPy fallback means it works in CI, edge deployments, and quick prototypes.
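The kind of deterministic distortion check the review describes can be sketched in a few lines: round-trip the embeddings through a lossy codec (a simple per-row int8 quantizer stands in here, not semafold's actual codec) and report per-vector quality:

```python
import numpy as np

emb = np.random.default_rng(42).standard_normal((128, 1536)).astype(np.float32)

# Lossy round-trip through a simple per-row symmetric int8 quantizer.
scale = np.abs(emb).max(axis=1, keepdims=True) / 127.0
recon = np.round(emb / scale).astype(np.int8).astype(np.float32) * scale

# Deterministic distortion report: per-vector cosine similarity.
cos = (emb * recon).sum(axis=1) / (
    np.linalg.norm(emb, axis=1) * np.linalg.norm(recon, axis=1)
)
print(f"min cosine similarity after round-trip: {cos.min():.5f}")
```

Because the input and the codec are both deterministic, the reported distortion is reproducible run to run, which is what makes benchmark claims auditable.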

Who should use this?

AI engineers optimizing embedding stores or retrieval pipelines, inference devs shrinking KV caches in custom LLM servers, and vector database builders who need 10x compression with simple integration and no vendor lock-in.

Verdict

Early days at 18 stars, but a 100% credibility score, strong docs, 189 passing tests, and synthetic benchmarks make it prototype-ready. Worth pip installing for high-dimensional workloads if you need audited 10x ratios now.
