nickleodoen

A Distributed Semantic Cache Service for LLM applications - Multi-node, MCP-compatible, Written in Rust!

Found May 07, 2026 at 16 stars.
AI Summary

FerroCache is a shared service that caches AI responses for semantically similar questions across applications to reduce costs and latency.
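
For the intuition behind "semantically similar," here is a minimal, self-contained sketch of how such a lookup can work: embed the question, compare it to previously stored questions with cosine similarity, and reuse the stored answer when the score clears a threshold. The bag-of-words toy_embed below is only a stand-in for a real embedding model, and none of this is FerroCache's actual code.

```python
import math

def toy_embed(text: str) -> dict[str, float]:
    """Stand-in 'embedding': a normalized bag of words.
    A real semantic cache would call an embedding model instead."""
    counts: dict[str, float] = {}
    for word in text.lower().split():
        word = word.strip("?!.,")
        counts[word] = counts.get(word, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {w: v / norm for w, v in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    return sum(v * b.get(w, 0.0) for w, v in a.items())

cache: list[tuple[dict[str, float], str]] = []   # (question embedding, cached answer)

def lookup(question: str, threshold: float = 0.8) -> str | None:
    q = toy_embed(question)
    scored = [(cosine(q, emb), answer) for emb, answer in cache]
    if scored:
        best_score, best_answer = max(scored)
        if best_score >= threshold:
            return best_answer        # semantically similar: reuse the answer
    return None                       # miss: the caller asks the LLM as usual

def store(question: str, answer: str) -> None:
    cache.append((toy_embed(question), answer))

store("How do I reset my password?", "Settings > Security > Reset password.")
print(lookup("how can I reset my password"))   # close enough: returns the cached answer
print(lookup("What is the weather today?"))    # unrelated: returns None
```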

How It Works

1. 🔍 Discover FerroCache

You hear about a smart cache that remembers answers to similar questions for your AI chat app, saving time and money on repeated asks.

2. 🚀 Launch the cache

Start the cache service on your computer with a simple download and run, like turning on a helper app.

3. 🔗 Connect your AI

Link it to your AI service like OpenAI in a few lines of code, so it checks for saved answers first (see the sketch at the end of this walkthrough).

4. Ask and get instant replies

Type a question – if it's similar to one before, you get the saved answer super fast without waiting for the AI.

5. 📈 Watch it grow

As you use it more, it learns your common questions and speeds up your whole app.

6. Need more power?

Keep it simple: your single setup handles everything perfectly. Or grow the team: add helpers that work together seamlessly.

🎉 Faster, cheaper AI

Your app responds lightning-quick to familiar questions, cutting costs and delighting users.
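
As referenced in step 3, here is a hypothetical Python sketch of what "connect your AI" could look like, using the classic cache-aside pattern: ask the cache first, fall back to OpenAI on a miss, then store the fresh answer. The FerroCache URL, the /lookup and /store endpoints, and the JSON payloads are illustrative assumptions, not the project's documented API; check the repo for the real client wrappers.

```python
import requests
from openai import OpenAI

FERROCACHE = "http://localhost:8080"    # assumed local FerroCache instance
llm = OpenAI()                          # uses OPENAI_API_KEY from the environment

def ask(question: str) -> str:
    # 1. Ask the cache first (hypothetical endpoint and JSON shape).
    hit = requests.post(f"{FERROCACHE}/lookup", json={"query": question}, timeout=2)
    if hit.ok and hit.json().get("answer"):
        return hit.json()["answer"]     # semantic hit: skip the LLM entirely

    # 2. Cache miss: call the LLM as usual.
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    answer = resp.choices[0].message.content

    # 3. Store the fresh answer so similar future questions hit the cache.
    requests.post(f"{FERROCACHE}/store", json={"query": question, "answer": answer}, timeout=2)
    return answer

print(ask("What's the capital of France?"))   # first call: goes to OpenAI
print(ask("capital city of france?"))         # similar wording: likely served from cache
```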


AI-Generated Review

What is ferrocache?

FerroCache is a distributed semantic cache service for LLM applications, written in Rust as a standalone HTTP API that any language can hit. It sits between your app and expensive LLM providers like OpenAI or Anthropic, embedding queries and returning cached responses for semantically similar ones via cosine similarity search—saving tokens and latency on repeats. Deploy once as a multi-node cluster with replication and gossip-based discovery, and it shares the cache fleet-wide while surviving restarts via write-ahead logging.
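
The write-ahead logging mentioned above is a standard durability technique. The following is a generic illustration of the idea, not FerroCache's implementation: append and fsync each insert before acknowledging it, then replay the log on startup to rebuild the in-memory state. The file name and record shape are hypothetical.

```python
import json
import os

WAL_PATH = "cache.wal"   # hypothetical log file name

def wal_append(entry: dict) -> None:
    """Append one record and force it to disk before the write is acknowledged."""
    with open(WAL_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
        f.flush()
        os.fsync(f.fileno())          # durable even if the process dies right after

def wal_replay() -> dict[str, str]:
    """Rebuild the in-memory cache by replaying the log from the start."""
    cache: dict[str, str] = {}
    if os.path.exists(WAL_PATH):
        with open(WAL_PATH, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                cache[entry["query"]] = entry["answer"]
    return cache

cache = wal_replay()                               # on restart: recover prior entries
wal_append({"query": "hello", "answer": "hi!"})    # log first ...
cache["hello"] = "hi!"                             # ... then apply in memory
```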

Why is it gaining traction?

Unlike in-process libraries such as GPTCache, it's a shared service in the Redis mold but with HNSW-powered semantic search, exact-match pre-filtering under 0.4 ms, tenant isolation, conversation scoping, and TTL/LRU eviction. Drop-in wrappers for the OpenAI/Anthropic SDKs and LangChain/LlamaIndex cache backends make integration close to a one-liner, and Prometheus metrics plus Grafana dashboards cover production ops. Benchmarks show 2,600+ inserts/sec with WAL fsync, beating GPTCache on durability and scale for distributed semantic systems.
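
To make the exact-match pre-filter concrete, here is a conceptual sketch of that two-tier lookup path: a single hash probe over normalized queries answers repeated identical questions immediately, and only genuine misses fall through to the (omitted) vector search. This is illustrative only and assumes nothing about FerroCache's internals beyond what the review states.

```python
import hashlib

exact_index: dict[str, str] = {}          # normalized-query hash -> answer

def normalize(query: str) -> str:
    return " ".join(query.lower().split())

def exact_key(query: str) -> str:
    return hashlib.sha256(normalize(query).encode()).hexdigest()

def lookup(query: str) -> str | None:
    # Fast path: exact (normalized) match, a single dict probe.
    hit = exact_index.get(exact_key(query))
    if hit is not None:
        return hit
    # Slow path: approximate nearest-neighbour search over embeddings
    # (HNSW in FerroCache); omitted here for brevity.
    return semantic_search(query)

def semantic_search(query: str) -> str | None:
    return None                           # placeholder for the vector-search fallback

def store(query: str, answer: str) -> None:
    exact_index[exact_key(query)] = answer

store("What is Rust?", "A systems programming language focused on safety.")
print(lookup("what  is   rust?"))         # case/whitespace differences still hit the fast path
```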

Who should use this?

Backend engineers building LLM-powered SaaS (chatbots, RAG pipelines) needing a shared cache across microservices or tenants. Teams on LangChain/LlamaIndex wanting semantic caching without per-process silos, or ops folks running distributed AI workloads who hate recomputing similar queries in customer support or code assistants.

Verdict

Solid for early adopters prototyping distributed semantic caches—excellent docs, Docker Compose clusters, and Python clients—but at 15 stars and 1.0% credibility, it's pre-1.0 maturity; test in staging before prod. Worth a spin if you're scaling LLM costs today.


