xcena-dev / maru

High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference

AI Summary

Maru is a high-performance key-value storage engine that uses CXL shared memory to enable low-latency KV cache sharing across multiple LLM inference instances.

How It Works

1. 📰 Discover Maru
You hear about Maru, which speeds up AI conversations by letting multiple model instances share the same memory instantly instead of copying data between machines.

2. 💻 Prepare your setup
You grab a modern Ubuntu machine with CXL shared memory hardware, just like setting up a new toolbox.

3. 🛠️ Run the easy installer
You install everything with one helpful setup script, like plugging in a new gadget.

4. 🚀 Start the sharing service
With a quick command, you launch the background daemon that creates a big shared memory pool for your AIs.

5. 🤖 Link your AI program
You add a few lines to your inference code to connect it to the shared pool.

6. 📤 Share AI thoughts
Your AI writes its KV cache straight into the shared space, and other instances read it back without delay (a rough sketch of this flow follows after the list).

Lightning-fast AI chats

Now your AIs work together seamlessly, delivering noticeably faster responses.
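To make steps 5 and 6 concrete, here is a minimal, purely illustrative sketch. The maru package is not reproduced here; the MaruClient class, its put/get methods, and the address are assumptions standing in for whatever client the installer actually provides (only the `maru://` URL scheme comes from the repo's description).

```python
# Purely illustrative: MaruClient, put, get, and the address below are made up,
# not the repo's documented API.
import numpy as np

class MaruClient:
    """Stand-in for a client that talks to the Maru daemon's shared pool."""
    def __init__(self, url: str):
        self._store: dict[str, bytes] = {}   # the real pool lives in CXL shared memory
    def put(self, key: str, data: bytes) -> None:
        self._store[key] = data
    def get(self, key: str) -> memoryview:
        return memoryview(self._store[key])

# Step 5: connect your inference code to the shared pool started in step 4.
client = MaruClient("maru://localhost:9100")   # maru:// scheme from the repo description; address made up

# Step 6, instance A: publish a computed KV cache block under a key.
kv_block = np.random.rand(2, 32, 128).astype(np.float32)
client.put("session-42/layer-0", kv_block.tobytes())

# Step 6, instance B: another process reuses the block instead of recomputing it.
restored = np.frombuffer(client.get("session-42/layer-0"), dtype=np.float32).reshape(2, 32, 128)
assert np.array_equal(kv_block, restored)
```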

AI-Generated Review

What is maru?

Maru is a Python-based KV cache storage engine that lets multiple LLM inference instances share KV cache directly in CXL shared memory, skipping network copies entirely. Reads and writes are zero-copy via memoryviews, with only lightweight metadata sent over RPC, which suits long contexts where network-based sharing chokes on data transfer. It plugs into LMCache as a drop-in remote backend using a `maru://` URL, backed by a high-performance C++ daemon that manages the shared memory pool.
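The repo's actual API isn't reproduced here, but the allocate-a-page, write-in-place, register-under-a-key pattern the review describes can be sketched with the standard library's shared memory as a stand-in for Maru's CXL-backed pool (all names, keys, and sizes below are illustrative):

```python
# Rough illustration of the allocate / write-in-place / register pattern, using
# multiprocessing.shared_memory as a stand-in for Maru's CXL page pool.
from multiprocessing import shared_memory
import numpy as np

PAGE_SIZE = 1 << 20                        # 1 MiB page, arbitrary for the demo

# "Allocate a page": grab a block of shared memory and view it zero-copy.
page = shared_memory.SharedMemory(create=True, size=PAGE_SIZE)
view = memoryview(page.buf)                # writes land directly in shared memory

# "Write directly": serialize a KV block straight into the page.
kv = np.arange(1024, dtype=np.float32)
view[: kv.nbytes] = kv.tobytes()

# "Register with a key": in Maru this metadata (key -> page, offset, length)
# would travel over a lightweight RPC; here a local dict stands in for it.
registry = {"prompt-hash-abc123": (page.name, 0, kv.nbytes)}

# A reader looks up the key, attaches to the same page, and reads the bytes
# without any copy over the network.
name, offset, length = registry["prompt-hash-abc123"]
reader = shared_memory.SharedMemory(name=name)
restored = np.frombuffer(reader.buf, dtype=np.float32, count=length // 4, offset=offset)
print(restored[:5])                        # [0. 1. 2. 3. 4.]

# Cleanup: release buffer views before closing the shared memory block.
del restored
reader.close()
del view
page.close()
page.unlink()
```

On the LMCache side, the description suggests pointing the remote backend at a `maru://` URL rather than, say, a Redis endpoint; the exact configuration keys are best taken from the repo's quickstart.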

Why is it gaining traction?

Unlike network-based KV sharing, which scales poorly with context length and concurrency, Maru's performance is bounded by CXL bandwidth; the copy-free data path improves hardware utilization and cuts energy use. Developers like the simple API (allocate a page, write into it directly, register it under a key), the batch operations for high throughput, and benchmarks demonstrating async RPC pipelining. Its RRIP-inspired replacement policy, adapted for a CXL-backed cache, helps it stand out among high-performance backend projects on GitHub.
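The batch and pipelining claims boil down to amortizing RPC round trips over many small metadata operations. Here is a toy sketch of that effect which assumes nothing about Maru's real API: register and register_batch are made-up names, and the "network" is simulated with asyncio.sleep.

```python
# Toy illustration: N small metadata registrations cost ~N round trips when
# issued one by one, but roughly one round trip when batched.
import asyncio
import time

RTT = 0.002  # pretend each RPC round trip costs 2 ms

async def register(key: str, meta: tuple) -> None:
    await asyncio.sleep(RTT)           # one round trip per key

async def register_batch(entries: dict) -> None:
    await asyncio.sleep(RTT)           # one round trip for the whole batch

async def main() -> None:
    entries = {f"block-{i}": (i, 4096) for i in range(100)}

    t0 = time.perf_counter()
    for key, meta in entries.items():  # sequential, unbatched
        await register(key, meta)
    print(f"one-by-one: {time.perf_counter() - t0:.3f}s")

    t0 = time.perf_counter()
    await register_batch(entries)      # single batched call
    print(f"batched:    {time.perf_counter() - t0:.3f}s")

asyncio.run(main())
```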

Who should use this?

LLM serving engineers on CXL hardware running multi-GPU inference workloads, especially those using LMCache for P2P cache sharing or disaggregated prefill. Teams battling KV cache duplication in vLLM or similar engines who need a high-performance distributed cache without Redis or Memcached overhead. Python developers looking for a C++ core behind low-latency backend services.

Verdict

Try it if you're on CXL hardware and scaling LLM inference: solid docs, a quickstart, and the LMCache integration make evaluation easy despite alpha status and only 26 stars. The 1.0% credibility score flags early maturity (light tests, no wide adoption yet), but the project is promising for niche high-performance computing setups; watch for production hardening.
