kyutai-labs / moshi-rag

MoshiRAG is a compact full-duplex speech language model augmented with asynchronous knowledge retrieval to improve factuality without sacrificing real-time interactivity.

AI Summary

MoshiRAG is a real-time voice AI that chats fluidly while fetching external facts asynchronously to boost accuracy without interrupting the flow.

How It Works

1. 💬 Discover MoshiRAG: You hear about a friendly AI that chats naturally while looking up facts to stay accurate.

2. 🌐 Visit the demo page: Open the web page to start a real-time voice conversation with the AI.

3. 🎤 Allow your microphone: Give permission for the AI to hear you speak, just like a video call.

4. 🗣️ Start talking: Speak freely about anything, and the AI responds right away without pauses.

5. Watch it fetch facts: When needed, it quietly grabs helpful info in the background to give spot-on answers (see the sketch after this list).

6. 📚 See the references: Check the side panel for the facts it found, keeping everything trustworthy.

7. 😊 Enjoy smart chats: Finish with accurate, flowing conversations that feel completely natural.
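To make step 5 concrete, here is a minimal sketch, not MoshiRAG's actual code, of the pattern it describes: the speaking loop keeps producing output on a fixed cadence while a retrieval task runs in the background and its result is woven in the moment it lands. It assumes the tokio runtime, and `fake_retrieve` is a hypothetical stand-in for a real RAG backend.

```rust
use std::time::Duration;
use tokio::sync::mpsc;

// Hypothetical stand-in for a real RAG backend (vector search, web lookup, ...).
async fn fake_retrieve(query: &str) -> String {
    tokio::time::sleep(Duration::from_millis(250)).await;
    format!("[reference for '{query}']")
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<String>();

    // Fire off retrieval without blocking the conversation loop.
    tokio::spawn(async move {
        let doc = fake_retrieve("capital of Australia").await;
        let _ = tx.send(doc);
    });

    // The "speaking" loop: emits a chunk every 100 ms and injects any
    // retrieved reference the moment it arrives, never pausing.
    for step in 0..6 {
        match rx.try_recv() {
            Ok(doc) => println!("step {step}: weaving in {doc}"),
            Err(_) => println!("step {step}: speaking..."),
        }
        tokio::time::sleep(Duration::from_millis(100)).await;
    }
}
```

The key property is that the loop only polls with `try_recv` and never awaits the retrieval, so speech cadence is unaffected by lookup latency.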


AI-Generated Review

What is moshi-rag?

MoshiRAG is a compact full-duplex speech-to-speech AI model that runs real-time voice conversations while asynchronously fetching external knowledge to boost factuality. Built on the Moshi voice model, it predicts when to trigger RAG retrieval (for example, on factual queries) and injects references mid-response without pausing the chat, preserving natural interactivity. Developers get PyTorch code for experiments, Rust for production servers, and a web UI demo with visualizations, transcripts, and retrieval panels; deployment is via Docker Compose or Swarm.
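As a rough illustration of that control flow, here is a hedged sketch of the trigger-and-inject idea: the model's output stream interleaves speech with a predicted retrieval trigger, retrieval runs off-thread, and retrieved references are spliced into the generation context when they arrive. All names here (`Event`, the fake lookup, the literal stream) are illustrative assumptions, not the repo's API.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Illustrative event types; not MoshiRAG's actual interface.
enum Event {
    Speech(&'static str),           // ordinary generated speech chunk
    TriggerRetrieval(&'static str), // model predicts a factual query
}

fn main() {
    // Pretend model output: speech interleaved with one retrieval trigger.
    let stream = vec![
        Event::Speech("The capital of Australia"),
        Event::TriggerRetrieval("capital of Australia"),
        Event::Speech("is, let me think,"),
        Event::Speech("Canberra."),
    ];

    let (tx, rx) = mpsc::channel::<String>();
    let mut context: Vec<String> = Vec::new();

    for ev in stream {
        match ev {
            Event::TriggerRetrieval(q) => {
                // Retrieval runs on its own thread; speech never blocks on it.
                let tx = tx.clone();
                thread::spawn(move || {
                    thread::sleep(Duration::from_millis(30)); // fake lookup latency
                    let _ = tx.send(format!("[doc: {q} -> Canberra]"));
                });
            }
            Event::Speech(chunk) => {
                // Splice any finished retrieval into the model's context; it
                // conditions generation rather than being spoken aloud.
                while let Ok(doc) = rx.try_recv() {
                    context.push(doc);
                }
                println!("say: {chunk}   context: {context:?}");
                thread::sleep(Duration::from_millis(50)); // audio frame pacing
            }
        }
    }
}
```

The design point this captures is the asynchrony: the trigger is just another prediction in the stream, so the model can keep talking and fold the reference in once it is available.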

Why is it gaining traction?

It stands out by marrying full-duplex voice (listening and speaking simultaneously) with seamless asynchronous RAG, improving answer accuracy without the latency spikes that plague other voice AIs. Users notice grounded responses in live chats, such as fact-checked trivia during casual talk, served by low-latency local LLMs (e.g., vLLM) or APIs. The web client shines with audio stats, pipe animations for the retrieval flow, and exportable transcripts and videos, all handy hooks for quick demos.

Who should use this?

Voice AI builders needing factual grounding in real-time apps, like interactive tutors or assistants handling queries beyond training data. Researchers tweaking speech models (e.g., Moshi variants) for duplex setups with RAG. Devs prototyping multilingual voice agents, especially those eyeing Rust efficiency or Hugging Face model integration.

Verdict

Promising for voice+RAG experiments, with solid docs, models on Hugging Face (CC-BY-4.0), and a runnable demo. At 44 stars it's early-stage, though, so expect tweaks for production stability. Try the web UI if full-duplex factuality intrigues you.


