DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action
DuplexSLA is a research project developing a full-duplex spoken language model that can listen to a user and generate speech responses at the same time, while also performing actions and using tools in real time. The project is built on a large language model of roughly 7 billion parameters and introduces a three-channel system that decodes user audio, assistant speech, and structured actions together on a shared timeline. This allows for natural conversation dynamics such as semantic-driven turn-taking, backchannel responses, and interleaved tool calling that does not interrupt the assistant's speech. The technical report (PDF) has been released; the model checkpoints, inference code, and evaluation benchmarks are planned for future release on Hugging Face.
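Since the code is not yet released, the three-channel idea can only be illustrated with a toy sketch. The Python below assumes a timeline made of fixed chunks, each holding one slot per channel (user audio, assistant speech, action). Every name in it (`Chunk`, `run_timeline`, `toy_policy`, the `<sil>` marker, the `lookup_weather` action) is hypothetical and is not taken from the DuplexSLA report; a hand-written rule stands in for the model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

SILENCE = "<sil>"  # hypothetical marker for "nothing said in this chunk"


@dataclass
class Chunk:
    """One step of the shared timeline, with one slot per channel."""
    index: int
    user_audio: str            # what the user said during this chunk
    assistant_speech: str      # what the assistant said during this chunk
    action: Optional[str]      # structured action emitted in this chunk, if any


# A policy sees the current user chunk plus all earlier chunks and fills
# the two assistant-side channels for the same time step.
Policy = Callable[[int, str, List[Chunk]], Tuple[str, Optional[str]]]


def run_timeline(user_stream: List[str], policy: Policy) -> List[Chunk]:
    """Advance all three channels together, one chunk at a time."""
    timeline: List[Chunk] = []
    for index, user_audio in enumerate(user_stream):
        speech, action = policy(index, user_audio, timeline)
        timeline.append(Chunk(index, user_audio, speech, action))
    return timeline


def toy_policy(index: int, user_audio: str,
               history: List[Chunk]) -> Tuple[str, Optional[str]]:
    """Hand-written stand-in for the model (an assumption, not the real method)."""
    if "weather" in user_audio:
        # Speech and a tool call are emitted in the same chunk.
        return "one moment", "lookup_weather"
    if user_audio != SILENCE:
        # The user is talking: keep listening and stay quiet.
        return SILENCE, None
    if any(chunk.action is not None for chunk in history):
        # The user is quiet and a tool was called earlier: answer.
        return "here is the answer", None
    return SILENCE, None
```

Calling `run_timeline(["hi", "weather today", SILENCE], toy_policy)` yields three chunks in which the action appears alongside speech in the second chunk, which is the property the summary describes as tool calling that does not interrupt speech.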
How It Works
You hear about a new AI that can listen and talk to you at the same time, like a real conversation partner.
Unlike typical voice assistants that wait their turn, this one hears you while it speaks, making interactions feel natural and fluid.
You download the technical report to understand exactly how the speech, language, and action parts work together.
You learn the assistant can search for information, call tools, or take actions without stopping its speech or losing track of the conversation.
You bookmark the page and return when the model and code become publicly available on Hugging Face.
You explore the three-channel design, chunk timeline, and evaluation benchmarks to learn from the research approach.
You now understand how full-duplex speech models with integrated tool use could transform how humans interact with AI.