hyzhang24 / DuplexSLA

Public

DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action

85% credibility
Found May 20, 2026 at 19 stars.
AI Analysis
AI Summary

DuplexSLA is a research project developing a full-duplex spoken language model that can listen to a user and generate speech responses simultaneously, while also performing actions and using tools in real time. It is built on a large language model (~7 billion parameters) and introduces a three-channel system that decodes user audio, assistant speech, and structured actions together on a shared timeline. This allows natural conversation dynamics such as semantic-driven turn-taking, backchannel responses, and interleaved tool calling without interrupting the assistant's speech. The technical report (PDF) has been released; the model checkpoints, inference code, and evaluation benchmarks are planned for a future release on Hugging Face.

How It Works

1
🔍 You discover DuplexSLA

You hear about a new AI that can listen and talk to you at the same time, like a real conversation partner.

2
🎧 You learn it's truly conversational

Unlike typical voice assistants that wait their turn, this one hears you while it speaks, making interactions feel natural and fluid.

3
📄 You read the research paper

You download the technical report to understand exactly how the speech, language, and action parts work together.

4
🔧 You see it can use tools mid-conversation

You learn the assistant can search for information, call tools, or take actions without stopping its speech or losing track of the conversation.

5
You decide what to do next
⏰
Wait for the model

Bookmark the page and return when the model and code become publicly available on Hugging Face.

📚
Study the architecture

Explore the three-channel design, chunk timeline, and evaluation benchmarks to learn from the research approach.

🚀 You're ready for the future of voice AI

You now understand how full-duplex speech models with integrated tool use could transform how humans interact with AI.

AI-Generated Review

What is DuplexSLA?

DuplexSLA is a full-duplex spoken language model that lets AI assistants listen and respond simultaneously while also executing tools and planning actions in real time. Instead of the traditional turn-based back-and-forth, the model processes continuous user speech, generates assistant audio, and emits action commands on a shared timeline of 160 ms chunks. Built on Step-Audio-2-mini with roughly 7 billion parameters, it uses a three-channel architecture: user audio, assistant speech tokens, and a delayed-action stream for transcripts, planning text, and tool calls.
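
Since no code has been released, the following is only a minimal sketch of how a shared chunk timeline with three channels could be represented. The 160 ms chunk size comes from the description above; everything else (the `Chunk` class, `chunk_index`, `build_timeline`, `place_action`, and the `delay_chunks` parameter) is a hypothetical illustration, not the project's actual data layout. In particular, the source does not state how long the action stream is delayed.

```python
from dataclasses import dataclass, field
from typing import List

CHUNK_MS = 160  # chunk duration quoted in the review above


@dataclass
class Chunk:
    """One step of the shared timeline (hypothetical layout)."""
    index: int
    user_audio: bytes = b""                                 # channel 1: user audio
    speech_tokens: List[int] = field(default_factory=list)  # channel 2: assistant speech tokens
    action_text: str = ""                                   # channel 3: transcripts, planning text, tool calls


def chunk_index(t_ms: int) -> int:
    """Map a timestamp in milliseconds to its chunk on the timeline."""
    return t_ms // CHUNK_MS


def build_timeline(duration_ms: int) -> List[Chunk]:
    """Allocate enough empty chunks to cover duration_ms."""
    n_chunks = -(-duration_ms // CHUNK_MS)  # ceiling division
    return [Chunk(index=i) for i in range(n_chunks)]


def place_action(timeline: List[Chunk], heard_at_ms: int, text: str,
                 delay_chunks: int = 1) -> int:
    """Write action text delay_chunks after the chunk in which audio was heard.

    delay_chunks is an assumed parameter; the source does not give the delay.
    """
    idx = chunk_index(heard_at_ms) + delay_chunks
    if idx >= len(timeline):
        raise IndexError("action falls outside the timeline")
    timeline[idx].action_text += text
    return idx


timeline = build_timeline(1000)  # one second of conversation
slot = place_action(timeline, heard_at_ms=330, text="search(...)", delay_chunks=2)
```

Keeping all three channels in one record per chunk is what makes the timeline "shared" in this sketch: speech and actions are aligned by index rather than by separate clocks.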

Why is it gaining traction?

The hook is native agentic behavior in voice interfaces. Most duplex models either halt speech to call tools or delegate tool execution to external cascades. DuplexSLA decodes tool calls on the same chunked timeline as audio, enabling interleaved multi-action sequences without interrupting the conversation flow. The semantic-driven turn-taking also stands out: instead of relying on external voice activity detection, the model internally decides when to pause, interrupt, or backchannel based on its own understanding. A dedicated benchmark (DuplexSLA-Bench) validates these capabilities jointly.
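
The idea of interleaving tool calls without halting speech can be illustrated with a small simulation. This is a sketch under stated assumptions, not DuplexSLA's decoding loop: `PendingCall`, `run_timeline`, the event strings, and the notion of a tool latency measured in chunks are all hypothetical names and simplifications introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PendingCall:
    name: str
    ready_at: int  # chunk index at which the (simulated) result is available


def run_timeline(n_chunks: int,
                 tool_requests: Dict[int, Tuple[str, int]]) -> List[List[str]]:
    """Simulate a chunked timeline where speech never stops for tools.

    tool_requests maps a chunk index to (tool name, latency in chunks).
    Returns, for each chunk, the list of events emitted in that chunk.
    """
    pending: List[PendingCall] = []
    log: List[List[str]] = []
    for i in range(n_chunks):
        events = ["speech"]  # assistant speech is emitted on every chunk
        if i in tool_requests:
            name, latency = tool_requests[i]
            pending.append(PendingCall(name, i + latency))
            events.append(f"call:{name}")
        # Deliver results that have become available, in request order.
        for call in pending:
            if call.ready_at <= i:
                events.append(f"result:{call.name}")
        pending = [call for call in pending if call.ready_at > i]
        log.append(events)
    return log


# Two overlapping tool calls issued on consecutive chunks.
log = run_timeline(5, {1: ("search", 2), 2: ("weather", 1)})
```

In this toy model every chunk carries a speech event, so a second call can be issued while the first is still pending, which is the multi-action interleaving the review describes.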

Who should use this?

Voice AI developers building real-time conversational agents and robot operators wanting unified speech-and-action pipelines. If you're constructing customer service bots, assistive interfaces, or embodied agents that need to speak and act simultaneously, this addresses a genuine gap. Researchers evaluating full-duplex benchmarks will also find the evaluation framework valuable once released.

Verdict

Wait. DuplexSLA tackles an important problem with a clean architecture, but the repository currently contains only a technical report: no inference code, checkpoints, or evaluation harness. With 19 stars and an 85% credibility score, this is research in progress, not a ready-to-deploy solution. Monitor for Hugging Face releases and arXiv publication before investing serious time.
