TrevorS

Streaming speech recognition running natively and in the browser. A pure Rust implementation of Mistral's Voxtral Mini 4B Realtime model using the Burn ML framework.

674 stars
30
100% credibility
Found Feb 11, 2026 at 539 stars
AI Summary

A Rust implementation of Mistral's Voxtral Mini 4B Realtime speech-to-text model that runs natively via command line or client-side in browsers using WebAssembly and WebGPU.

How It Works

1. 🔍 Discover browser speech-to-text

Find a tool that turns your voice into text right in any web browser, no servers needed.

2. 🚀 Try the live demo

Click the demo link, speak into your mic or upload audio, and watch words appear instantly.

3. 📥 Download the model

Grab the voice model files with one simple command so you can use it anywhere.

4. Pick your setup

💻 Desktop mode: run it on your computer to transcribe files fast.

🌐 Web mode: turn it into a web app for sharing online.

5. 🎤 Transcribe audio

Record from microphone or drop in a sound file, and see text stream out live.

Voice becomes text

Enjoy instant, accurate transcripts you can copy, edit, or share effortlessly.
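The desktop flow in steps 4 and 5 above boils down to chunking audio and emitting text incrementally as each chunk arrives. A minimal Rust sketch of that streaming loop, with a stub transcriber standing in for the real Voxtral model (all names here are hypothetical illustrations, not the crate's actual API):

```rust
/// Hypothetical stand-in for the real model: it just reports how much
/// audio it has consumed. The real crate decodes text tokens instead.
struct StubTranscriber {
    samples_seen: usize,
}

impl StubTranscriber {
    fn new() -> Self {
        StubTranscriber { samples_seen: 0 }
    }

    /// Feed one chunk of 16 kHz mono audio; return any newly "decoded" text.
    fn feed(&mut self, chunk: &[f32]) -> String {
        self.samples_seen += chunk.len();
        format!("[{} ms transcribed] ", self.samples_seen * 1000 / 16_000)
    }
}

/// Stream a buffer of 16 kHz samples through the transcriber in
/// fixed-size chunks, mimicking live microphone capture.
fn stream_transcribe(audio: &[f32], chunk_len: usize) -> String {
    let mut transcriber = StubTranscriber::new();
    let mut out = String::new();
    for chunk in audio.chunks(chunk_len) {
        out.push_str(&transcriber.feed(chunk)); // text appears as audio arrives
    }
    out
}

fn main() {
    // One second of silence at 16 kHz, streamed in 100 ms (1600-sample) chunks.
    let audio = vec![0.0f32; 16_000];
    println!("{}", stream_transcribe(&audio, 1_600));
}
```

The point of the chunked loop is latency: text can be shown after each chunk rather than after the whole recording, which is what makes the demo feel instant.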


Star Growth

See how this repo grew from 539 to 674 stars.
AI-Generated Review

What is voxtral-mini-realtime-rs?

This Rust crate delivers real-time streaming speech recognition using Mistral's Voxtral Mini 4B model via the Burn ML framework. It turns 16 kHz audio into text either natively through a CLI binary (feed it WAV files for instant transcription) or entirely in the browser via WASM and WebGPU, handling microphone input or uploads with no server. Developers get offline, low-latency streaming speech-to-text that runs on Vulkan, Metal, or browser GPUs.
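Since the model expects 16 kHz audio while WAV files and microphones commonly run at 44.1 or 48 kHz, a pipeline like this typically needs a resampling step first. A naive linear-interpolation sketch of that step (function name is mine, and a production pipeline would use a windowed-sinc or polyphase resampler to avoid aliasing):

```rust
/// Naive linear-interpolation resampler: converts mono f32 samples
/// from `src_rate` to `dst_rate`. Fine as a sketch; real code should
/// low-pass filter before downsampling to avoid aliasing.
fn resample_linear(input: &[f32], src_rate: u32, dst_rate: u32) -> Vec<f32> {
    if input.is_empty() || src_rate == dst_rate {
        return input.to_vec();
    }
    let ratio = src_rate as f64 / dst_rate as f64;
    let out_len = (input.len() as f64 / ratio).floor() as usize;
    let mut out = Vec::with_capacity(out_len);
    for i in 0..out_len {
        let pos = i as f64 * ratio;
        let idx = pos as usize;
        let frac = (pos - idx as f64) as f32;
        let a = input[idx];
        let b = *input.get(idx + 1).unwrap_or(&a); // clamp at the end
        out.push(a + (b - a) * frac); // interpolate between neighbors
    }
    out
}

fn main() {
    // One second of 48 kHz audio downsampled to the 16 kHz the model expects.
    let input = vec![0.5f32; 48_000];
    let out = resample_linear(&input, 48_000, 16_000);
    println!("resampled {} -> {} samples", input.len(), out.len());
}
```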

Why is it gaining traction?

It overcomes real browser constraints: a 2.5 GB Q4-quantized model loads and runs inference client-side, delivering streaming speech recognition without cloud dependencies or a transcription server. The pure-Rust design keeps native performance high, while sharded GGUF loading works around WASM and browser file-size limits, making it a standout open-source option for streaming speech-to-text in web apps. A live Hugging Face demo hooks tinkerers instantly.
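Sharding large weight files is a common workaround for browser fetch and 32-bit WASM address-space limits: split the multi-gigabyte file into pieces below the limit, then reassemble (or stream) them at load time. A simplified sketch of that idea, using in-memory buffers (shard sizes and function names are my illustration, not the repo's actual loading scheme):

```rust
/// Split a blob into shards no larger than `max_shard`, e.g. to stay
/// under a browser fetch or WASM32 allocation limit.
fn split_into_shards(blob: &[u8], max_shard: usize) -> Vec<Vec<u8>> {
    blob.chunks(max_shard).map(|c| c.to_vec()).collect()
}

/// Reassemble shards in order into the original byte blob. A real
/// loader could stream shards straight into GPU buffers instead of
/// materializing one giant Vec.
fn reassemble(shards: &[Vec<u8>]) -> Vec<u8> {
    let total: usize = shards.iter().map(|s| s.len()).sum();
    let mut out = Vec::with_capacity(total);
    for shard in shards {
        out.extend_from_slice(shard);
    }
    out
}

fn main() {
    // Toy "weights": 10,000 bytes split into shards of at most 4 KiB.
    let weights: Vec<u8> = (0u8..=255).cycle().take(10_000).collect();
    let shards = split_into_shards(&weights, 4_096);
    println!("{} bytes in {} shards", weights.len(), shards.len());
}
```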

Who should use this?

Rust devs building browser voice UIs, such as real-time transcription widgets or speech-driven LLM integrations. Game developers needing local voice commands with no network round trip. Frontend teams prototyping edge AI for podcasts, meetings, or voice-note tools without backend hassle.

Verdict

Grab it for browser-native streaming speech-to-text experiments: the 674 stars and 100% credibility score signal early-stage promise, with solid docs and tests but no published benchmarks yet. For production, wait for WER and speed data; for prototypes, it shines now.


