AtharvBhat

Extremely fast VAD in Rust with Python bindings

17
2
100% credibility
Found Mar 10, 2026 at 16 stars.
AI Analysis
Rust
AI Summary

A high-speed library that detects speech segments in 8kHz and 16kHz audio recordings, supporting both full-file analysis and real-time streaming in Python and Rust.

How It Works

1
🎤 Get an audio recording

You start with a sound clip full of talking, pauses, and background noise from a call or podcast.

2
📦 Add the speech finder

Install the package into your Python environment (pip install fast-vad) so your code can run detection on audio.

3
⚙️ Set up the listener

Pick the sample rate (16kHz for everyday speech or 8kHz for phone-quality audio) and choose a detection mode: permissive for quiet speech such as whispers, normal for a balance, or aggressive for strict filtering.

4
🔍 Scan for speech

Hand over your entire recording and in a flash, it marks every moment of actual talking.
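To make the idea of scanning a whole recording concrete, here is a minimal sketch of a frame-level energy gate. It is a toy stand-in and not fast-vad's algorithm or API: the function name frame_energy_vad, the RMS rule, and the threshold value are all assumptions chosen for illustration.

```python
import math

def frame_energy_vad(samples, sample_rate=16000, frame_ms=32, threshold=0.02):
    """Label each full frame True (speech) when its RMS exceeds `threshold`.

    Toy energy gate for illustration only; not fast-vad's algorithm or API.
    `samples` are floats in [-1.0, 1.0].
    """
    if sample_rate not in (8000, 16000):
        raise ValueError("sample_rate must be 8000 or 16000")
    frame_len = sample_rate * frame_ms // 1000  # 512 samples at 16 kHz
    labels = []
    # Walk the recording in non-overlapping frames, dropping a trailing partial frame.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        labels.append(rms > threshold)
    return labels

# One second of silence followed by one second of a 440 Hz tone at 16 kHz.
silence = [0.0] * 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
labels = frame_energy_vad(silence + tone)
```

A real detector uses richer features than raw energy, but the shape of the result is the same: one decision per frame across the whole file.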

5
📊 Review the highlights

Get a simple map showing speech versus silence, or neat lists of start and end times for talking parts.
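The two output shapes described here are closely related: a per-frame speech map can be collapsed into start and end times. The sketch below shows one way to do that, assuming 32ms frames; labels_to_segments is a hypothetical helper and not part of the library.

```python
def labels_to_segments(labels, frame_ms=32):
    """Collapse per-frame booleans into (start_s, end_s) speech segments.

    Hypothetical helper for illustration; not part of fast-vad.
    """
    segments = []
    start = None  # index of the frame where the current speech run began
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    if start is not None:
        # Speech ran to the end of the recording; close the last segment.
        segments.append((start * frame_ms / 1000, len(labels) * frame_ms / 1000))
    return segments

frame_labels = [False, True, True, False, False, True]
segments = labels_to_segments(frame_labels)
# segments == [(0.032, 0.096), (0.16, 0.192)]
```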

6
🔄 Try live listening

Switch to real-time mode to detect speech as audio streams in, like during a live call.
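Real-time detection differs from batch scanning mainly in that the detector keeps state between frames. The sketch below illustrates that pattern with a toy energy gate and a short "hangover" that holds the speech decision for a few frames after the signal drops. The class StreamingEnergyGate and its parameters are hypothetical; only the idea of clearing state between streams mirrors the reset_state() call mentioned in the review.

```python
import math

class StreamingEnergyGate:
    """Hypothetical frame-by-frame detector with hangover smoothing.

    Illustration only; not fast-vad's streaming API.
    """

    def __init__(self, threshold=0.02, hangover_frames=3):
        self.threshold = threshold
        self.hangover_frames = hangover_frames
        self._remaining = 0  # quiet frames still to be reported as speech

    def is_speech(self, frame):
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > self.threshold:
            self._remaining = self.hangover_frames
            return True
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False

    def reset_state(self):
        """Forget carried-over state, e.g. before starting a new stream."""
        self._remaining = 0

gate = StreamingEnergyGate(hangover_frames=2)
loud = [0.5, -0.5] * 256   # one 512-sample frame with RMS 0.5
quiet = [0.0] * 512        # one 512-sample frame of silence
decisions = [gate.is_speech(f) for f in (quiet, loud, quiet, quiet, quiet)]
# decisions == [False, True, True, True, False]
```

The hangover keeps short pauses inside a sentence from splitting it into many tiny segments, at the cost of reporting the end of speech slightly late.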

Perfect audio ready

Now effortlessly trim silences, build voice features, or analyze talks with precise speech timings.

AI-Generated Review

What is fast-vad?

Fast-vad performs voice activity detection on 8kHz or 16kHz mono audio, labeling speech versus silence per sample, per 32ms frame, or as start/end segments. It is written in Rust and ships Python bindings: install with pip install fast-vad for Python or cargo add fast-vad for Rust. It handles both batch processing of whole files and frame-by-frame streaming in real time, and it targets quick, lightweight speech gating in audio pipelines without bloated ML models.
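The relationship between the per-frame and per-sample views is simple arithmetic: a 32ms frame holds 512 samples at 16kHz and 256 samples at 8kHz. The helpers below are hypothetical and only illustrate that conversion; they are not the library's functions.

```python
def samples_per_frame(sample_rate, frame_ms=32):
    """Number of samples in one frame of `frame_ms` milliseconds."""
    return sample_rate * frame_ms // 1000

def expand_to_samples(frame_labels, sample_rate, frame_ms=32):
    """Repeat each frame label once per sample in that frame."""
    n = samples_per_frame(sample_rate, frame_ms)
    return [label for label in frame_labels for _ in range(n)]

frame_16k = samples_per_frame(16000)  # 512
frame_8k = samples_per_frame(8000)    # 256
per_sample = expand_to_samples([True, False], sample_rate=8000)
```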

Why is it gaining traction?

Benchmarks clock it at hundreds of Gelem/s throughput on hour-long files, which the project presents as well ahead of alternatives like Silero VAD or WebRTC VAD, and it avoids slow downloads of heavy dependencies. Streaming mode with reset_state() fits live mic input, and tunable modes (permissive, normal, aggressive) plus custom thresholds adapt to noisy environments. Python users get NumPy outputs for segments and features.
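To put the throughput figure in perspective, the rough calculation below assumes that "Gelem/s" means 10^9 audio samples per second and uses 100 Gelem/s purely as an illustrative round number, not a measured result.

```python
SAMPLE_RATE = 16_000          # samples per second
SECONDS_PER_HOUR = 3_600

# Total samples in one hour of 16 kHz mono audio.
samples_in_hour = SAMPLE_RATE * SECONDS_PER_HOUR  # 57,600,000

# Assumed throughput for illustration only: 100 Gelem/s = 100e9 samples/s.
assumed_throughput = 100e9

# Time to scan the hour-long file at that rate, in milliseconds.
scan_time_ms = samples_in_hour / assumed_throughput * 1000
```

Under those assumptions an hour of audio would be scanned in well under a millisecond, so actual results should be checked against the repository's own benchmarks on your hardware.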

Who should use this?

Audio engineers in telephony or VoIP apps needing low-latency speech detection. ML devs preprocessing podcasts or call recordings before ASR. Embedded Rust hackers on edge devices where every ms counts, like voice-activated IoT.

Verdict

At 17 stars and 100% credibility, it's alpha-fresh with strong docs, benchmarks, and dual-language support. Maturity lags, but the speed justifies a benchmark trial against slower options. Grab it if VAD is your bottleneck.
