lifeiteng/OmniVAD-Kit

Cross-platform VAD & Audio Event Detection toolkit — Python (PyPI) + TypeScript (npm) + C API. DFSMN models ~2MB, 200x real-time. Runs everywhere: native, browser (WASM), Node.js.

AI Summary

OmniVAD-Kit is a cross-platform toolkit for voice activity detection and audio event detection, providing lightweight models that run in Python, JavaScript, and native applications.
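As a rough illustration of the file-based workflow, here is a minimal Python sketch. The package name `omnivad`, the `detect` call, and the output shape are assumptions for illustration only, not the toolkit's confirmed interface; check the repo's README for the real API.

```python
# Hypothetical sketch -- the package name "omnivad" and a detect() API are
# assumptions for illustration, not OmniVAD-Kit's confirmed interface.
# Assumed install: pip install omnivad

from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    label: str    # e.g. "speech", "singing", "music"

# A real call might look like: segments = omnivad.detect("interview.wav")
# Here the result is faked to show the expected shape of the output.
segments = [Segment(0.42, 7.90, "speech"), Segment(9.10, 15.33, "music")]

for seg in segments:
    print(f"{seg.label}: {seg.start:.2f}s -> {seg.end:.2f}s")
```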

How It Works

1. 🔍 Discover OmniVAD

You hear about a simple tool that finds speech, singing, or music moments in your audio recordings.

2. 📦 Get the tool

Install it for your computer or web project with a quick pip or npm command.

3. Pick your way

🐍 Desktop / CLI: run it on audio files right from your command line.
🌐 Web or app: add it to your website or mobile app for live audio (a streaming sketch follows these steps).

4. 🎵 Feed your audio

Upload or play your recording, and it listens carefully.

5. See the magic

In seconds, it highlights the exact start and end times of speech or music.

6. Perfect results

You now have clean clips ready for editing, transcribing, or sharing.
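For the live-audio path, the toolkit reports frame-by-frame speech probabilities (per the review below). The sketch here shows what that loop could look like in plain Python; the frame size, threshold, and the energy-based stand-in for the model are all assumptions, not OmniVAD-Kit's documented behavior.

```python
# Hedged sketch of frame-by-frame streaming VAD -- the frame size, threshold,
# and the energy-based stand-in for the model are illustrative assumptions,
# not OmniVAD-Kit's documented API.

import numpy as np

SAMPLE_RATE = 16_000
FRAME_MS = 30                      # a common VAD frame size (assumed)
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000

def fake_speech_probability(frame: np.ndarray) -> float:
    """Stand-in for the model: short-term energy as a crude proxy."""
    return float(min(1.0, np.sqrt(np.mean(frame ** 2)) * 10))

def stream_vad(audio: np.ndarray, threshold: float = 0.5):
    """Yield (time_sec, prob, is_speech) for each frame of mono float32 audio."""
    for i in range(0, len(audio) - FRAME_LEN + 1, FRAME_LEN):
        frame = audio[i : i + FRAME_LEN]
        prob = fake_speech_probability(frame)
        yield i / SAMPLE_RATE, prob, prob >= threshold

if __name__ == "__main__":
    # One second of noise standing in for microphone input.
    audio = np.random.default_rng(0).normal(0, 0.05, SAMPLE_RATE).astype(np.float32)
    for t, prob, is_speech in stream_vad(audio):
        print(f"{t:6.2f}s  p(speech)={prob:.2f}  {'SPEECH' if is_speech else '-'}")
```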

AI-Generated Review

What is OmniVAD-Kit?

OmniVAD-Kit is a cross-platform audio detection toolkit for voice activity detection (VAD) and audio event detection (AED). It can spot speech segments, emit real-time frame-by-frame speech probabilities, or classify speech, singing, and music. It ships tiny ~2MB DFSMN models that run at 200x real-time, packaged as a Python PyPI package, a TypeScript npm module, and a C API, so it works everywhere from native apps to browser WASM and Node.js. Users also get CLI tools for quick audio-to-TextGrid/JSON/SRT/VTT output, plus chunking for hour-long files.
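To make the output formats concrete, here is a small self-contained Python sketch that renders detected segments as SRT subtitles. Only the SRT format itself is standard; the segment data is fabricated, and OmniVAD-Kit's actual CLI flags and writers are not shown here.

```python
# Illustrative only: converts (start, end, label) segments to SRT text.
# The segments below are made up; OmniVAD-Kit's own CLI/writers may differ.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[tuple[float, float, str]]) -> str:
    blocks = []
    for i, (start, end, label) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{label}\n")
    return "\n".join(blocks)

segments = [(0.42, 7.90, "speech"), (9.10, 15.33, "music")]
print(segments_to_srt(segments))
```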

Why is it gaining traction?

This kit stands out by bundling state-of-the-art models behind a single API that deploys natively, in the browser, or in Node.js without heavy dependencies, reportedly reaching 200x real-time on Apple M-series chips. Developers like the thread-safe model cloning for multi-stream apps and the Whisper-ready chunking modes (greedy or longest-gap) that prep audio for ASR pipelines. Simple pip/npm installs make prototyping audio features fast.
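The longest-gap chunking mode is easy to picture: when a run of speech would exceed the ASR model's window (say, Whisper's 30 seconds), cut at the widest silence between segments. A hedged Python sketch of that idea follows; the function name and parameters are assumptions, not code from the repo.

```python
# Hedged sketch of "longest-gap" chunking: given VAD speech segments, split
# long audio at the widest silence so each chunk stays under a maximum length
# (e.g. Whisper's 30 s window). All names/parameters here are assumptions.

def longest_gap_chunks(segments: list[tuple[float, float]],
                       max_chunk_sec: float = 30.0) -> list[list[tuple[float, float]]]:
    """Group (start, end) speech segments into chunks of at most max_chunk_sec,
    preferring to cut at the longest inter-segment silence."""
    chunks, current = [], []
    for seg in segments:
        candidate = current + [seg]
        if candidate[-1][1] - candidate[0][0] <= max_chunk_sec:
            current = candidate
            continue
        if len(candidate) == 1:          # a single segment longer than the window
            chunks.append(candidate)
            current = []
            continue
        # Too long: cut at the widest gap inside the candidate chunk.
        gaps = [candidate[i + 1][0] - candidate[i][1] for i in range(len(candidate) - 1)]
        cut = gaps.index(max(gaps)) + 1
        chunks.append(candidate[:cut])
        # The remainder could still be over-long in pathological cases;
        # a real implementation would loop, this sketch keeps it simple.
        current = candidate[cut:]
    if current:
        chunks.append(current)
    return chunks

segments = [(0.0, 8.0), (9.0, 17.0), (22.0, 29.0), (30.0, 38.0)]
for chunk in longest_gap_chunks(segments):
    print(f"chunk {chunk[0][0]:.1f}s-{chunk[-1][1]:.1f}s: {chunk}")
```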

Who should use this?

Backend devs building transcription services or voice pipelines that need fast VAD before Whisper; frontend engineers adding real-time speech detection to browser apps or cross-platform music players; Node.js creators handling live streams and captioning; and C/C++ teams embedding lightweight AED in mobile or desktop audio tools.

Verdict

Grab it for cross-platform audio detection if you need tiny models and deploy-everywhere reach; the docs and tests are thorough despite only 11 stars. Still early, though: watch for broader adoption before locking it into production.
