lifeiteng/OmniVAD-Kit

Cross-platform VAD & Audio Event Detection toolkit — Python (PyPI) + TypeScript (npm) + C API. DFSMN models ~2MB, 200x real-time. Runs everywhere: native, browser (WASM), Node.js.

AI Summary

OmniVAD-Kit is a cross-platform toolkit for voice activity detection and audio event detection, providing lightweight models that run in Python, JavaScript, and native applications.
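As a rough illustration of the file-based workflow, here is a minimal Python sketch. The package name `omnivad`, the `detect` call, and the output shape are assumptions for illustration only, not the toolkit's confirmed interface; check the repo's README for the real API.

```python
# Hypothetical sketch -- the package name "omnivad" and a detect() API are
# assumptions for illustration, not OmniVAD-Kit's confirmed interface.
# Assumed install: pip install omnivad

from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    label: str    # e.g. "speech", "singing", "music"

# A real call might look like: segments = omnivad.detect("interview.wav")
# Here the result is faked to show the expected shape of the output.
segments = [Segment(0.42, 7.90, "speech"), Segment(9.10, 15.33, "music")]

for seg in segments:
    print(f"{seg.label}: {seg.start:.2f}s -> {seg.end:.2f}s")
```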

How It Works

1. 🔍 Discover OmniVAD

You hear about a simple tool that finds speech, singing, or music moments in your audio recordings.

2. 📦 Get the tool

Install it for your computer or web project with a quick pip or npm command.

3. Pick your way

🐍 Desktop / CLI: run it on audio files right from your command line.
🌐 Web or app: add it to your website or mobile app for live audio (a streaming sketch follows these steps).

4. 🎵 Feed your audio

Upload or play your recording, and it listens carefully.

5. See the magic

In seconds, it highlights the exact start and end times of speech or music.

6. Perfect results

You now have clean clips ready for editing, transcribing, or sharing.
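For the live-audio path, the toolkit reports frame-by-frame speech probabilities (per the review below). The sketch here shows what that loop could look like in plain Python; the frame size, threshold, and the energy-based stand-in for the model are all assumptions, not OmniVAD-Kit's documented behavior.

```python
# Hedged sketch of frame-by-frame streaming VAD -- the frame size, threshold,
# and the energy-based stand-in for the model are illustrative assumptions,
# not OmniVAD-Kit's documented API.

import numpy as np

SAMPLE_RATE = 16_000
FRAME_MS = 30                      # a common VAD frame size (assumed)
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000

def fake_speech_probability(frame: np.ndarray) -> float:
    """Stand-in for the model: short-term energy as a crude proxy."""
    return float(min(1.0, np.sqrt(np.mean(frame ** 2)) * 10))

def stream_vad(audio: np.ndarray, threshold: float = 0.5):
    """Yield (time_sec, prob, is_speech) for each frame of mono float32 audio."""
    for i in range(0, len(audio) - FRAME_LEN + 1, FRAME_LEN):
        frame = audio[i : i + FRAME_LEN]
        prob = fake_speech_probability(frame)
        yield i / SAMPLE_RATE, prob, prob >= threshold

if __name__ == "__main__":
    # One second of noise standing in for microphone input.
    audio = np.random.default_rng(0).normal(0, 0.05, SAMPLE_RATE).astype(np.float32)
    for t, prob, is_speech in stream_vad(audio):
        print(f"{t:6.2f}s  p(speech)={prob:.2f}  {'SPEECH' if is_speech else '-'}")
```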

AI-Generated Review

What is OmniVAD-Kit?

OmniVAD-Kit is a cross-platform audio detection toolkit for voice activity detection (VAD) and audio event detection (AED). It can spot speech segments, emit real-time frame-by-frame speech probabilities, or classify speech, singing, and music. It ships tiny ~2MB DFSMN models that run at 200x real-time, packaged as a Python PyPI package, a TypeScript npm module, and a C API, so it works everywhere from native apps to browser WASM and Node.js. Users also get CLI tools for quick audio-to-TextGrid/JSON/SRT/VTT output, plus chunking for hour-long files.
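To make the output formats concrete, here is a small self-contained Python sketch that renders detected segments as SRT subtitles. Only the SRT format itself is standard; the segment data is fabricated, and OmniVAD-Kit's actual CLI flags and writers are not shown here.

```python
# Illustrative only: converts (start, end, label) segments to SRT text.
# The segments below are made up; OmniVAD-Kit's own CLI/writers may differ.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[tuple[float, float, str]]) -> str:
    blocks = []
    for i, (start, end, label) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{label}\n")
    return "\n".join(blocks)

segments = [(0.42, 7.90, "speech"), (9.10, 15.33, "music")]
print(segments_to_srt(segments))
```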

Why is it gaining traction?

This kit stands out by bundling state-of-the-art models behind a single API that deploys natively, in the browser, or in Node.js without heavy dependencies, reportedly reaching 200x real-time on Apple M-series chips. Developers like the thread-safe model cloning for multi-stream apps and the Whisper-ready chunking modes (greedy or longest-gap) that prep audio for ASR pipelines. Simple pip/npm installs make prototyping audio features fast.
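The longest-gap chunking mode is easy to picture: when a run of speech would exceed the ASR model's window (say, Whisper's 30 seconds), cut at the widest silence between segments. A hedged Python sketch of that idea follows; the function name and parameters are assumptions, not code from the repo.

```python
# Hedged sketch of "longest-gap" chunking: given VAD speech segments, split
# long audio at the widest silence so each chunk stays under a maximum length
# (e.g. Whisper's 30 s window). All names/parameters here are assumptions.

def longest_gap_chunks(segments: list[tuple[float, float]],
                       max_chunk_sec: float = 30.0) -> list[list[tuple[float, float]]]:
    """Group (start, end) speech segments into chunks of at most max_chunk_sec,
    preferring to cut at the longest inter-segment silence."""
    chunks, current = [], []
    for seg in segments:
        candidate = current + [seg]
        if candidate[-1][1] - candidate[0][0] <= max_chunk_sec:
            current = candidate
            continue
        if len(candidate) == 1:          # a single segment longer than the window
            chunks.append(candidate)
            current = []
            continue
        # Too long: cut at the widest gap inside the candidate chunk.
        gaps = [candidate[i + 1][0] - candidate[i][1] for i in range(len(candidate) - 1)]
        cut = gaps.index(max(gaps)) + 1
        chunks.append(candidate[:cut])
        # The remainder could still be over-long in pathological cases;
        # a real implementation would loop, this sketch keeps it simple.
        current = candidate[cut:]
    if current:
        chunks.append(current)
    return chunks

segments = [(0.0, 8.0), (9.0, 17.0), (22.0, 29.0), (30.0, 38.0)]
for chunk in longest_gap_chunks(segments):
    print(f"chunk {chunk[0][0]:.1f}s-{chunk[-1][1]:.1f}s: {chunk}")
```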

Who should use this?

Backend devs building transcription services or voice pipelines that need fast VAD before Whisper; frontend engineers adding real-time speech detection to browser apps or cross-platform music players; Node.js creators handling live streams and captioning; and C/C++ teams embedding lightweight AED in mobile or desktop audio tools.

Verdict

Grab it for cross-platform audio detection if you need tiny models and deploy-everywhere reach; the docs and tests are thorough despite only 11 stars. Still early, though: watch for broader adoption before locking it into production.
