andimarafioti

Pure-PyTorch Parakeet TDT inference

19
5
100% credibility
Found Feb 28, 2026 at 19 stars.
AI Analysis
Python
AI Summary

nano-parakeet is a lightweight Python library providing fast, dependency-minimal inference for NVIDIA's Parakeet speech-to-text model using pure PyTorch.

How It Works

1. 🔍 Discover nano-parakeet

You learn about a simple tool that turns audio recordings of speech into written text quickly and accurately.

2. 🛠️ Set up the tool

You add this lightweight helper to your computer so it's ready to handle speech-to-text tasks.

3. 🎤 Pick your audio

You select a voice recording, like a meeting note or podcast clip, that you want to convert to text.

4. Start transcribing

You give the tool your audio file and it processes the speech into readable words.

5. See the speed

Results arrive in seconds, much faster than comparable tools.

Enjoy your text

You now have the complete written version of the spoken words, ready for reading, sharing, or editing.
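
The steps above map onto the package's command-line entry point, `nano-parakeet audio.wav`. The following is a minimal sketch of calling it from Python, assuming the package is already installed with pip and that the CLI prints the transcript to standard output (that stdout behavior is an assumption, not something this page confirms). `build_command` and `transcribe_file` are hypothetical helper names, not part of the library.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(audio_path):
    """Return the CLI invocation `nano-parakeet <file>` as an argument list."""
    return ["nano-parakeet", str(Path(audio_path))]


def transcribe_file(audio_path):
    """Run the CLI on one file and return whatever it prints.

    Assumption: the transcript is written to stdout. Returns None when the
    `nano-parakeet` executable is not found on PATH.
    """
    if shutil.which("nano-parakeet") is None:
        return None
    result = subprocess.run(
        build_command(audio_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

Calling `transcribe_file("audio.wav")` would then return the transcript text, or `None` if the tool is not installed.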

AI-Generated Review

What is nano-parakeet?

nano-parakeet delivers pure-PyTorch inference for NVIDIA's Parakeet TDT speech-to-text model in Python, with no dependency on the official NeMo framework. It is a slim package with just five dependencies (torch, numpy, soundfile, sentencepiece, huggingface-hub) that loads 1.1 GB of weights from Hugging Face and transcribes 16 kHz mono audio files via a one-liner API or a CLI command like `nano-parakeet audio.wav`. It targets NeMo's bloat: no version conflicts, no 30-second cold starts, and byte-identical transcriptions ready to drop into your project.
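
Since the expected input is 16 kHz mono audio, it can help to check a file before transcribing it. The sketch below uses only Python's standard `wave` module, which reads PCM WAV headers; it is an illustrative pre-check, not part of nano-parakeet. The helper names are hypothetical, and the ffmpeg command is a generic resampling call built from standard ffmpeg flags, offered as one possible way to convert a file that does not match.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels) read from a PCM WAV header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def is_16k_mono(path):
    """True when the file already matches the 16 kHz mono input format."""
    return wav_format(path) == (16000, 1)


def ffmpeg_resample_command(src, dst):
    """Build an ffmpeg call that writes `dst` as 16 kHz mono audio.

    -y overwrites `dst` if it exists, -ac 1 downmixes to one channel,
    -ar 16000 resamples to 16 kHz.
    """
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", "16000", str(dst)]
```

If `is_16k_mono("audio.wav")` is false, the list returned by `ffmpeg_resample_command("audio.wav", "audio_16k.wav")` can be passed to `subprocess.run` on a machine with ffmpeg installed.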

Why is it gaining traction?

It cuts dependencies from 180 to 5, reduces cold starts to 3 seconds, and improves the warm real-time factor (RTF) by up to 2.5x on an RTX 4090 or 1.3x on a Jetson AGX Orin, according to the included benchmarks, which you can run yourself. It supports OGG, WAV, and M4A via ffmpeg, accepts numpy and tensor inputs, and offers optional timestamps at the character, word, and segment level. Developers like the no-fuss install (`pip install nano-parakeet`) and the Jetson tweaks that avoid rebuilding PyTorch.
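
To make the benchmark terms concrete, here is a small sketch of how a real-time factor and a speedup ratio can be computed. It assumes the common definition RTF = processing time / audio duration, where lower is faster; some benchmarks report the inverse, and this page does not say which convention the repo's benchmarks use. The function names are hypothetical and the timing helper is generic, not the repo's benchmark script.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF under the assumed definition: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


def time_call(fn, *args, warmup=1, runs=3):
    """Best wall-clock time of fn(*args) over `runs`, after warm-up calls.

    The warm-up calls are discarded so that one-time costs (model load,
    kernel compilation) do not count toward the "warm" figure.
    """
    for _ in range(warmup):
        fn(*args)
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

With made-up numbers, a 10-second clip processed in 0.5 seconds gives `real_time_factor(0.5, 10.0)`, and a baseline taking 5.0 seconds against a candidate taking 2.0 seconds gives `speedup(5.0, 2.0)`. When timing GPU code, the timed function should synchronize the device (for example with `torch.cuda.synchronize()`) before returning, otherwise the measurement covers only kernel launch.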

Who should use this?

Python devs embedding fast STT in web apps, real-time pipelines, or serverless functions where NeMo's overhead kills deployability. Edge ML engineers on Jetson devices who need transcription with sub-100 ms latency. Audio tool builders who want Hugging Face integration without framework lock-in.

Verdict

Grab it if NeMo frustrates you: the benchmarks hold up, the API is clean, and the MIT-licensed beta works out of the box on CUDA. Despite a 100% credibility score, at 19 stars it's early and unproven at scale; test thoroughly before production.
