pulgog / whisperkv

Public

KV-cache compression for Whisper-family speech models. Drop-in patch, three eviction policies.

47 stars
0
85% credibility
AI Analysis
Python
AI Summary

WhisperKV is a lightweight tool that reduces the memory needed by speech recognition models during long audio transcription, allowing users to process extended recordings on smaller computers without retraining the model.

How It Works

1
🎤 You're transcribing long audio

You need to convert hours of speech into text, but your computer runs out of memory partway through.

2
💡 You discover WhisperKV

A small helper that lets your speech recognition model use much less memory with minimal loss of accuracy.

3
📦 You install it with one command

A simple one-line installation adds the memory-saving feature to your existing setup.

4
🔧 You wrap your speech model

You tell WhisperKV how much of the model's memory to keep by choosing a simple setting like 'keep the last 64 tokens plus the 32 most important ones'.

5
▶️ You run your transcription

Your audio runs through the model, which now discards the less useful cache entries while keeping the ones that matter for accuracy.

You get your transcript

Your full audio is transcribed successfully using a fraction of the memory, even on a smaller computer.
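The "keep the last 64 plus the 32 most important" setting from step 4 can be sketched in plain Python. This is a minimal illustration and not WhisperKV's actual API: the function name `select_kept_positions`, the parameters `recent` and `heavy`, and the idea that each cache position carries a single importance score are all assumptions made for the example.

```python
def select_kept_positions(scores, recent=64, heavy=32):
    """Return the sorted cache positions to keep under a fixed budget.

    scores[i] is a hypothetical importance score for cache position i,
    for example the attention that position has received so far.
    """
    n = len(scores)
    if n <= recent + heavy:
        # The cache still fits in the budget, so nothing is evicted.
        return list(range(n))

    recent_start = n - recent
    recent_positions = list(range(recent_start, n))

    # Among the older positions, keep the `heavy` highest-scoring ones.
    # Ties are broken in favour of the earlier position.
    older = range(recent_start)
    heavy_positions = sorted(older, key=lambda i: (-scores[i], i))[:heavy]

    return sorted(heavy_positions) + recent_positions


# 200 cached positions, two of which stand out as important.
scores = [0.1] * 200
scores[3] = 5.0
scores[50] = 4.0

kept = select_kept_positions(scores, recent=64, heavy=32)
print(len(kept))
```

With these inputs the cache shrinks from 200 positions to the 96 allowed by the budget, and positions 3 and 50 survive even though they fall outside the recent window.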

AI-Generated Review

What is whisperkv?

WhisperKV is a Python library that compresses the key-value cache in Whisper speech models during inference. It uses eviction policies to keep only the most important cache entries, dramatically reducing memory usage on long audio transcription. You drop it into existing code with a single function call.
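To see why capping the number of cached entries saves memory, a back-of-the-envelope estimate helps. The sketch below assumes a standard transformer self-attention cache (one key vector and one value vector per layer per cached position, batch size 1, 16-bit values). The layer count, model width, and sequence lengths are placeholder values chosen for illustration and are not taken from the project's benchmarks.

```python
def kv_cache_bytes(num_layers, d_model, cached_positions, bytes_per_value=2):
    """Estimate self-attention KV-cache size for a single sequence."""
    # Factor 2: one key vector and one value vector per position per layer.
    return 2 * num_layers * d_model * cached_positions * bytes_per_value


# Placeholder model shape and lengths, for illustration only.
full = kv_cache_bytes(num_layers=32, d_model=1280, cached_positions=384)
compressed = kv_cache_bytes(num_layers=32, d_model=1280, cached_positions=96)

print(full, compressed, full / compressed)
```

Because the cache grows linearly with the number of cached positions, keeping 96 entries instead of 384 cuts this part of the memory footprint by a factor of four in the example.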

Why is it gaining traction?

The hook is simple: run Whisper on a smaller GPU without retraining. The project's benchmarks show a 3-4x memory reduction with minimal impact on word error rate. The heavy-hitter policy tracks accumulated attention weights to decide which cache entries to prune. Three policies are available for different tradeoffs between speed and accuracy.
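The idea of tracking accumulated attention weights can be sketched as a small running tracker. This is a simplified illustration of the general heavy-hitter technique and not the repository's implementation: the class name `HeavyHitterTracker`, its parameters, and the use of one attention row per decoding step (rather than weights aggregated over heads and layers) are assumptions.

```python
class HeavyHitterTracker:
    """Toy heavy-hitter eviction: keep recent entries plus top scorers."""

    def __init__(self, recent=2, heavy=1):
        self.recent = recent
        self.heavy = heavy
        self.positions = []  # original token positions still in the cache
        self.scores = []     # accumulated attention per cached position

    def step(self, position, attention):
        """Cache a new position, then add this step's attention weights.

        `attention` holds one weight per cached entry, new one included.
        """
        self.positions.append(position)
        self.scores.append(0.0)
        if len(attention) != len(self.positions):
            raise ValueError("expected one attention weight per cached entry")
        for i, weight in enumerate(attention):
            self.scores[i] += weight
        self._evict()

    def _evict(self):
        budget = self.recent + self.heavy
        while len(self.positions) > budget:
            # Only entries outside the recent window may be evicted.
            candidates = range(len(self.positions) - self.recent)
            victim = min(candidates, key=lambda i: self.scores[i])
            del self.positions[victim]
            del self.scores[victim]


tracker = HeavyHitterTracker(recent=2, heavy=1)
tracker.step(0, [1.0])
tracker.step(1, [0.9, 0.1])
tracker.step(2, [0.6, 0.1, 0.3])
tracker.step(3, [0.5, 0.1, 0.1, 0.3])
tracker.step(4, [0.4, 0.1, 0.2, 0.3])
print(tracker.positions)
```

In this run position 0 keeps attracting attention and stays cached as the heavy hitter, while positions 1 and 2 are evicted once they leave the recent window.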

Who should use this?

- Developers building real-time ASR on budget hardware
- Teams running Whisper in streaming mode
- Anyone transcribing long audio files on GPUs with limited VRAM

Verdict

At 47 stars, this is an early-stage project, but the implementation is clean and the benchmarks are transparent. The 85% credibility score reflects the project's newness, but if you need to shrink Whisper's memory footprint without retraining, this is worth a look.
