okturro

Benchmark for LLM-based ASR n-best rescoring (ngram, neural-LM, MLM-PLL, LLM-prompt strategies).

32
0
100% credibility
Found May 25, 2026 at 32 stars.
AI Analysis
Python
AI Summary

ASR-Rescore-Bench is an academic benchmark repository for evaluating Large Language Model-based automatic speech recognition (ASR) rescoring strategies across multiple speech corpora.

AI-Generated Review

What is asr-rescore-bench?

This is a benchmark suite for comparing different ways to improve automatic speech recognition accuracy. When ASR systems process audio, they typically output multiple candidate transcriptions (an n-best list). This project tests six different strategies for picking the best one from that list, ranging from traditional n-gram language models to instruction-tuned LLMs like Qwen2.5-7B. The benchmark runs on five public corpora including LibriSpeech and Chinese speech data, measuring word error rate for each approach. It comes as a Python package with a command-line interface that lets you rescore n-best files and evaluate results against references.
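As a rough illustration of what n-best rescoring means, the sketch below picks the hypothesis with the best combined recognizer and language-model score. It is not taken from the repository: `Hypothesis`, `unigram_logprob`, `rescore`, and `lm_weight` are hypothetical names, and the toy add-one-smoothed unigram model merely stands in for whichever scorer (n-gram, neural LM, or LLM) a real strategy would use.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    asr_score: float  # log-score from the recognizer (higher is better)


def unigram_logprob(text, counts, total):
    """Toy add-one-smoothed unigram log-probability (stand-in for a real LM)."""
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen words
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in text.split())


def rescore(nbest, lm_score, lm_weight=0.5):
    """Return the hypothesis maximizing asr_score + lm_weight * lm_score(text)."""
    return max(nbest, key=lambda h: h.asr_score + lm_weight * lm_score(h.text))


# Hypothetical training text and n-best list, for illustration only.
counts = Counter("the cat sat on the mat".split())
total = sum(counts.values())


def lm_score(text):
    return unigram_logprob(text, counts, total)


nbest = [
    Hypothesis("the cat sat on the matt", asr_score=-4.0),  # recognizer's 1-best
    Hypothesis("the cat sat on the mat", asr_score=-4.2),
]

best = rescore(nbest, lm_score)
print(best.text)
```

With `lm_weight=0` the recognizer's own ranking is kept; raising the weight lets the language model overrule it, which is the trade-off any rescoring strategy has to tune.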

Why is it gaining traction?

The headline result is compelling: prompt-based rescoring with mid-sized LLMs closes 40-60% of the WER gap relative to traditional methods, but only with larger n-best lists (n >= 10) and after repetition artifacts are cleaned up. This gives practitioners a clear decision framework for when LLM-based approaches make sense and when simpler methods suffice. The project ships with pre-cached n-best data for public benchmarks, so you can reproduce results without regenerating the lists from scratch. The CLI workflow is straightforward: load n-best lists, apply a strategy, and score the output against references.
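The scoring step and the repetition cleanup can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: `edit_distance`, `corpus_wer`, and `collapse_repeats` are hypothetical helpers, WER is computed as word-level edit distance divided by the number of reference words, and the cleanup rule (dropping immediately repeated word n-grams) is only one plausible way to handle repetition artifacts.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def corpus_wer(refs, hyps):
    """Total edit errors divided by total reference words (refs must be non-empty)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def collapse_repeats(text, max_ngram=3):
    """Drop immediately repeated word n-grams, longest n first.

    Caveat: this also removes legitimate repetitions such as "very very".
    """
    words = text.split()
    for n in range(max_ngram, 0, -1):
        out = []
        i = 0
        while i < len(words):
            if len(out) >= n and out[-n:] == words[i:i + n]:
                i += n  # skip the repeated n-gram
            else:
                out.append(words[i])
                i += 1
        words = out
    return " ".join(words)


# Hypothetical reference and looping model output, for illustration only.
refs = ["the cat sat on the mat"]
raw = ["the cat sat the cat sat on the mat mat mat"]
cleaned = [collapse_repeats(h) for h in raw]

print(corpus_wer(refs, raw), corpus_wer(refs, cleaned))
```

In this toy case the looping output is penalized heavily before cleanup and matches the reference afterwards, which suggests why the cleanup step could matter for prompt-based strategies.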

Who should use this?

ASR engineers evaluating whether to add LLM-based rescoring to their production pipelines should look here first. Researchers comparing language model approaches for speech recognition will find a standardized evaluation framework. Teams building multilingual speech systems can use the benchmark to understand tradeoffs between n-gram, neural, and prompt-based methods across English and Chinese data.

Verdict

This is a legitimate research contribution with a clear methodology, but the 32 stars and 1.0% credibility score reflect its early stage. The documentation is solid and the approach is sound, but expect some integration work before dropping it into a production system. It is worth evaluating if you are researching ASR rescoring; wait for more community validation before relying on it in production.
