saadansha

Probe and compare the prosody (pitch / energy / duration) of TTS outputs.

89% credibility
Found May 26, 2026 at 87 stars.
Python
AI Summary

This is a small, open-source tool that helps people analyze text-to-speech audio outputs. It measures three key qualities of spoken voice: pitch (how high or low the voice goes), energy (how loud or quiet it is), and duration (how long the speech lasts). The tool can compare two voice recordings and tell you how similar or different they are, or create visual charts of a single recording's voice patterns. It's designed for developers and researchers building or testing speech synthesis systems who want objective measurements of whether their voices sound natural and human-like.

How It Works

1. 🎙️ You have a text-to-speech system

You've built or are using a voice generator, but you're not sure if it sounds natural and human-like.

2. 🔍 You want to measure how natural it sounds

Beyond just listening, you want numbers that tell you whether the pitch, energy, and timing feel right.

3. 📦 You install the prosody probe tool

With one simple command, you get a set of tools that can analyze the sound of speech from your recordings.

4. You choose how to explore your audio

⚖️ Compare two audio files

Drop in two recordings and get a detailed report showing how similar or different they are.

📊 Visualize a single recording

Generate a chart showing the pitch and energy patterns of one voice output.

5. 📈 You receive clear measurements

The tool gives you easy-to-understand numbers: pitch accuracy, energy match, and how the lengths compare.

You know exactly how natural your voice sounds

Now you have real data to guide improvements and prove whether your changes make the speech sound better.

AI-Generated Review

What is tts-prosody-probe?

This is a Python tool that extracts and compares prosody features from TTS audio outputs. Prosody is what makes speech sound natural: pitch (how high/low the voice goes), energy (loudness over time), and duration (how long things take to say). If you've ever wondered whether your new TTS model actually sounds better or just different, this tool gives you the numbers. You get a CLI with commands to extract features, compare two audio files, or generate visualizations. The library exports simple functions like `pitch_contour()` and `compare_pair()` that return metrics like pitch RMSE, pitch correlation, energy RMSE, and duration ratio.
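To make the four metrics named above concrete, here is a minimal, dependency-free sketch of how they could be computed from two contours that are already time-aligned. It is not the tool's code: `compare_contours`, its argument names, the use of `None` for unvoiced frames, and the toy numbers are all assumptions made for illustration, and the real `compare_pair()` may align, normalize, or scale values differently.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def compare_contours(ref_f0, hyp_f0, ref_energy, hyp_energy, ref_seconds, hyp_seconds):
    """Hypothetical helper: compare two utterances frame by frame.

    Assumes the contours are already the same length (time-aligned) and that
    unvoiced frames are marked with None in the pitch contours.
    """
    # Pitch is undefined in unvoiced frames, so keep frames voiced in both.
    voiced = [(r, h) for r, h in zip(ref_f0, hyp_f0) if r is not None and h is not None]
    ref_v = [r for r, _ in voiced]
    hyp_v = [h for _, h in voiced]
    return {
        "pitch_rmse_hz": rmse(ref_v, hyp_v),
        "pitch_corr": pearson(ref_v, hyp_v),
        "energy_rmse": rmse(ref_energy, hyp_energy),
        "duration_ratio": hyp_seconds / ref_seconds,
    }


# Toy data: the second voice is 10 Hz higher everywhere and 25% longer.
ref_f0 = [100.0, 110.0, None, 120.0, 130.0]
hyp_f0 = [110.0, 120.0, 125.0, 130.0, 140.0]
ref_energy = [0.10, 0.20, 0.05, 0.20, 0.10]
hyp_energy = [0.10, 0.20, 0.05, 0.20, 0.10]

report = compare_contours(ref_f0, hyp_f0, ref_energy, hyp_energy, 2.0, 2.5)
print(report)
```

In this toy case the pitch correlation is perfect while the pitch RMSE is not zero: the two contours have the same shape but sit at different heights. That is why it helps to read the two pitch metrics together.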

Why is it gaining traction?

Most TTS evaluation focuses on spectrogram quality or intelligibility scores, but prosody is a different beast. It answers: does this voice sound like a human speaking naturally? The tool uses proven signal processing under the hood (librosa for pitch tracking via pyin, RMS energy extraction) but wraps it in a dead-simple interface. The `--metric` flag on the compare command is particularly useful for CI pipelines or A/B testing different model versions. No config files, no training required, just pass two audio files and get numbers.
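For readers unfamiliar with the energy feature mentioned here, the following sketch shows what frame-wise RMS energy means, using only the standard library and a synthetic sine wave. It does not reproduce the tool or librosa: `frame_rms` is a hypothetical name, the frame and hop sizes are borrowed from librosa's usual defaults (2048 and 512), and unlike librosa's default behaviour the frames here are not centered or padded.

```python
import math


def frame_rms(samples, frame_length=2048, hop_length=512):
    """Return one RMS value per frame; frames are not padded or centered."""
    values = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        values.append(math.sqrt(sum(s * s for s in frame) / frame_length))
    return values


# Synthetic test signal: one second of a 200 Hz sine at 16 kHz, amplitude 0.5.
sr = 16000
signal = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]

energy = frame_rms(signal)
print(len(energy), "frames, first value:", round(energy[0], 3))
```

A steady sine gives a nearly flat energy contour close to amplitude divided by the square root of two. Real speech rises and falls from frame to frame, and that movement is what an energy comparison between two recordings picks up.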

Who should use this?

TTS researchers and engineers evaluating whether model changes actually improve naturalness. It fits if you're comparing two voice models or fine-tuning a system and need objective metrics beyond subjective listening tests, or if you want prosody checks in a CI/CD pipeline for a voice product where prosody drift matters. It's probably overkill for end users or anyone who just wants TTS that works out of the box.
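As a rough illustration of the CI/CD use case, the sketch below turns a dictionary of comparison metrics into a pass/fail gate. Everything in it is assumed rather than taken from the project: the function name `check_prosody`, the metric keys, the limits, and the sample values are invented for the example, and it does not call the tool's CLI or library.

```python
def check_prosody(metrics, limits):
    """Return human-readable failures; an empty list means the gate passes.

    `limits` maps a metric name to an inclusive (low, high) range.
    """
    failures = []
    for name, (low, high) in limits.items():
        value = metrics[name]
        if not (low <= value <= high):
            failures.append(f"{name}={value:.3f} outside [{low}, {high}]")
    return failures


# Hypothetical limits; real values need tuning on your own voices and data.
LIMITS = {
    "pitch_rmse_hz": (0.0, 25.0),
    "pitch_corr": (0.7, 1.0),
    "duration_ratio": (0.9, 1.1),
}

# Made-up numbers standing in for a baseline-versus-candidate comparison.
baseline_vs_candidate = {"pitch_rmse_hz": 18.2, "pitch_corr": 0.84, "duration_ratio": 1.27}

failures = check_prosody(baseline_vs_candidate, LIMITS)
for line in failures:
    print("FAIL", line)
# In a CI job, a non-empty list would become a non-zero exit code,
# for example sys.exit(1 if failures else 0).
```

Keeping the limits in one place makes it easy to loosen or tighten them as you learn which values track audible regressions for your voices.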

Verdict

A focused, useful tool for a specific niche, with a credibility score of about 0.90 (shown as 89% in the listing) and 87 stars. The small community means limited examples and test coverage, but the core functionality is solid and the API is clean. Worth trying if you're doing any serious TTS work, but expect to do some experimentation to interpret the metrics for your specific use case.
