saadansha / tts-prosody-probe
Probe and compare the prosody (pitch / energy / duration) of TTS outputs.
This is a small, open-source tool that helps people analyze text-to-speech audio outputs. It measures three key qualities of spoken voice: pitch (how high or low the voice goes), energy (how loud or quiet it is), and duration (how long the speech lasts). The tool can compare two voice recordings and tell you how similar or different they are, or create visual charts of a single recording's voice patterns. It's designed for developers and researchers building or testing speech synthesis systems who want objective measurements of whether their voices sound natural and human-like.
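To make the three measurements concrete, here is a minimal sketch of how pitch, energy, and duration could be estimated from raw samples using only the Python standard library. It is not the tool's own code: the function names (`estimate_pitch_hz`, `rms_energy`, `duration_seconds`), the autocorrelation method, and the 50 to 500 Hz search range are all assumptions chosen for illustration, and the input is a synthetic tone rather than real speech.

```python
import math


def rms_energy(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    Hypothetical helper: searches lags that correspond to fmin..fmax and
    returns the frequency of the lag with the strongest self-similarity.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def duration_seconds(samples, sample_rate):
    """Length of the recording in seconds."""
    return len(samples) / sample_rate


# Synthetic stand-in for a recording: a 200 Hz tone, 0.2 s long, at 8 kHz.
sample_rate = 8000
tone = [
    0.5 * math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(1600)
]

pitch = estimate_pitch_hz(tone, sample_rate)
energy = rms_energy(tone)
duration = duration_seconds(tone, sample_rate)
print(f"pitch={pitch:.1f} Hz, rms={energy:.3f}, duration={duration:.2f} s")
```

Real speech is not a steady tone, so a practical analysis would split the audio into short frames, skip unvoiced frames, and track these values over time.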
How It Works
You've built or are using a voice generator, but you're not sure if it sounds natural and human-like.
Beyond just listening, you want numbers that tell you whether the pitch, energy, and timing feel right.
With one simple command, you get a set of tools that can analyze the sound of speech from your recordings.
Drop in two recordings and get a detailed report showing how similar or different they are.
Generate a chart showing the pitch and energy patterns of one voice output.
The tool reports easy-to-read numbers: pitch accuracy, energy match, and how the durations compare.
Now you have real data to guide improvements and prove whether your changes make the speech sound better.
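The comparison step described above can be sketched as follows. The page does not say how the tool computes its scores, so this example is an assumption: it swaps in three common acoustic measures (pitch difference in semitones, energy difference in decibels, and a duration ratio). The `ProsodySummary` class, the `compare` function, and the sample values are hypothetical and do not come from the repository.

```python
import math
from dataclasses import dataclass


@dataclass
class ProsodySummary:
    """Hypothetical per-recording summary of the three prosody features."""
    mean_pitch_hz: float
    rms_energy: float
    duration_s: float


def compare(reference, candidate):
    """Return candidate-versus-reference differences for each feature."""
    return {
        # 12 semitones per octave; positive means the candidate is higher.
        "pitch_diff_semitones": 12
        * math.log2(candidate.mean_pitch_hz / reference.mean_pitch_hz),
        # Amplitude ratio in decibels; negative means the candidate is quieter.
        "energy_diff_db": 20
        * math.log10(candidate.rms_energy / reference.rms_energy),
        # Above 1.0 means the candidate takes longer to say the same text.
        "duration_ratio": candidate.duration_s / reference.duration_s,
    }


# Made-up values for illustration only, not measurements from real audio.
reference = ProsodySummary(mean_pitch_hz=200.0, rms_energy=0.10, duration_s=2.0)
candidate = ProsodySummary(mean_pitch_hz=400.0, rms_energy=0.05, duration_s=2.5)

report = compare(reference, candidate)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Relative measures like these are used because a fixed difference in hertz or raw amplitude is perceived differently depending on the starting level.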