xiquan-li / Resonate

Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation

43
6
100% credibility
Found Mar 18, 2026 at 44 stars.
AI Analysis
Python
AI Summary

Resonate is a state-of-the-art text-to-audio generator that uses online feedback from large audio language models to create high-quality sounds matching text descriptions.

How It Works

1
🔍 Discover Resonate

You hear about Resonate, a fun tool that turns everyday words into realistic sounds like barking dogs or rain on windows.

2
🎤 Try the Quick Demo

Type a simple description like 'a cat meowing softly' and instantly hear the generated audio clip play back.

3
Create Your Own Sounds

Experiment with different prompts to make custom audio, saving your favorites to a folder for later use.

4
🎛️ Fine-Tune for Perfection

Upload your own audio examples and let it learn your style, improving matches to your specific sound ideas.

5
🚀 Share with Friends

Put your personalized sound maker online so others can generate audio from text just like you do.

🎉 Sounds Come Alive

Now anyone can describe a scene and hear it vividly brought to life, just like magic from words.

AI-Generated Review

What is Resonate?

Resonate is a Python framework for building text-to-audio generation models, offering full pipelines for pre-training, SFT, DPO, and GRPO stages. It turns text prompts into realistic audio clips up to 10 seconds, leveraging flow matching and LLM-based rewards for alignment. Fire up `python demo.py --prompt "a cat meowing with guitar"` to auto-download a pretrained model and generate WAV files instantly.
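As a rough illustration of what flow-matching sampling involves, here is a minimal sketch that integrates a velocity field from noise (t = 0) to a sample (t = 1) with Euler steps. Everything in it is an assumption for illustration: `toy_velocity` stands in for a trained network and simply points at a known target, and none of these names come from the Resonate codebase.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight-line path from noise to data, the velocity that
    carries the current point x to the target by t = 1 is
    (target - x) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample(target, steps=10, seed=0):
    """Euler-integrate dx/dt = v(x, t) from t = 0 to t = 1."""
    rng = random.Random(seed)
    # Start from Gaussian noise, as flow-matching samplers typically do.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t never reaches 1, so the division above is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Illustrative 3-dimensional "latent"; real audio latents are far larger.
target = [0.5, -0.25, 1.0]
result = sample(target)
```

In a real model the velocity would come from a network conditioned on the text prompt, and the integrated latent would then be decoded to a waveform.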

Why is it gaining traction?

It reports state-of-the-art results on TTA-Bench by running online GRPO with large audio language models as reward models, going beyond pre-training and SFT to handle complex, multi-event prompts through RLHF-style tuning. Developers familiar with LLM pre-training workflows will find the move to audio straightforward, and the evaluation scripts compute CLAP, AES, and Qwen scores out of the box. The name fits: the goal is generated sound that actually matches the description.
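The core idea behind GRPO-style training can be sketched briefly: several outputs are sampled for the same prompt, each is scored by a reward model, and each score is normalized against its own group to give an advantage. The sketch below assumes that standard formulation; the reward values are made up, the function name is hypothetical, and Resonate's actual implementation may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Population standard deviation is used here as a simplifying choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four clips generated from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Clips that score above the group average get positive advantages and are reinforced; those below get negative ones. If every clip in a group scores the same, all advantages are zero and the group contributes no learning signal.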

Who should use this?

Audio ML researchers fine-tuning generative audio models for applications such as sound-effect libraries or voiceovers. Developers moving from BERT-style or continual pre-training setups to multimodal audio, especially those exploring reinforcement learning or DPO for generation. It also suits prototyping text-to-sound for games or podcasts.

Verdict

Worth trying for audio experiments: the demo and training scripts run smoothly on capable GPUs (90GB for the GRPO defaults). At 43 stars and 100% credibility, it is early but credible for research; it lacks extensive tests, so validate outputs before shipping.
