vspeech / Qwen3-TTS-Train
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice cloning.
A user-friendly toolkit for fine-tuning Qwen3-TTS models on custom audio datasets to create personalized voices supporting single/multi-speaker, multi-language, and instruction-based synthesis.
How It Works
You find Qwen3-TTS-Train and get excited to teach an AI to speak in your favorite voice, like a family member or character.
Record or collect 1-2 hours of clear, short audio clips (2-10 seconds each) of the voice, with matching text transcripts in a simple list.
Use the easy prep tool to process your audio and text into a ready-to-train format, adding speaker names or languages if needed.
Pick a mode, such as single voice, multi-voice, or emotional styles, then launch training; the model learns from your samples on your own computer.
Generate speech samples with your new voice model and tweak training if needed for even better results.
Enjoy hearing your custom voice read stories, news, or anything else. It's like having a personal voice assistant that sounds just right.
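The data-collection and prep steps above can be checked before any training is launched. The following is a minimal sketch, not the toolkit's own prep tool: it assumes the clips are PCM WAV files, and the JSONL layout and the field names `audio`, `text`, `duration`, `speaker`, and `language` are hypothetical choices made for illustration. It keeps only clips in the 2-10 second range with a non-empty transcript and reports how much audio remains.

```python
import json
import wave
from pathlib import Path

MIN_SECONDS = 2.0   # lower clip-length bound from the steps above
MAX_SECONDS = 10.0  # upper clip-length bound from the steps above


def clip_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_manifest(pairs, speaker=None, language=None):
    """Split (wav_path, transcript) pairs into kept entries and rejected paths.

    The entry layout is a hypothetical example, not the toolkit's real format.
    """
    kept, rejected = [], []
    for wav_path, text in pairs:
        duration = clip_duration_seconds(wav_path)
        text = text.strip()
        # Reject clips with an empty transcript or an out-of-range length.
        if not text or not (MIN_SECONDS <= duration <= MAX_SECONDS):
            rejected.append(str(wav_path))
            continue
        entry = {"audio": str(wav_path), "text": text, "duration": round(duration, 3)}
        if speaker is not None:
            entry["speaker"] = speaker      # optional speaker name
        if language is not None:
            entry["language"] = language    # optional language tag
        kept.append(entry)
    return kept, rejected


def write_jsonl(entries, out_path):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(out_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def total_hours(entries):
    """Total duration of the kept clips, in hours."""
    return sum(e["duration"] for e in entries) / 3600.0
```

Calling `build_manifest` on a list of (path, transcript) pairs and then `total_hours` on the kept entries gives a quick sense of whether the dataset is near the suggested 1-2 hours. The rejected list shows which clips to re-record or trim.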