Saganaki22

VoxCPM2 TTS for ComfyUI. 30 languages, voice design, controllable cloning, 48kHz audio, and LoRA training

47
4
100% credibility
Found Apr 12, 2026 at 47 stars.
AI Analysis
Python
AI Summary

Custom nodes for the ComfyUI app that let users generate multilingual speech from text, design new voices, clone from audio samples, and train voice styles.

How It Works

1. 🔍 Discover Voice Magic

You spot this voice creation tool while browsing add-ons for ComfyUI, the node-based generative AI app.

2. 📦 Add It Easily

Click install in the manager, and it joins your toolbox without any hassle.

3. 🔄 Refresh and Explore

Restart your app, and new voice blocks appear ready to use under the sound section.

4. 🎤 Speak Your Words

Drag a voice block, type your message, describe the perfect voice like 'cheerful young woman', and hit go.

5. Choose Your Style

Invent Voices: Dream up new voices just from words, no sample needed.

Copy Real Voices: Upload a voice sample, optionally with its matching words, for a closer clone.

6. 🔊 Hear the Magic

Listen as your text turns into lifelike speech in the voice you chose, crystal clear.

🎉 Voices at Your Fingertips

Now craft speeches, clones, or even train custom styles anytime, bringing stories to life effortlessly.

AI-Generated Review

What is ComfyUI-VoxCPM2?

ComfyUI-VoxCPM2 is a set of Python custom nodes that brings VoxCPM2 TTS to ComfyUI workflows, generating 48kHz audio from text across 30 languages. It removes the need to stitch external TTS tools into node-based pipelines by adding dedicated nodes for zero-shot TTS, voice design from natural-language descriptions, and controllable voice cloning from short clips. The nodes also handle automatic model downloads, LoRA loading, and LoRA training inside ComfyUI.
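For readers new to ComfyUI custom nodes, here is a minimal sketch of how a text-to-speech node can be declared. It is not taken from this repo: the class name, input names, and the silent placeholder output are all hypothetical, and a real node would call the TTS model and return a torch tensor rather than a nested list. Only the general node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`, `NODE_CLASS_MAPPINGS`) and the `AUDIO` dictionary layout follow ComfyUI's usual pattern.

```python
class HypotheticalVoxTTSNode:
    """Illustrative skeleton only; not the actual ComfyUI-VoxCPM2 node."""

    CATEGORY = "audio"            # menu section where the node is listed
    RETURN_TYPES = ("AUDIO",)     # one output socket carrying audio
    FUNCTION = "generate"         # name of the method ComfyUI calls

    @classmethod
    def INPUT_TYPES(cls):
        # Input names below are assumptions chosen for illustration.
        return {
            "required": {
                "text": ("STRING", {"multiline": True, "default": "Hello"}),
            },
            "optional": {
                "voice_description": ("STRING", {"default": ""}),
            },
        }

    def generate(self, text, voice_description=""):
        sample_rate = 48000
        # Placeholder: one second of silence shaped [batch][channel][samples].
        # A real node would run the model here and return a torch tensor.
        waveform = [[[0.0] * sample_rate]]
        return ({"waveform": waveform, "sample_rate": sample_rate},)


# ComfyUI discovers nodes through these module-level mappings.
NODE_CLASS_MAPPINGS = {"HypotheticalVoxTTS": HypotheticalVoxTTSNode}
NODE_DISPLAY_NAME_MAPPINGS = {"HypotheticalVoxTTS": "Hypothetical Vox TTS"}
```

The node returns a one-element tuple because ComfyUI matches outputs to `RETURN_TYPES` by position.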

Why is it gaining traction?

It stands out with controllable cloning: you can clone a voice from reference audio alone, or add a transcript for higher fidelity. Voice design works from a description such as "young woman, gentle tone", with no sample required. The 48kHz output, optional ASR auto-transcription, reference denoiser, and torch.compile speedups keep generation quick and the audio clean, all without leaving ComfyUI. The LoRA training nodes round it out into a full voice customization suite for developers who want end-to-end TTS control.

Who should use this?

It suits ComfyUI users building AI audio pipelines, such as generative art creators adding narration or podcasters prototyping voiceovers. It also fits voice AI experimenters who want to clone accents across the 30 supported languages or fine-tune LoRAs on custom datasets. Skip it if you don't use ComfyUI, since everything ships as ComfyUI nodes.

Verdict

Solid pick for ComfyUI TTS needs: install via Manager and start cloning voices. At 47 stars and 1.0% credibility, it's early but well documented, with clear nodes and guides; watch for community growth before relying on it in production.
