Saganaki22

ComfyUI custom nodes for Fish Audio S2-Pro TTS — voice clone, multi-speaker, and text-to-speech

Python · 34 stars · 100% credibility · Found Mar 13, 2026 at 33 stars

AI Summary

Custom ComfyUI nodes for Fish Audio S2 Pro text-to-speech with zero-shot voice cloning, emotional tags, multi-speaker conversations, and 83-language support.

How It Works

1. 🔍 Discover amazing voices

You hear about a way to add lifelike spoken voices to your AI image workflows in ComfyUI.

2. 📦 Easy one-click setup

Install the pack from the ComfyUI custom nodes manager and restart ComfyUI to load the new voice nodes.

3. 🎙️ Type words, hear speech

Add a voice node, type your message with inline emotion tags like [excited] or [whisper], and run the workflow.

4. Choose your style

👤 Copy a voice: Upload a 10-30 second audio clip of a speaker to have that voice read your words.

👥 Group chat: Mix multiple voices for lively conversations.

5. ⚙️ Tweak emotions

Add tags like [laugh] or [sad] to make speech feel real and expressive.

Perfect audio magic

Your images now come alive with natural, emotional voices in any language—pure joy!
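
The inline tags mentioned in the steps above are plain bracketed words inside the prompt text. The following is a minimal sketch, not code from the node pack, of how a prompt could be inspected before it is sent to a TTS node. The helper names `list_tags` and `strip_tags` are hypothetical, and the assumption that a tag is any non-nested `[...]` span is ours; the model's actual tag grammar may differ.

```python
import re

# Assumption: a tag is any non-nested bracketed span such as [excited].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def list_tags(text):
    """Return the inline tags found in `text`, in order of appearance."""
    return [match.group(1).strip() for match in TAG_PATTERN.finditer(text)]


def strip_tags(text):
    """Return `text` with all tags removed and whitespace collapsed."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return " ".join(without_tags.split())


if __name__ == "__main__":
    prompt = "[excited] We did it! [whisper] Don't tell anyone."
    print(list_tags(prompt))   # the tags, e.g. for a quick sanity check
    print(strip_tags(prompt))  # the plain text, e.g. for a word count
```

A check like this can catch a missing closing bracket before a long generation run, since an unclosed tag simply will not appear in the list.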

AI-Generated Review

What is ComfyUI-FishAudioS2?

This Python-based ComfyUI custom node pack brings Fish Audio S2-Pro TTS to your workflows. It generates speech from text with zero-shot voice cloning from 10-30 second reference clips, multi-speaker conversations, and inline emotion tags like [laugh] or [whisper]. It removes the need to stitch external TTS tools into ComfyUI by providing native AUDIO outputs, progress bars, and automatic model downloads to your ComfyUI model path. Install it through the ComfyUI custom nodes manager to generate audio across 83 languages.
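
Since voice cloning is described as working from 10-30 second clips, it can help to check a reference clip's length before wiring it into a cloning node. The sketch below assumes ComfyUI's usual AUDIO convention, a dict with a `waveform` tensor shaped `[batch, channels, samples]` and an integer `sample_rate`. The helper names are hypothetical and are not part of the node pack; a `SimpleNamespace` stands in for the tensor so the example runs without PyTorch.

```python
from types import SimpleNamespace

# Bounds taken from the description above (10-30 second reference clips).
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def clip_seconds(audio):
    """Duration of an AUDIO-style dict, using the last waveform dimension."""
    samples = audio["waveform"].shape[-1]
    return samples / audio["sample_rate"]


def is_usable_reference(audio, lo=MIN_SECONDS, hi=MAX_SECONDS):
    """True when the clip length falls inside the recommended range."""
    return lo <= clip_seconds(audio) <= hi


if __name__ == "__main__":
    # Stand-in for a real tensor: only the .shape attribute is needed here.
    fake_waveform = SimpleNamespace(shape=(1, 1, 44100 * 12))
    reference = {"waveform": fake_waveform, "sample_rate": 44100}
    print(clip_seconds(reference), is_usable_reference(reference))
```

With a real workflow, the same dict would come from an audio loader node, and the helper would read `shape` from the actual tensor.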

Why is it gaining traction?

Unlike basic TTS nodes, it delivers S2-Pro's natural-sounding output with 1500+ free-form emotion controls and single-pass multi-speaker synthesis, with no phoneme hacks or preprocessing. Developers pick it up for the quantized models (down to 8GB VRAM via GPTQ) and optimizations like SageAttention, and because its bundled sources help avoid custom node conflicts and "import failed" errors in ComfyUI. Workflows gain realistic voiceovers without leaving the UI.

Who should use this?

ComfyUI power users building AI video pipelines that need dynamic narration, for example chaining TTS into image-to-video nodes for talking avatars. It also suits game developers prototyping multilingual dialogue trees and podcasters automating voice clones from samples. It is ideal if you are on an RTX 3090 or better and tired of piping audio through external scripts.

Verdict

Install it from the GitHub repository or through the ComfyUI custom nodes manager for pro-level TTS in ComfyUI. The docs, the examples in the repository, and portable-install support stand out, though the low star count signals early maturity. Test the quantized workflows first; it is solid for audio experimentation, but watch for edge cases on AMD and Mac.
