Saganaki22

ComfyUI custom nodes for LongCat-AudioDiT: diffusion-based zero-shot text-to-speech

Found Apr 02, 2026 at 49 stars
AI Summary

Custom nodes for the ComfyUI interface that provide zero-shot text-to-speech synthesis, voice cloning from reference audio, and multi-speaker dialogue generation using diffusion-based audio models.

How It Works

1. 🔍 Discover the voice magic

This add-on in your ComfyUI toolbox turns text into realistic speech and clones voices from a short sample.

2. 📦 Add it with one click

Open the ComfyUI Manager, search for the node pack, and install it; everything sets up automatically without any hassle.

3. 🧩 Build your first voice

Drag the text-to-speech node onto your canvas, type a message, and wire it up.

4. 🎤 Clone a voice instantly

Upload a short audio clip of someone talking, add your new text, and the node recreates their voice.

5. 🗣️ Create lively chats

Add more voices for different people, tag their lines in your script, and build a full conversation.

6. ▶️ Hit generate and listen

Press play, wait a moment, and hear your custom audio come to life right in the player.
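Step 4's voice cloning works from a short reference clip (the review below cites roughly 3-15 seconds). A minimal pre-check sketch in Python; the function names are illustrative, not part of the node pack's API:

```python
import wave

def clip_duration_s(path):
    """Length of a WAV clip in seconds, from its frame count and rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def is_valid_reference(duration_s, min_s=3.0, max_s=15.0):
    # Hypothetical bounds check: cloning reportedly expects a short
    # 3-15 s reference clip, so reject anything outside that window.
    return min_s <= duration_s <= max_s
```

Running a check like this before queueing a generation saves a wasted diffusion pass on an unusable clip.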

Your voices are ready

Download the crystal-clear speech or conversation, perfect for videos, stories, or fun projects – it sounds just like real people talking!


AI-Generated Review

What is ComfyUI-LongCat-AudioDIT-TTS?

This Python project delivers ComfyUI custom nodes for LongCat-AudioDiT, enabling zero-shot text-to-speech with diffusion-based generation at 24 kHz broadcast quality. Users get three nodes: basic TTS from text, voice cloning from a 3-15 s reference clip, and multi-speaker conversations via [speaker_N]: tags in the text. It simplifies integrating high-fidelity TTS into ComfyUI workflows by auto-downloading models into ComfyUI's models folder and handling dependencies on startup.
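The [speaker_N]: tag format lends itself to simple preprocessing. A hedged sketch of how a script could be split into turns before it reaches the multi-speaker node (parse_dialogue is a hypothetical helper, not something the repo ships):

```python
import re

# Matches lines like "[speaker_1]: Hello there" and captures the
# speaker number and the spoken text.
TAG = re.compile(r"\[speaker_(\d+)\]:\s*(.*)")

def parse_dialogue(script):
    """Split a tagged script into (speaker_id, line) pairs,
    skipping lines that carry no speaker tag."""
    turns = []
    for raw in script.strip().splitlines():
        m = TAG.match(raw.strip())
        if m:
            turns.append((int(m.group(1)), m.group(2)))
    return turns
```

A format this regular is easy to validate up front, so a typo in a speaker tag can be caught before any audio is generated.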

Why is it gaining traction?

Unlike standalone TTS tools, it plugs directly into ComfyUI's node graph for seamless chaining with image/video generation, supporting FP8/BF16 models down to 8 GB VRAM and attention backends like SageAttention for speed. Install it via the ComfyUI Manager or by git-cloning into the custom_nodes folder; smart caching and CPU offload help avoid out-of-memory crashes. Early adopters praise the progress bars, interruption support, and zero-shot cloning that rivals fine-tuned models without any training.
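The FP8/BF16 and CPU-offload options boil down to a VRAM budget decision. A hypothetical heuristic illustrating that trade-off (the thresholds and dict keys here are assumptions, not the nodes' actual configuration):

```python
def choose_runtime_config(free_vram_gb):
    """Pick weight precision and offload mode from free VRAM.
    FP8 roughly halves weight memory versus BF16, and CPU offload
    trades speed for fitting the model on smaller cards."""
    if free_vram_gb >= 16:
        return {"dtype": "bf16", "cpu_offload": False}
    if free_vram_gb >= 8:
        return {"dtype": "fp8", "cpu_offload": False}
    return {"dtype": "fp8", "cpu_offload": True}
```

The point of a fallback ladder like this is graceful degradation: smaller cards still run, just slower or at lower precision, instead of crashing with an OOM.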

Who should use this?

ComfyUI power users building AI media pipelines who need voiced narrations or dialogues without leaving the canvas. AI artists prototyping audio-reactive visuals or multi-speaker storyboards will find the node inputs intuitive. Developers experimenting with AudioDiT in custom ComfyUI workflows can skip manual install hassles.

Verdict

Grab it if you're in ComfyUI: solid docs and auto-setup make it plug-and-play, though 45 stars signal early maturity. Test on short clips first; it lacks extensive examples but shines for quick voice prototypes.


