KlingAIResearch

Try X-Dub to sync any character in a video with any audio you like | Official repository for "From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping"

96 stars
0
100% credibility
Found Mar 19, 2026 at 96 stars.
AI Analysis
Python
AI Summary

This repository provides the official PyTorch implementation, inference code, and pretrained weights for X-Dub (Wan-5B), a mask-free visual dubbing model introduced in the paper 'From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping'.

AI-Generated Review

What is X-Dub?

X-Dub is a Python-based visual dubbing tool from Kling AI Research that syncs any video character's lips to new audio, handling humans, cartoons, or animals without manual masks. Pass a video and a WAV file to its CLI script, and it automatically crops the face, generates lip-synced frames with pretrained diffusion models, then blends and pastes the results back into the original footage. Built on PyTorch with weights hosted on Hugging Face, it outputs dubbed MP4s ready for use.
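The final "blend and paste back" step can be illustrated with a minimal sketch. It assumes the generated face crop has already been resized to its bounding box and is composited with a linearly feathered alpha mask so the seam fades out toward the crop edges. The helpers `feather_mask` and `paste_back` and the `border` parameter are hypothetical; X-Dub's actual blending code may work differently.

```python
import numpy as np


def feather_mask(h: int, w: int, border: int) -> np.ndarray:
    """Hypothetical helper: an (h, w) alpha mask that is 0.0 at the outermost
    pixels and ramps linearly up to 1.0 at `border` pixels in from each edge."""
    ys = np.arange(h, dtype=np.float32)
    xs = np.arange(w, dtype=np.float32)
    # Distance in pixels from each row/column to the nearest edge.
    dist_y = np.minimum(ys, h - 1 - ys)
    dist_x = np.minimum(xs, w - 1 - xs)
    ramp = float(max(border, 1))
    alpha_y = np.clip(dist_y / ramp, 0.0, 1.0)
    alpha_x = np.clip(dist_x / ramp, 0.0, 1.0)
    # Outer product gives a 2-D mask that fades toward all four edges.
    return np.outer(alpha_y, alpha_x)


def paste_back(frame: np.ndarray, crop: np.ndarray, top: int, left: int,
               border: int = 8) -> np.ndarray:
    """Hypothetical helper: blend a generated uint8 crop (h, w, 3) into a
    uint8 frame (H, W, 3) at (top, left) and return a new frame."""
    h, w = crop.shape[:2]
    out = frame.astype(np.float32)  # astype returns a copy; `frame` is untouched
    region = out[top:top + h, left:left + w]  # view into `out`
    alpha = feather_mask(h, w, border)[..., None]  # broadcast over channels
    region[:] = alpha * crop.astype(np.float32) + (1.0 - alpha) * region
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Feathering is used here instead of a hard rectangular paste because a hard edge tends to leave a visible seam around the mouth region; a real pipeline would typically also apply the inverse of whatever affine transform was used to crop and align the face.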

Why is it gaining traction?

It delivers mask-free dubbing that generalizes well to non-human characters and beats traditional inpainting methods on temporal consistency, though occasional flicker remains. Single-command inference (with tunable scales for reference and audio fidelity) and the publicly released Wan-5B model make X-Dub easy to try locally, especially compared with closed tools that need heavy setup. Early demos on the project homepage are drawing in developers experimenting with AI video editing.
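In diffusion models, separate "reference" and "audio" scales are usually classifier-free guidance weights. As an assumption only (the review does not say how X-Dub combines them), the sketch below shows one common two-condition form, the nested guidance popularized by InstructPix2Pix, where one weight pulls toward the reference condition and the other amplifies what the audio adds on top of it. The function and argument names are hypothetical.

```python
import numpy as np


def combine_guidance(eps_uncond: np.ndarray, eps_ref: np.ndarray,
                     eps_full: np.ndarray, ref_scale: float,
                     audio_scale: float) -> np.ndarray:
    """Hypothetical two-condition classifier-free guidance.

    eps_uncond: model prediction with neither reference nor audio
    eps_ref:    prediction conditioned on the reference frames only
    eps_full:   prediction conditioned on reference frames and audio
    """
    # ref_scale scales the shift from unconditional to reference-conditioned;
    # audio_scale scales the extra shift contributed by the audio condition.
    return (eps_uncond
            + ref_scale * (eps_ref - eps_uncond)
            + audio_scale * (eps_full - eps_ref))


# Example: with both scales at 1.0 the result equals eps_full;
# with both at 0.0 it falls back to eps_uncond.
guided = combine_guidance(np.array([0.0]), np.array([1.0]), np.array([3.0]),
                          ref_scale=2.0, audio_scale=1.5)
```

Under this formulation, raising `audio_scale` would push harder toward audio-driven mouth motion, while raising `ref_scale` would favor fidelity to the reference appearance; values that are too high in either direction commonly introduce artifacts.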

Who should use this?

Video AI researchers testing generative dubbing pipelines, content creators dubbing multilingual clips or memes, and ML engineers prototyping AI media tools. Ideal for single-person videos within a roughly 21GB VRAM budget; skip it for now if you need multi-person support.

Verdict

Worth trying for cutting-edge lip-sync research, but at 96 stars it's early: the docs are README-focused, with open TODOs for stability and speed. A solid starting point if you're comfortable tweaking parameters.
