ernie-research / NAVA

Official Code of NAVA: Native Audio-Visual Alignment for Generation.

89% credibility
Found May 28, 2026 at 19 stars.
Python
AI Summary

NAVA is an AI system developed by Baidu's ERNIE team that generates synchronized audio and video content from text descriptions, with support for voice cloning, image animation, and audio-only generation.

How It Works

1. 🎬 Discover NAVA

You hear about an AI that generates video with synchronized sound from just a text description.

2. 📦 Set up the project

You download the project and the AI model weights with simple commands; everything you need comes in one package.

3. ✍️ Describe your vision

You write a simple description like 'a surfer riding a wave at sunset' or 'two people having a conversation in a coffee shop'.

4. Enhance your description

You click 'Rewrite' and the prompt rewriter expands your short description into the dense, detailed caption style the model was trained on.

5. Choose your generation mode

🎭 Voice cloning mode: Upload a short voice sample and the AI will use that voice in the generated speech.

🖼️ Image animation mode: Upload a starting image and the AI will animate it into a video.

🎵 Audio only mode: Generate just sound effects or speech without video.

6. 🚀 Generate your creation

You click Generate and wait about a minute (for 720p output on 8 GPUs) while the AI creates your video with synchronized audio.

🎉 Enjoy your creation

Your video plays with audio that matches the action: lip movements sync with speech, and sound effects match the scene.
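
The mode choice in step 5 can be pictured as a small request object. The sketch below is illustrative only: `GenerationRequest`, `active_modes`, and every field name are hypothetical and are not taken from the NAVA codebase. It also assumes that voice cloning can be combined with the other modes and that a starting image makes no sense for audio-only output; the page does not confirm either point.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenerationRequest:
    """Hypothetical request object; field names are illustrative, not NAVA's API."""
    prompt: str
    reference_voice: Optional[str] = None  # path to a short voice sample
    first_frame: Optional[str] = None      # path to a starting image
    audio_only: bool = False               # skip video, produce sound only


def active_modes(req: GenerationRequest) -> List[str]:
    """List the generation modes a request turns on."""
    if req.audio_only and req.first_frame is not None:
        # Assumption: a starting image has no meaning when no video is produced.
        raise ValueError("first_frame cannot be combined with audio_only")
    modes = []
    if req.reference_voice is not None:
        modes.append("voice_cloning")
    if req.first_frame is not None:
        modes.append("image_animation")
    if req.audio_only:
        modes.append("audio_only")
    # With no optional input, fall back to plain text-to-audio-video.
    return modes or ["text_to_audio_video"]


plain = GenerationRequest(prompt="a surfer riding a wave at sunset")
print(active_modes(plain))
```

Keeping the modes as optional fields rather than a single enum leaves room for combinations such as a cloned voice over an animated image, if the real tool allows them.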

AI-Generated Review

What is NAVA?

NAVA is a generative AI model that produces synchronized audio and video from text prompts. Built in Python, it uses a 6.3 billion parameter transformer backbone with flow matching to jointly synthesize speech, sound effects, and video frames from a single model. The key innovation is native audio-video alignment: instead of generating video separately and stitching on audio, both modalities are conditioned together from the start, producing outputs where sound and image are naturally coupled.
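
To make the flow matching idea concrete, here is a toy sampler, not NAVA's implementation. It integrates a velocity field from noise (t = 0) to a sample (t = 1) with Euler steps, and treats the video and audio latents as one concatenated state so that each step updates both together. The `oracle_velocity` function is a stand-in for the trained transformer and is an assumption for illustration; a real model would predict the velocity from the noisy latents, the time, and the text prompt.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 noise: Sequence[float], steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (sample)."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(target: Sequence[float]) -> Callable[[Vector, float], Vector]:
    """Stand-in for a trained network: the exact velocity toward one known
    target under the straight-line path x_t = (1 - t) * noise + t * target."""
    def velocity(x: Vector, t: float) -> Vector:
        # On that path, (target - x_t) / (1 - t) equals target - noise.
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


# Toy latents: the first two entries play the role of video, the last two audio.
video_target, audio_target = [1.0, -2.0], [0.5, 3.0]
joint_target = video_target + audio_target
joint_noise = [0.3, -0.1, 0.8, -0.6]  # fixed "noise" so the run is repeatable

# One velocity call per step moves the video and audio entries together.
sample = euler_sample(oracle_velocity(joint_target), joint_noise, steps=8)
video_part, audio_part = sample[:2], sample[2:]
```

Because a single velocity field acts on the joint state, there is no separate audio pass to align afterwards, which is the property the review describes as native audio-video alignment.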

Why is it gaining traction?

The standout feature is speech with timbre control: you can upload reference audio files to clone a speaker's voice and bind multiple speakers to different dialogue spans in a single prompt. It also supports image-to-video (upload a first frame) and text-driven camera composition, and it generates 720p output in roughly a minute on 8 GPUs. Producing stereo audio natively, without a separate vocoder step, is a practical advantage for anyone who has dealt with post-hoc audio alignment. The project also includes a Gradio demo for interactive use and ships a prompt rewriter that expands short English descriptions into the dense Chinese-style captions the model was trained on.
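
Binding several speakers to dialogue spans is easiest to reason about as data. The following is a minimal sketch under stated assumptions: `SpeakerSpan`, `validate_spans`, the character-offset convention, and the `.wav` file names are all hypothetical, and the actual prompt syntax NAVA uses for speaker binding is not described on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpeakerSpan:
    """Hypothetical binding of one reference voice to a character range."""
    start: int            # inclusive character offset into the prompt
    end: int              # exclusive character offset into the prompt
    reference_audio: str  # path to the voice sample for this span


def validate_spans(prompt: str, spans: List[SpeakerSpan]) -> List[SpeakerSpan]:
    """Return spans sorted by start, rejecting out-of-range or overlapping ones."""
    ordered = sorted(spans, key=lambda s: s.start)
    previous_end = 0
    for span in ordered:
        if not (0 <= span.start < span.end <= len(prompt)):
            raise ValueError(f"span {span.start}:{span.end} is outside the prompt")
        if span.start < previous_end:
            raise ValueError(f"span {span.start}:{span.end} overlaps an earlier span")
        previous_end = span.end
    return ordered


prompt = 'Anna says "Hello there." Ben replies "Good morning."'
first = prompt.index('"Hello there."')
second = prompt.index('"Good morning."')
spans = validate_spans(prompt, [
    SpeakerSpan(second, second + len('"Good morning."'), "ben.wav"),
    SpeakerSpan(first, first + len('"Hello there."'), "anna.wav"),
])
```

Rejecting overlapping spans up front means each stretch of dialogue maps to at most one reference voice, which keeps the binding unambiguous.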

Who should use this?

Researchers and developers building video generation pipelines who need synchronized audio. Teams evaluating audio-video models for creative tools or short-form content generation. Anyone working on speech synthesis with voice cloning that also wants video as part of the output. It is not a fit for single-GPU deployments or teams without infrastructure access to run distributed inference across 8 GPUs.

Verdict

NAVA shows strong benchmark results and a well-designed architecture, but with only 19 stars and an 89% credibility score, it is very early-stage. The documentation is comprehensive and the Gradio demo makes it approachable, but real-world adoption has not been proven at scale. Worth watching closely as the field matures.
