WavFlow is an open-source research project from Meta AI that generates synchronized audio from video inputs, text inputs, or both. Unlike approaches that work through compressed representations, WavFlow operates directly on raw audio waveforms. It takes three kinds of conditioning input: visual features (CLIP embeddings of video frames), audio-visual synchronization signals (Synchformer), and text features (CLIP text encoder). It then applies flow matching, a technique that gradually transforms random noise into audio that matches the input conditions. It supports video-to-audio generation (matching sounds to visual events), text-to-audio generation (creating sounds from descriptions), and hybrid approaches. The project is available under a non-commercial license and includes complete training code, though pre-trained checkpoints are not yet released.
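To make the flow matching idea concrete, here is a toy sketch in plain Python. It is not WavFlow's code. It assumes a straight-line path between noise and audio (one common flow matching choice, which the project may or may not use), and it replaces the trained, conditioned network with a closed-form velocity field that is only valid when the "dataset" is a single waveform. The function names are hypothetical.

```python
import math
import random


def make_training_pair(noise, target, t):
    """Build one training example on an assumed straight-line path.

    Returns the point x_t between noise (t=0) and target audio (t=1),
    and the velocity a network would be trained to predict at x_t.
    """
    x_t = [(1.0 - t) * n + t * y for n, y in zip(noise, target)]
    velocity = [y - n for n, y in zip(noise, target)]
    return x_t, velocity


def ideal_velocity(x, t, target):
    """Stand-in for a trained network (assumption for this toy example).

    When the data is a single waveform, the exact velocity field for the
    straight-line path is (target - x) / (1 - t).
    """
    return [(y - xi) / (1.0 - t) for xi, y in zip(x, target)]


def sample(noise, target, steps=50):
    """Integrate the velocity field from t=0 to t=1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        v = ideal_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
# A short 440 Hz sine at an assumed 16 kHz sample rate plays the role of audio.
target = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(64)]
noise = [rng.gauss(0.0, 1.0) for _ in range(64)]

generated = sample(noise, target)
max_error = max(abs(g - y) for g, y in zip(generated, target))
print(f"max deviation from target waveform: {max_error:.2e}")
```

In a real system the velocity would come from a neural network that also receives the video, sync, and text features, so different conditions steer the same noise toward different sounds.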
How It Works
You learn about WavFlow through its project page or research paper. It is a tool that creates audio directly from videos or text descriptions, working at the raw waveform level rather than through compressed representations.
You run a simple setup script that installs everything you need, and the tool automatically downloads the helper models it requires from the internet.
You create a simple spreadsheet listing your videos or text descriptions, telling the system which ones have video and which have text captions.
The tool analyzes each video frame by frame to extract visual features and audio-visual sync cues, and encodes any captions into text features. Together these form the conditioning signal, the 'blueprint' for generating matching sounds.
You run the training script with your prepared data, and over time the system learns to generate audio that matches your specific videos or descriptions.
You follow the project for future updates when the team releases models trained on publicly available data.
Once you have a trained model, you provide a video or description and the system generates matching sound: forest ambience, drum beats, sports sounds, or whatever your input calls for.
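The spreadsheet step above can be pictured with a small sketch that writes and reads a CSV manifest. The column names (clip_id, video_path, caption, has_video, has_text), the file paths, and both helper functions are assumptions made for illustration; the format the WavFlow repository actually expects may differ, so check its documentation before preparing real data.

```python
import csv
import io

# Hypothetical column layout, not taken from the WavFlow repository.
FIELDS = ["clip_id", "video_path", "caption", "has_video", "has_text"]


def build_manifest(items):
    """Return CSV text with one row per item and flags for each modality."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        video = item.get("video_path", "")
        caption = item.get("caption", "")
        writer.writerow({
            "clip_id": item["clip_id"],
            "video_path": video,
            "caption": caption,
            # Flags tell a loader which conditioning inputs exist for the row.
            "has_video": int(bool(video)),
            "has_text": int(bool(caption)),
        })
    return buffer.getvalue()


def read_manifest(text):
    """Parse the CSV text back into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


items = [
    {"clip_id": "a1", "video_path": "clips/drums.mp4"},
    {"clip_id": "a2", "caption": "forest ambience, birds"},
    {"clip_id": "a3", "video_path": "clips/match.mp4", "caption": "crowd cheering"},
]

manifest = build_manifest(items)
rows = read_manifest(manifest)
print(manifest)
```

Keeping explicit has_video and has_text flags, rather than inferring them from empty cells, makes video-only, text-only, and hybrid rows easy to tell apart when the file is inspected by hand.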