henliveira / av-curator
Audio-visual data curation pipeline: scene cuts, silence trim, dedup, CLIP/Whisper filtering for messy web video.
AV-Curator is a video cleaning tool that helps researchers prepare raw video clips for machine learning. It automatically removes unwanted content such as silent sections, black frames, duplicate videos, watermarks, and off-topic language, and produces a clean dataset with a detailed report of what was kept and removed.
How It Works
You collect all your raw video files into one folder on your computer, ready to be organized.
The tool scans your folder and automatically creates a detailed list of every video, noting how long each one is and what format it uses.
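As a rough illustration of the scanning step, the sketch below builds a list of videos with their duration and container format. It is not AV-Curator's own code: the function names, the `VIDEO_EXTS` set, and the choice of calling the `ffprobe` command-line tool (which must be installed separately) are all assumptions made for this example.

```python
import json
import subprocess
from pathlib import Path

# Assumed set of extensions to treat as video; adjust for your data.
VIDEO_EXTS = {".mp4", ".mkv", ".webm", ".mov", ".avi"}


def find_videos(folder):
    """Return every file under `folder` whose extension looks like video."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTS
    )


def parse_probe(probe_json):
    """Extract duration (seconds) and container format from ffprobe JSON."""
    fmt = json.loads(probe_json)["format"]
    return {"duration_s": float(fmt["duration"]), "format": fmt["format_name"]}


def probe(path):
    """Ask ffprobe for the container-level duration and format name."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration,format_name",
         "-of", "json", str(path)],
        capture_output=True, text=True, check=True,
    )
    return parse_probe(result.stdout)


def build_manifest(folder):
    """One manifest row per video: path, duration, format."""
    return [{"path": str(p), **probe(p)} for p in find_videos(folder)]
```

Calling `build_manifest("raw_videos")` (a hypothetical folder name) would return one dictionary per file, which can then be written out as CSV or JSON.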
You pick a preset that matches your goal: either finding clips with clear speech for training a transcription AI, or finding clean visual clips for a video-understanding project.
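One way to picture a preset is as a named bundle of switches for the cleaning stages. The sketch below is hypothetical: the preset names `speech` and `visual`, the field names, and the specific on/off choices are invented for illustration and are not taken from AV-Curator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """Hypothetical preset: which cleaning stages are switched on."""
    name: str
    require_speech: bool
    trim_silence: bool
    drop_black_frames: bool
    dedup: bool


# Assumed presets mirroring the two goals described above.
PRESETS = {
    "speech": Preset("speech", require_speech=True, trim_silence=True,
                     drop_black_frames=False, dedup=True),
    "visual": Preset("visual", require_speech=False, trim_silence=False,
                     drop_black_frames=True, dedup=True),
}


def get_preset(name):
    """Look up a preset by name, with a readable error for typos."""
    try:
        return PRESETS[name]
    except KeyError:
        raise ValueError(
            f"unknown preset {name!r}; choose from {sorted(PRESETS)}"
        ) from None
```

Keeping presets as immutable data makes it easy to record in the final report exactly which settings produced a dataset.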
The tool runs through your videos one by one, checking each against your chosen criteria and automatically removing or trimming away unwanted sections like silent parts, black screens, or duplicate content.
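The checks named in this step can be sketched with simple heuristics. The code below is a minimal illustration under stated assumptions, not the tool's actual method: silence is detected by windowed RMS energy on audio samples scaled to [-1, 1], a black frame is one whose mean luma (0 to 255) falls below a threshold, and duplicates are only exact byte-for-byte copies. All function names and thresholds are invented for the example; catching re-encoded near-duplicates would need a perceptual technique instead of a file hash.

```python
import hashlib
import math


def rms(window):
    """Root-mean-square energy of a list of audio samples."""
    if not window:
        return 0.0
    return math.sqrt(sum(s * s for s in window) / len(window))


def non_silent_span(samples, rate, window_s=0.5, threshold=0.01):
    """Return (start_s, end_s) from the first to the last loud window,
    or None if every window is below the threshold."""
    size = max(1, int(rate * window_s))
    loud = [i for i in range(0, len(samples), size)
            if rms(samples[i:i + size]) >= threshold]
    if not loud:
        return None
    end = min(loud[-1] + size, len(samples))
    return loud[0] / rate, end / rate


def is_black_frame(luma, max_mean=16):
    """Treat a frame as black when its mean luma is at or below max_mean."""
    return sum(luma) / len(luma) <= max_mean


def file_digest(path, chunk=1 << 20):
    """SHA-256 of a file's bytes, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def drop_exact_duplicates(paths):
    """Keep the first file seen for each distinct content hash."""
    seen, kept = set(), []
    for p in paths:
        digest = file_digest(p)
        if digest not in seen:
            seen.add(digest)
            kept.append(p)
    return kept
```

For example, a 3-second clip sampled at 10 Hz with sound only in the middle second yields a span of `(1.0, 2.0)`, which is the region a trimmer would keep.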
You see a clear visual breakdown showing exactly how many videos made it through each stage, so you understand what was kept and what was removed at every step.
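The per-stage breakdown amounts to counting how many clips survive each filter in order. The following sketch shows one way to compute and print such a funnel; the stage names, the clip fields, and the text-bar rendering are assumptions for illustration and not the tool's real report format.

```python
def run_funnel(items, stages):
    """Apply (name, keep_fn) stages in order, recording survivors per stage."""
    report = [("input", len(items))]
    for name, keep in stages:
        items = [x for x in items if keep(x)]
        report.append((name, len(items)))
    return items, report


def format_report(report, width=20):
    """Render the counts as a simple text bar chart."""
    total = report[0][1] or 1  # avoid division by zero on empty input
    lines = []
    for name, count in report:
        bar = "#" * round(width * count / total)
        lines.append(f"{name:<12}{count:>5}  {bar}")
    return "\n".join(lines)


# Usage with hypothetical clip records and stages.
clips = [
    {"duration_s": 1.0, "has_speech": True},
    {"duration_s": 8.0, "has_speech": False},
    {"duration_s": 9.0, "has_speech": True},
]
stages = [
    ("min_length", lambda c: c["duration_s"] >= 2.0),
    ("speech", lambda c: c["has_speech"]),
]
kept, report = run_funnel(clips, stages)
print(format_report(report))
```

Because each stage only sees what the previous one kept, the counts are non-increasing, which makes it easy to spot the filter responsible for most of the loss.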
From here, you can keep your trimmed videos in their current form and move straight to using them for training your AI model.
Alternatively, you can let the tool automatically cut your videos to remove black frames, silence, and scene transitions, producing clean clips.
You now have a clean collection of video clips suited for training your AI model, free of the silent sections, black frames, and duplicates that would have hurt quality.