H-EmbodVis / NUMINA
[CVPR 2026] When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
NUMINA is a training-free add-on for the Wan2.1 text-to-video model that corrects mismatches between specified object counts in prompts and the actual numbers generated in videos.
How It Works
NUMINA addresses a common failure of AI video generators: showing the wrong number of objects, such as two or four cats when the prompt asks for three.
You download Wan2.1, a freely available text-to-video model that turns text descriptions into video.
You add NUMINA on top of it; it needs no retraining and works by adjusting the model's attention during generation.
You write a short prompt that states the counts you want, such as 'two kittens with two yarn balls'.
You run generation: a quick preview pass estimates how many of each object would appear, and a refinement pass then steers the video toward the requested counts.
You get a video whose object counts match the prompt, ready to share.
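To make the prompt-and-count steps above concrete, here is a minimal Python sketch of the bookkeeping involved: reading the requested counts out of a prompt and comparing them with counts observed in a preview. This is not NUMINA's code. The function names (extract_counts, count_surplus), the small number-word table, and the stop-word list are all hypothetical and chosen only for illustration. The observed counts are typed in by hand; how NUMINA actually estimates them from the model's attention is not shown here.

```python
import re

# Hypothetical lookup table; only covers small counts for illustration.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
# Hypothetical stop words that end an object phrase.
STOP_WORDS = {"with", "and", "on", "in", "at", "near", "beside", "under"}


def _as_number(token):
    """Return the integer a token stands for, or None if it is not a numeral."""
    if token in NUMBER_WORDS:
        return NUMBER_WORDS[token]
    if token.isdigit():
        return int(token)
    return None


def extract_counts(prompt):
    """Map each 'numeral + object phrase' in the prompt to its requested count."""
    tokens = re.findall(r"[a-z0-9]+", prompt.lower())
    counts = {}
    i = 0
    while i < len(tokens):
        n = _as_number(tokens[i])
        if n is None:
            i += 1
            continue
        # Collect the words after the numeral until a stop word or another numeral.
        j = i + 1
        phrase = []
        while (j < len(tokens)
               and tokens[j] not in STOP_WORDS
               and _as_number(tokens[j]) is None):
            phrase.append(tokens[j])
            j += 1
        if phrase:
            counts[" ".join(phrase)] = n
        i = j
    return counts


def count_surplus(target, observed):
    """Return observed minus target for every object whose count is off."""
    return {
        name: observed.get(name, 0) - wanted
        for name, wanted in target.items()
        if observed.get(name, 0) != wanted
    }


target = extract_counts("two kittens with two yarn balls")
# Hand-written stand-in for what a preview pass might report.
observed = {"kittens": 3, "yarn balls": 2}
print(target)
print(count_surplus(target, observed))
```

A positive surplus means the preview showed too many of that object and a negative one too few; a refinement step would aim to bring every entry to zero.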