meituan-longcat / WBench
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
WBench is a comprehensive evaluation framework created by Meituan that tests AI video generation models across 22 metrics organized into 5 dimensions: video quality, scene consistency, instruction following, physical realism, and setting accuracy. It evaluates how well models handle interactive, multi-turn video generation, such as responding to camera movements, object manipulations, and perspective changes. The project includes a leaderboard ranking 20 video models and provides detailed diagnostic reports showing where each model excels or struggles. It is designed for researchers and developers who want to understand and compare video world models.
How It Works
You hear about WBench — a benchmark that evaluates how well AI video generators respond to interactive instructions, like moving through a scene or changing objects.
You download WBench to your computer. During setup, it checks that the required tools are installed and working.
Your video model creates short clips based on interactive instructions — like a character walking forward, then picking up an object, then the camera switching perspective.
WBench runs 22 different tests on your videos — checking if the video looks good, if objects stay consistent, if physics makes sense, and if your instructions were followed.
Your results are broken into five categories: video quality, scene consistency, how well instructions were followed, physical realism, and setting accuracy. Each has its own score.
If your model performs well, you see strong scores across most dimensions and can share the results.
If your model has weaknesses, such as unrealistic physics or objects that change unexpectedly, the per-dimension scores show what to work on.
You now understand your video model's strengths and weaknesses across every dimension, with clear scores and comparisons to other models.
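The per-dimension scoring described in the walkthrough above (22 metrics rolled up into five dimension scores) can be illustrated with a minimal sketch. This is not WBench's actual code: the metric names (`metric_a` and so on), the assumption that every metric score is normalized to [0, 1], and the use of an unweighted mean per dimension are all hypothetical choices made for illustration. Only the five dimension names come from the description above.

```python
from statistics import mean

# Hypothetical mapping from the five WBench dimensions to their metrics.
# The metric names are placeholders, not WBench's real metric identifiers,
# and only a handful are shown instead of all 22.
DIMENSIONS = {
    "video_quality": ["metric_a", "metric_b"],
    "scene_consistency": ["metric_c"],
    "instruction_following": ["metric_d", "metric_e"],
    "physical_realism": ["metric_f"],
    "setting_accuracy": ["metric_g"],
}


def dimension_scores(metric_scores: dict[str, float]) -> dict[str, float]:
    """Average the per-metric scores within each dimension.

    Assumes each score is already normalized to [0, 1] and that every
    metric listed in DIMENSIONS is present (a missing one raises KeyError).
    """
    return {
        dim: mean(metric_scores[m] for m in metrics)
        for dim, metrics in DIMENSIONS.items()
    }


def weakest_dimension(scores: dict[str, float]) -> str:
    """Return the dimension with the lowest aggregated score."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up example scores for a single model.
    example = {
        "metric_a": 0.9, "metric_b": 0.7, "metric_c": 0.8,
        "metric_d": 0.6, "metric_e": 0.4, "metric_f": 0.3, "metric_g": 0.95,
    }
    dims = dimension_scores(example)
    for name, score in dims.items():
        print(f"{name}: {score:.2f}")
    print("weakest:", weakest_dimension(dims))
```

A real benchmark may weight metrics differently or normalize them per metric. The unweighted mean is used here only because it is the simplest aggregation that yields one score per dimension.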