An automatically updated, open catalog of thousands of benchmarks for evaluating vision-language models, multimodal LLMs, and video understanding models, sourced from arXiv papers.
How It Works
While researching tests for AI models that understand images and videos, you stumble upon this organized collection of over 2,700 benchmarks.
You read the landing page, where charts show the benchmarks grouped by topic, such as video understanding or medical imaging, and by release date.
The eye-catching visuals help you quickly grasp trends, like which types of tests are most popular right now.
You grab the ready-to-use spreadsheet or data file, which is packed with details like test names, descriptions, and paper links.
You open the search tool or your downloaded file, filter by category, such as safety checks or spatial reasoning, and pick the benchmarks that fit your evaluation.
If you spot a missing test from a recent paper, simply share its details to help everyone benefit.
Now you have a fresh, reliable list to evaluate AI vision models, saving hours of hunting and ensuring you stay up-to-date.
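The filtering step above can be sketched in a few lines of Python. This is only an illustration: the column names (`name`, `category`, `description`, `paper_url`) and the sample rows are hypothetical placeholders, so check the header of the actual spreadsheet before reusing it.

```python
import csv
import io

# Hypothetical sample rows standing in for the downloaded file.
# The real catalog's column names and values may differ.
SAMPLE_CSV = """name,category,description,paper_url
ExampleBenchA,spatial reasoning,Placeholder description,https://example.org/a
ExampleBenchB,safety,Placeholder description,https://example.org/b
ExampleBenchC,Spatial Reasoning,Placeholder description,https://example.org/c
"""


def filter_by_category(csv_file, category, column="category"):
    """Return rows whose `column` equals `category`, ignoring case and surrounding spaces."""
    wanted = category.strip().lower()
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if (row.get(column) or "").strip().lower() == wanted
    ]


matches = filter_by_category(io.StringIO(SAMPLE_CSV), "spatial reasoning")
for row in matches:
    print(row["name"], row["paper_url"])
```

To run this against a downloaded file instead of the in-memory sample, pass an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever you saved the CSV.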