sitodowubb / spatial-vqa-bench
Spatial-VQA-Bench: a focused benchmark of spatial visual reasoning for multimodal LLMs.
Spatial-VQA-Bench is a research tool that tests how well AI models understand spatial relationships in images, providing 3,200 questions across five categories (2D relations, 3D relations, rotation, occlusion, and viewpoint) along with software to run models and score their performance.
How It Works
You learn about a benchmark that measures how well AI understands where things are in pictures, such as whether one object is behind another or to its left.
With one simple command, you set up the software on your computer so it's ready to run tests whenever you want.
Test a freely available model such as LLaVA or Qwen2-VL that runs on your own computer.
Or test a cloud-based model such as GPT-4o, which processes your images remotely.
The tool automatically shows each image to the AI, asks spatial questions, and records every answer the AI gives.
A clear report shows the AI's accuracy across different types of spatial tasks, revealing where it excels and where it struggles.
You now know exactly how good this AI is at understanding positions, rotations, and what objects look like from different angles.
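The run-and-score steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the question record fields (`image`, `question`, `answer`, `category`), the function names `evaluate` and `accuracy_by_category`, the `ask_model` callable, and the case-insensitive exact-match scoring are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def evaluate(questions: Iterable[dict],
             ask_model: Callable[[str, str], str]) -> List[dict]:
    """Show each image to the model, ask its question, and record the answer.

    `ask_model(image, question)` is a hypothetical wrapper around whichever
    local or cloud model is being tested.
    """
    records = []
    for q in questions:
        prediction = ask_model(q["image"], q["question"])
        # Assumed scoring rule: exact match, ignoring case and outer whitespace.
        correct = prediction.strip().lower() == q["answer"].strip().lower()
        records.append({**q, "prediction": prediction, "correct": correct})
    return records


def accuracy_by_category(records: Iterable[dict]) -> Dict[str, float]:
    """Return the fraction of correct answers for each category."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r["category"]] += 1
        hits[r["category"]] += int(r["correct"])
    return {category: hits[category] / totals[category] for category in totals}


if __name__ == "__main__":
    # Tiny made-up sample; the real benchmark has 3,200 questions.
    sample = [
        {"image": "a.png", "question": "Is the cup left or right of the plate?",
         "answer": "left", "category": "2D relations"},
        {"image": "b.png", "question": "Is the box in front of or behind the chair?",
         "answer": "behind", "category": "occlusion"},
    ]
    # Stub model that always answers "left", standing in for a real model call.
    records = evaluate(sample, lambda image, question: "left")
    for category, accuracy in accuracy_by_category(records).items():
        print(f"{category}: {accuracy:.0%}")
```

Keeping every per-question record, rather than only the totals, makes it possible to inspect individual failures within a category after the run.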