Probing fine-grained perception in open-source vision-language models — companion code for a writeup.
VLM-Probe is a research evaluation tool from Beihang University that tests how well AI vision systems understand images. It works by showing an AI model a set of carefully designed images and asking it specific questions about what it sees—such as counting objects, identifying colors, reading signs, understanding spatial relationships, and detecting partially hidden objects. The tool then scores each AI model's performance and reveals which visual perception tasks are easy or difficult for that particular system. Researchers use this to understand the strengths and weaknesses of different AI vision models.
How It Works
You come across an academic paper about testing AI vision systems and learn that this tool can reveal exactly where AI struggles to see.
With one simple command, you set up the evaluation harness on your computer and everything is ready to go.
You pull down a collection of carefully designed images that test different aspects of visual understanding.
You choose which AI assistant you want to examine—it could be LLaVA, Qwen-VL, or another vision system.
The tool shows your AI images and asks it questions: how many objects, what colors, where things are located, and what signs say.
You can look at the detailed record of every question and answer.
You can see how different AI models stack up against each other.
You can create a clean webpage showing the findings.
You now know exactly which visual tasks your AI handles well and where it struggles to truly see what's in an image.
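The scoring step described above can be sketched as a per-task accuracy table. This is a minimal illustration only, not VLM-Probe's actual code: the record fields (`model`, `task`, `expected`, `predicted`), the model names, the exact-match comparison, and the helper names `normalize` and `accuracy_by_task` are all assumptions made for the example, since the source does not show the tool's log format or scoring rule.

```python
from collections import defaultdict

# Hypothetical question/answer records; the real tool's log schema is not
# shown in the source, so these field names are assumptions.
records = [
    {"model": "model-a", "task": "counting", "expected": "3", "predicted": "3"},
    {"model": "model-a", "task": "counting", "expected": "5", "predicted": "4"},
    {"model": "model-a", "task": "color", "expected": "red", "predicted": "Red"},
    {"model": "model-b", "task": "counting", "expected": "3", "predicted": "3"},
    {"model": "model-b", "task": "counting", "expected": "5", "predicted": "5"},
    {"model": "model-b", "task": "color", "expected": "red", "predicted": "blue"},
]


def normalize(answer):
    """Lowercase and trim an answer so 'Red' and ' red' compare equal."""
    return answer.strip().lower()


def accuracy_by_task(records):
    """Return {(model, task): fraction of exact-match answers}."""
    totals = defaultdict(lambda: [0, 0])  # (model, task) -> [correct, total]
    for r in records:
        key = (r["model"], r["task"])
        # Exact match after normalization is an assumed scoring rule.
        totals[key][0] += normalize(r["expected"]) == normalize(r["predicted"])
        totals[key][1] += 1
    return {key: correct / total for key, (correct, total) in totals.items()}


scores = accuracy_by_task(records)
for (model, task), acc in sorted(scores.items()):
    print(f"{model:8s} {task:10s} {acc:.2f}")
```

Grouping by both model and task is what lets the same table answer two questions at once: which tasks a single model finds hard, and how models compare on any one task.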