mrvellang / vlm-probe-suite
FineVLM-Probe: a lightweight harness for fine-grained probing of frozen vision-language models (CLIP / SigLIP / BLIP-2 / LLaVA).
FineVLM-Probe is a research toolkit that tests how well vision-language models understand images alongside text. It probes fine-grained distinctions, such as whether a model can tell 'red cube on blue ball' apart from 'blue cube on red ball', or whether it counts objects accurately. Researchers can compare different models (such as CLIP, SigLIP, BLIP-2, and LLaVA) on these tasks and get performance tables as output. The toolkit also includes tests for multilingual understanding that use Cantonese captions, and it helps researchers reproduce and verify published results.
How It Works
You learn about FineVLM-Probe, a tool that tests how well AI models understand images and text together.
You download and set up the software on your computer so it is ready to run experiments.
You download the collection of test images that will be shown to the models during testing.
You choose which vision-language model to test, perhaps comparing a smaller model against a larger one.
The tool shows the model an image and asks targeted questions like 'Is the red object on the left or right?' to check its attention to detail.
The tool reports numbers and charts for each test, showing where the model does well and where it struggles.
You rerun the tests from an academic paper to check whether its results reproduce, which contributes to open science.
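The probing step described above can be pictured as a pairwise comparison: for each image, the model should score the correct caption higher than a near-identical distractor. The sketch below is only an illustration under stated assumptions and is not FineVLM-Probe's actual code or API. The names `pair_accuracy`, `toy_score`, and the two-dimensional embeddings are hypothetical stand-ins for a real model's image and text encoders.

```python
from math import sqrt
from typing import Callable, Sequence, Tuple

# Hypothetical toy embeddings standing in for a frozen model's encoders.
# A real run would obtain these vectors from CLIP, SigLIP, or similar.
IMAGE_EMB = {
    "img_red_cube_on_blue_ball": [1.0, 0.0],
    "img_three_apples": [1.0, 1.0],
}
TEXT_EMB = {
    "red cube on blue ball": [0.9, 0.1],
    "blue cube on red ball": [0.1, 0.9],
    "three apples": [1.0, 0.0],
    "four apples": [1.0, 1.0],
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_score(image_id: str, caption: str) -> float:
    """Image-text similarity looked up from the toy tables above."""
    return cosine(IMAGE_EMB[image_id], TEXT_EMB[caption])


def pair_accuracy(
    items: Sequence[Tuple[str, str, str]],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of (image, correct, distractor) triples where the
    correct caption scores strictly higher than the distractor."""
    hits = sum(1 for img, pos, neg in items if score(img, pos) > score(img, neg))
    return hits / len(items)


# Each item: (image id, correct caption, hard distractor caption).
ITEMS = [
    ("img_red_cube_on_blue_ball", "red cube on blue ball", "blue cube on red ball"),
    ("img_three_apples", "three apples", "four apples"),
]

if __name__ == "__main__":
    print(f"pairwise accuracy: {pair_accuracy(ITEMS, toy_score):.2f}")
```

With these toy vectors the attribute-binding pair is scored correctly and the counting pair is not, so the accuracy comes out at one half. Swapping `toy_score` for a function backed by a real model would turn the same loop into one row of a comparison table.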