[ICLR 2026] Official code for "Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks"
Ref-Adv is an academic benchmark and evaluation toolkit for assessing multimodal large language models' visual reasoning on referring expression tasks with distractors.
How It Works
Clone the repository and download the dataset along with the precomputed model outputs.
Set up one of the supported multimodal vision-language models locally so it can process images and text instructions.
Run the evaluation script: for each referring expression, the model must localize the described object in an image that also contains visually similar distractors.
The model outputs a bounding box for each expression; a prediction counts as correct when it overlaps the ground-truth box closely enough, so the score reflects whether the model picked the right object rather than a look-alike.
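The correctness check above is typically an intersection-over-union (IoU) threshold, as is standard in referring expression comprehension. The sketch below assumes boxes in (x1, y1, x2, y2) format and a 0.5 threshold; the repo's actual box format and threshold may differ.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct(pred, gt, threshold=0.5):
    # A prediction is a hit when it overlaps the ground truth at IoU >= threshold.
    # The 0.5 threshold is the common convention, assumed here, not confirmed by the repo.
    return iou(pred, gt) >= threshold
```

With this rule, a box that covers the right object but only half-overlaps a distractor's ground truth is scored as a miss, which is what penalizes confusing look-alikes.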
Finally, generate a summary table of accuracy per model, including breakdowns by difficulty.
The result is a direct comparison of how well each model handles visual reasoning over complex referring expressions, ready to report or build on.