dddraxxx / Ref-Adv

Public

[ICLR 2026] Official code for "Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks"

18 stars · 0 forks
Language: Python

AI Summary

Ref-Adv is an academic benchmark and evaluation toolkit for assessing multimodal large language models' visual reasoning on referring expression tasks with distractors.

How It Works

1. 🏠 Discover Ref-Adv

You come across this benchmark while reading about AI vision research, curious to test how well AI models can find specific things in pictures.

2. 📥 Download everything

You download the project files, dataset, and pre-computed results to your computer so you can start exploring right away.

3. 🤖 Wake up your AI helper

You start one of the listed AI vision models on your machine, getting it ready to look at images and follow instructions.

4. 🔍 Run the tests

You launch a quick evaluation in which the AI tries to locate objects in tricky images from detailed descriptions amid look-alikes.

5. See predictions roll in

You watch as the AI outputs bounding boxes for each description, measuring how accurately it picks the right object despite distractions (a minimal code sketch of this loop follows these steps).

6. 📊 Build your results table

You create a neat summary table showing accuracy scores for different models, including breakdowns by difficulty.

🎉 Understand AI strengths

You now have clear insights into which AI models excel at visual reasoning with complex instructions, ready to share or use in your work.
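
The steps above boil down to a small loop: send each image and referring expression to a served vision-language model, parse a bounding box from its reply, and score that box against the ground truth. Below is a minimal sketch of that loop, assuming a locally served Qwen VLM behind vLLM's OpenAI-compatible endpoint; the URL, model name, prompt wording, and answer format are illustrative assumptions, not the repo's actual scripts.

```python
# Minimal sketch of the evaluation loop described above. The endpoint, model
# name, prompt wording, and expected answer format are assumptions, not the
# repo's real interface -- check the project's scripts for the actual flow.
import base64
import io
import json
import re

from openai import OpenAI  # vLLM exposes an OpenAI-compatible server

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local vLLM server
MODEL = "Qwen/Qwen2.5-VL-7B-Instruct"  # any served Qwen VLM would do

def image_to_data_url(pil_image):
    """Encode a PIL image as a base64 data URL for the chat API."""
    buf = io.BytesIO()
    pil_image.convert("RGB").save(buf, format="JPEG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    return f"data:image/jpeg;base64,{b64}"

def predict_box(pil_image, expression):
    """Ask the model for the referred object's box as [x1, y1, x2, y2]."""
    prompt = (
        f"Locate the object described by: '{expression}'. "
        "Answer with a JSON list [x1, y1, x2, y2] in pixel coordinates."
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_url(pil_image)}},
                {"type": "text", "text": prompt},
            ],
        }],
        temperature=0.0,
    )
    text = resp.choices[0].message.content
    match = re.search(r"\[\s*[\d.]+\s*,\s*[\d.]+\s*,\s*[\d.]+\s*,\s*[\d.]+\s*\]", text)
    return json.loads(match.group(0)) if match else None  # None means a parse failure

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)
```

A prediction then counts as correct at a given threshold when `iou(pred, gt)` meets it, which is the basis of the accuracy numbers in the results table.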

AI-Generated Review

What is Ref-Adv?

Ref-Adv is a Python-based benchmark and evaluation kit for testing multimodal LLMs on referring expression comprehension. It uses a Hugging Face dataset of 1,142 hard cases with complex captions, negations, and distractors that defeat the shortcuts common in standard REC tasks. You get pre-computed predictions for Qwen VLMs, plus scripts to run your own inference via vLLM servers and to generate markdown tables with Acc@0.5/0.75/0.9 metrics, parse failures, and distractor breakdowns. It probes genuine visual reasoning, with humans reportedly reaching around 90% on subsets.
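
For concreteness, here is a rough sketch of the kind of aggregation those table scripts perform: Acc@{0.5, 0.75, 0.9}, a parse-failure rate, and accuracy bucketed by distractor count, emitted as a markdown table. The record fields ("iou", "num_distractors") and the bucket cutoffs are illustrative assumptions, not the repo's actual schema.

```python
# Sketch of metric aggregation over per-sample results. Each record is assumed
# to carry 'iou' (float, or None on parse failure) and 'num_distractors' (int);
# these names are illustrative, not the repo's real fields.
from collections import defaultdict

THRESHOLDS = (0.5, 0.75, 0.9)

def bucket(n_distractors):
    """Distractor-count buckets in the style of the paper's breakdown."""
    if n_distractors <= 3:
        return "2-3"
    if n_distractors <= 6:
        return "4-6"
    return ">=7"

def summarize(records):
    """Build a markdown table of Acc@t, parse-failure rate, and distractor buckets."""
    groups = defaultdict(list)
    for r in records:
        groups["all"].append(r)
        groups[bucket(r["num_distractors"])].append(r)

    lines = ["| subset | n | Acc@0.5 | Acc@0.75 | Acc@0.9 | parse fail |",
             "|---|---|---|---|---|---|"]
    for name, group in groups.items():
        n = len(group)
        fails = sum(r["iou"] is None for r in group)
        accs = [
            sum(r["iou"] is not None and r["iou"] >= t for r in group) / n
            for t in THRESHOLDS
        ]
        lines.append(
            f"| {name} | {n} | " + " | ".join(f"{a:.3f}" for a in accs)
            + f" | {fails / n:.3f} |"
        )
    return "\n".join(lines)
```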

Why is it gaining traction?

As an ICLR 2026 accepted paper, it delivers instant reproduction of the paper's results with no training needed, and it breaks down failures by negation ratio or distractor count (2-3, 4-6, >=7), spotlighting MLLM weaknesses more sharply than generic benchmarks. The hook: plug in any Qwen model via an OpenAI-compatible API, retry parsing on failures, and compare against the leaderboard on ref-adv.github.io, helped along by the broader attention on ICLR 2026 paper discussions online.
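
The "retry parsing on failures" behavior can be approximated with a thin wrapper like the one below; `predict_box` is the hypothetical helper from the earlier sketch, and the retry count is an arbitrary choice, not a documented default.

```python
# Sketch of retrying a query when the model's answer contains no parsable box.
# `predict_box` is the hypothetical helper sketched earlier, not a function
# shipped by the repo; max_retries is an arbitrary illustrative value.
def predict_with_retries(pil_image, expression, max_retries=3):
    """Return a [x1, y1, x2, y2] box, or None if every attempt fails to parse."""
    for _ in range(max_retries):
        box = predict_box(pil_image, expression)
        if box is not None:
            return box
    return None  # counted as a parse failure in the summary table
```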

Who should use this?

VLM researchers benchmarking the Qwen2.5/3/3.5 series before fine-tuning. Teams evaluating visual grounding for robotics or AR apps, especially in distractor-heavy scenes. ML engineers looking for hard numbers on current MLLM limits.

Verdict

Grab it if you're deep in VLMs: solid docs, Hugging Face integration, and zero-setup reproduction make it useful now, even though 18 stars signals early days. Watch for updates after the ICLR 2026 workshops.
