tdemin16

Official repository of "ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models"

19 stars · 100% credibility · Found Mar 24, 2026
AI Analysis

Language: Python

AI Summary

ProactiveBench is a benchmark with tools and datasets to evaluate if image-understanding AI models proactively request simple user actions to resolve unclear visuals like occlusions or distortions.

How It Works

1. 🔍 Discover ProactiveBench

You come across this benchmark while looking for ways to test whether vision-language assistants know when to ask for a better view of a tricky image.

2. 📦 Set up the tool

You add the benchmark kit to your machine with a quick pip install, getting everything ready to go.

3. 📥 Grab test pictures

You download sets of challenging images (occluded objects, corrupted photos, rough sketches) from the accompanying Hugging Face datasets.

4. 🤖 Run tests on AI

You feed the hard images to your model and check whether it suggests fixes, such as removing an occlusion or enhancing a blurry shot, before committing to an answer.

5. 📊 Review the results

You get simple scores showing the model's accuracy and how often it proactively asks for a better view; a minimal sketch of steps 2 through 5 follows this list.

6. 🎉 Master AI smarts

Now you can see how proactive your model is on real-world visual puzzles and where to improve it next.
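For the curious, here is a minimal sketch of what steps 2 through 5 could look like in code. The package name, dataset ID, and field names are assumptions made for illustration; the repository ships its own CLI scripts, so treat this as a rough outline rather than its actual API.

    # Hypothetical workflow sketch: the package name, dataset ID, and field names
    # below are assumptions, not the repository's actual interface.
    # Step 2 (assumed install command): pip install proactivebench
    from datasets import load_dataset  # Hugging Face "datasets" library

    def query_model(image, question: str) -> str:
        # Placeholder: call your own MLLM here (e.g. a LLaVA-OneVision checkpoint)
        # and return its free-form reply as a string.
        raise NotImplementedError

    # Step 3: pull a challenging split from the Hugging Face Hub (dataset ID made up).
    data = load_dataset("tdemin16/proactivebench-occlusion", split="test")

    # Step 4: query the model on every degraded image.
    replies = [query_model(ex["image"], ex["question"]) for ex in data]

    # Step 5: score accuracy; the benchmark's own scorer additionally tracks how
    # often the model proactively requested an action instead of guessing.
    correct = sum(ex["answer"].lower() in reply.lower() for ex, reply in zip(data, replies))
    print(f"accuracy: {correct / len(data):.2%}")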

AI-Generated Review

What is ProactiveBench?

ProactiveBench is a Python benchmark for testing proactiveness in multimodal large language models (MLLMs): whether they request actions such as removing occlusions or enhancing blurry images before answering. You pip install it, grab test data from Hugging Face datasets, and run evals via CLI scripts for multiple-choice QA or open-ended generation on seven repurposed datasets covering occlusions, sketches, and corruptions. It's the official GitHub repository tied to an arXiv paper, addressing the gap where MLLMs guess blindly instead of seeking help.
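To make the multiple-choice format concrete, here is a hedged sketch of how a single item might be turned into a prompt. The item fields and the instruction wording are assumptions for illustration; the repository's CLI scripts handle this formatting themselves.

    # Hypothetical multiple-choice item: field names are assumptions, not the
    # benchmark's actual schema.
    item = {
        "question": "What animal is partially hidden behind the box?",
        "choices": ["cat", "dog", "rabbit", "parrot"],
        "answer": "cat",
    }

    def build_mcq_prompt(item: dict) -> str:
        # Multiple-choice QA: the model should pick a letter, or proactively ask
        # for the obstruction to be removed if the image is too unclear.
        options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(item["choices"]))
        return (
            f"{item['question']}\n{options}\n"
            "Answer with a single letter, or describe the action you need "
            "(for example, 'please remove the box') if you cannot tell."
        )

    print(build_mcq_prompt(item))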

Why is it gaining traction?

Unlike standard VQA benchmarks, it simulates interactive environments where models can "act" proactively, measuring suggestion rates alongside accuracy, and it reveals that even big models rarely ask for better views. Devs like the drop-in examples for LLaVA-OneVision (swap in your model), plus training splits for fine-tuning via RL. The zero-maintenance eval loop hooks researchers chasing next-gen reasoning.
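The interactive-environment idea boils down to a two-turn loop: if the model asks for an action, the environment performs it (for example, by revealing the clean image) and the model answers again. The function names and fields below are illustrative assumptions, not the benchmark's actual API.

    # Hypothetical two-turn interaction: helpers and field names are illustrative
    # assumptions, not the benchmark's actual API.
    def looks_like_request(reply: str) -> bool:
        # Crude stand-in for the benchmark's detector of proactive action requests.
        keywords = ("remove", "uncover", "move the", "enhance", "sharper", "better view")
        return any(kw in reply.lower() for kw in keywords)

    def run_episode(example: dict, query_model) -> tuple[str, bool]:
        """One episode: show the degraded image first, reveal the clean one only if asked."""
        reply = query_model(example["degraded_image"], example["question"])
        asked = looks_like_request(reply)
        if asked:
            # The environment "performs" the requested action by providing the clean view.
            reply = query_model(example["clean_image"], example["question"])
        return reply, asked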

Who should use this?

MLLM researchers benchmarking proactiveness on occlusion, corruption, and sketch tasks built from sources like ImageNet-C or QuickDraw. Fine-tuners targeting "hint-free" generalization in vision-language setups. Teams auditing models for real-world robustness before deployment.

Verdict

Grab it if you're deep into multimodal evals: solid docs and HF integration make it usable now, even though 19 stars and 1.0% credibility signal an early alpha stage. Expect tweaks as it matures post-paper.
