InternLM

Official Implementation of "Visual-ERM: Reward Modeling for Visual Equivalence"

Found Mar 16, 2026 at 16 stars.
Primary language: Python

AI Summary

Visual-ERM is a tool that compares original images with AI-generated visual recreations from code, providing detailed feedback on discrepancies for tasks like charts, tables, and vector graphics.

How It Works

1
📖 Discover Visual-ERM

You hear about a helpful tool that checks how well AI recreates pictures like charts, tables, or drawings by spotting visual differences.

2
🧰 Get the visual checker

Download the ready-to-use checker and its example pictures from the sharing site.

3
🖼️ Prepare your pictures

Gather the original picture and the one your AI tried to recreate.

4
🔍 Spot the differences

Feed both pictures into the checker to get a clear list of mistakes, like wrong layouts, text errors, or missing details, with tips on how bad each is.

5
🛠️ Fix and improve

Use the detailed feedback to tweak your AI and make better recreations.

6
📊 Test on examples

Run checks on a set of practice pictures to measure how good your improvements are.

🎉 Close visual matches

Your AI now produces recreations that closely match the originals and are ready for real use.
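The "spot the differences" step above can be illustrated with a toy stand-in: a naive grid-based pixel diff. This is not Visual-ERM itself (which is a learned multimodal critic); the function name, threshold, and severity labels here are illustrative assumptions.

```python
def naive_visual_diff(original, recreation, threshold=30, grid=4):
    """Toy stand-in for a visual checker: split two same-sized grayscale
    images (nested lists of 0-255 ints) into a grid of cells and flag
    cells whose mean absolute pixel difference exceeds `threshold`."""
    h, w = len(original), len(original[0])
    ch, cw = h // grid, w // grid
    issues = []
    for gi in range(grid):
        for gj in range(grid):
            total, count = 0, 0
            for y in range(gi * ch, (gi + 1) * ch):
                for x in range(gj * cw, (gj + 1) * cw):
                    total += abs(original[y][x] - recreation[y][x])
                    count += 1
            diff = total / count
            if diff > threshold:
                issues.append({
                    "cell": (gi, gj),
                    "severity": "major" if diff > 2 * threshold else "minor",
                    "mean_abs_diff": diff,
                })
    return issues
```

A learned critic replaces the per-cell threshold with semantic judgments (layout, text, missing elements), but the input/output shape is the same: two images in, a list of localized issues out.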

AI-Generated Review

What is Visual-ERM?

Visual-ERM is a Python-based multimodal reward model that scores vision-to-code outputs—like chart-to-code, table-to-markdown, and SVG-to-code—by comparing rendered images directly in visual space, spotting fine-grained issues like layout shifts or text mismatches. It outputs structured feedback with error categories, severity, location, and descriptions, turning critiques into actionable RL signals or inference-time revisions. Grab pretrained checkpoints from Hugging Face, serve via vLLM, and plug into frameworks like veRL for training.
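The idea of turning a structured critique (category, severity, location, description) into an actionable RL signal can be sketched as follows. The JSON field names and severity weights are assumptions for illustration, not the repo's actual schema.

```python
import json
from dataclasses import dataclass

# Assumed severity weights -- not from the Visual-ERM paper.
SEVERITY_PENALTY = {"minor": 0.1, "moderate": 0.3, "major": 0.6}


@dataclass
class VisualError:
    category: str     # e.g. "layout", "text", "missing_element"
    severity: str     # e.g. "minor", "moderate", "major"
    location: str     # where in the image the issue occurs
    description: str  # human-readable explanation


def critique_to_reward(feedback_json: str) -> float:
    """Collapse a list of structured errors into a scalar reward in [0, 1]
    by summing per-error penalties and clamping at zero."""
    errors = [VisualError(**e) for e in json.loads(feedback_json)]
    penalty = sum(SEVERITY_PENALTY.get(e.severity, 0.3) for e in errors)
    return max(0.0, 1.0 - penalty)
```

A scalar like this is what an RL trainer consumes, while the per-error records remain available for inference-time revision prompts.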

Why is it gaining traction?

Unlike text-only metrics or coarse embedding similarities prone to reward hacking, Visual-ERM delivers interpretable visual judgments that boost RL gains—up to +8.4 on charts for Qwen3-VL-8B. Its task-agnostic design works across visual formats without custom rules, and VC-RewardBench lets you benchmark critics easily via API scripts. As the official implementation, it ships ready-to-use models and eval pipelines.
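Benchmarking a critic on pairwise-preference data of the VC-RewardBench kind reduces to an agreement score. The data layout below is an assumption; the benchmark's actual format may differ.

```python
def pairwise_accuracy(judgments):
    """Fraction of pairwise comparisons where the critic's preferred
    candidate ("A" or "B") matches the reference label.
    `judgments` is a list of (critic_choice, reference_choice) tuples."""
    if not judgments:
        return 0.0
    agree = sum(1 for critic, ref in judgments if critic == ref)
    return agree / len(judgments)
```

Running the same loop over several critics gives the head-to-head comparison the benchmark is meant to support.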

Who should use this?

ML engineers fine-tuning VLMs for structured visual generation, like table parsers or plot recreators needing better-than-TEDS rewards. Researchers in multimodal RL experimenting with GRPO on vision tasks. Devs building apps where pixel-perfect rendering trumps semantic matches, such as dashboard exporters or icon generators.
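For the multimodal-RL audience, the group-relative advantage used in GRPO-style training can be sketched; the normalization details below reflect common practice, not necessarily this repo's exact recipe.

```python
import statistics


def grpo_advantages(group_rewards, eps=1e-6):
    """Turn a group of per-completion rewards (e.g. scalar scores from a
    visual reward model) into group-relative advantages: subtract the
    group mean, divide by the group std (eps guards against zero std)."""
    mu = statistics.fmean(group_rewards)
    sd = statistics.pstdev(group_rewards)
    return [(r - mu) / (sd + eps) for r in group_rewards]
```

Completions rewarded above the group mean get positive advantages and are reinforced; below-mean completions are suppressed.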

Verdict

Early-stage at 16 stars—docs are solid via the README and paper, but expect tweaks for production use. Worth forking for vision-to-code prototypes if you're chasing visual fidelity over proxy metrics.
