davanstrien / ocr-bench

Public

Per-collection OCR leaderboards using VLM-as-judge

huggingface.co/spaces/davanstrien/ocr-bench-britannica-results-qwen35-viewer

Tags: evaluation, huggingface, ocr

100% credibility

Found Mar 09, 2026 at 47 stars.

AI Analysis

Python

AI Summary

ocr-bench is an open-source toolkit that benchmarks multiple OCR models on user-provided document images using AI judging to generate tailored leaderboards.

How It Works

📚 Gather your scans

Collect a handful of sample images from your documents, like old book pages or catalog cards, and upload them as an image dataset on Hugging Face.

🚀 Launch the test

Start the tool to have several OCR models extract text from your images automatically.

⚖️ AI picks winners

A vision-language model acting as judge compares pairs of text outputs side by side with the original image and decides which is better.

📊 View the rankings

See a leaderboard showing which OCR model performs best on your specific documents, with confidence intervals.

👀 Check and vote

Browse comparisons yourself, read the texts next to each other, and vote to confirm the AI's choices.

🏆 Share your insights

Publish the custom leaderboard so others can see what works best for similar documents, with your validations.
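The "check and vote" step amounts to measuring how often human votes match the AI judge. A minimal sketch of that check follows; the function name, the comparison IDs, and the `"a"`/`"b"` verdict labels are hypothetical and are not taken from the ocr-bench codebase.

```python
def judge_agreement(judge_verdicts, human_votes):
    """Fraction of comparisons where the human vote matches the judge.

    Both arguments map a comparison ID to a verdict ("a" or "b").
    Only comparisons that a human has actually voted on are counted.
    Returns None when there is no overlap to measure.
    """
    shared = judge_verdicts.keys() & human_votes.keys()
    if not shared:
        return None
    matches = sum(judge_verdicts[c] == human_votes[c] for c in shared)
    return matches / len(shared)


# Hypothetical data: the judge ruled on three pairs, a human checked two.
judge = {"cmp-1": "a", "cmp-2": "b", "cmp-3": "a"}
human = {"cmp-1": "a", "cmp-2": "a"}

agreement = judge_agreement(judge, human)
print(f"Judge/human agreement: {agreement:.0%}")
```

A low agreement rate on a given collection would suggest treating the judge's leaderboard with caution until more pairs have been reviewed by hand.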


Star Growth

This repo grew from 47 to 49 stars.


AI-Generated Review

What is ocr-bench?

ocr-bench is a Python tool for building per-collection OCR leaderboards on Hugging Face using a VLM as judge. Feed it your image dataset, run popular OCR models such as GLM-OCR or DeepSeek-OCR via HF Jobs, and let a vision-language model (default: Qwen 3.5) compare outputs pairwise, producing Elo ratings with confidence intervals. The result is a custom OCR leaderboard for your documents, which matters because no single model wins across generic OCR benchmarks.
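To make the scoring idea concrete, here is a minimal sketch of how pairwise verdicts can be turned into Elo ratings with bootstrap confidence intervals. It is an illustration under stated assumptions, not ocr-bench's actual implementation: the sequential update rule, the K-factor of 32, the base rating of 1500, the bootstrap procedure, and the model names are all assumptions.

```python
import random


def elo_ratings(comparisons, k=32.0, base=1500.0):
    """Sequential Elo over (model_a, model_b, winner) tuples.

    winner is "a", "b", or "tie". k and base are assumed defaults.
    """
    ratings = {}
    for a, b, winner in comparisons:
        ra = ratings.setdefault(a, base)
        rb = ratings.setdefault(b, base)
        # Expected score of model a under the standard Elo logistic curve.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        delta = k * (score_a - expected_a)
        ratings[a] = ra + delta
        ratings[b] = rb - delta  # zero-sum: b loses what a gains
    return ratings


def bootstrap_intervals(comparisons, n_resamples=200, alpha=0.05, seed=0):
    """Percentile bootstrap: resample comparisons, re-rate, take quantiles."""
    rng = random.Random(seed)
    samples = {}
    for _ in range(n_resamples):
        resample = [rng.choice(comparisons) for _ in comparisons]
        for model, rating in elo_ratings(resample).items():
            samples.setdefault(model, []).append(rating)
    intervals = {}
    for model, values in samples.items():
        values.sort()
        low = values[int((alpha / 2) * (len(values) - 1))]
        high = values[int((1 - alpha / 2) * (len(values) - 1))]
        intervals[model] = (low, high)
    return intervals


# Hypothetical judge verdicts for three placeholder models.
comparisons = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "a"),
    ("model-x", "model-z", "a"),
    ("model-x", "model-y", "tie"),
]

ratings = elo_ratings(comparisons)
intervals = bootstrap_intervals(comparisons)
for model in sorted(ratings, key=ratings.get, reverse=True):
    low, high = intervals[model]
    print(f"{model}: {ratings[model]:.0f} (95% CI {low:.0f} to {high:.0f})")
```

With only a handful of comparisons the intervals overlap heavily, which is the point of reporting them: a leaderboard gap is only meaningful when the intervals separate.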

Why is it gaining traction?

Unlike broad OCR leaderboards, ocr-bench delivers rankings tuned to your collection, such as historical prints where smaller models can beat much larger ones. It's Hub-native: no local GPU is needed, results publish as datasets with live viewers via HF Spaces, and the CLI commands `ocr-bench run`, `judge`, and `view` keep the workflow simple. As an open-source VLM-as-judge benchmark, it fills a gap for real-world digitization.

Who should use this?

Digital humanities researchers scanning old books or manuscripts, library digitization teams choosing OCR for card catalogs, and ML engineers evaluating models on proprietary documents. If generic OCR leaderboards are misleading for your use case, this produces rankings specific to your collection.

Verdict

Grab ocr-bench for quick custom OCR evals; the CLI and HF integration shine. Early POC status (fewer than 50 stars, a 15-day-old repo) means rough edges, but the VLM-as-judge concept addresses a painful problem. Fork and polish if it fits.


Followers: 320

Base stars: 49 stars

Bonus: AI verified quality (100%)

Account age: 4,180 days

Repo age: 15 days

Updated: Mar 12, 2026