ndl-lab / ndlocr-lite

NDLOCR‑Lite application repository (including source code)

751
35
100% credibility
Found Feb 24, 2026 at 261 stars 3x
AI Analysis
Python
AI Summary

NDLOCR-Lite is a user-friendly desktop app from Japan's National Diet Library for recognizing text in scanned classical Japanese books from images or PDFs.

How It Works

1. 📥 Download the app: Get the ready-to-run app for your computer from the trusted library's page and double-click to open it.

2. 📁 Pick your books: Choose image files, a folder of scanned pages, or even a PDF of old Japanese texts to read.

3. 📤 Choose save spot: Select a folder where your readable text files will land.

4. ⚙️ Pick your formats: Decide if you want plain text, structured files, or fancy PDFs with hidden text overlay.

5. 🚀 Hit OCR magic: Press the button and watch it scan, detect text blocks, and turn pictures into words.

6. 👀 Preview the magic: Flip through results, zoom on previews, and see your ancient text come alive.

Enjoy your texts: Get clean, editable files ready for reading, searching, or sharing your digitized books.

AI-Generated Review

What is ndlocr-lite?

NDLOCR-Lite is a Python-based OCR application, shipped with its source code, for processing Japanese document images and PDFs. Drop in scans of books or newspapers, and it detects layout elements such as text blocks, figures, and tables, then recognizes vertical and horizontal text and corrects the reading order. Users get structured outputs: plain text, JSON with bounding boxes, XML layouts, or searchable PDFs, which makes it well suited to digitizing complex Japanese documents.
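
To make "reading order correction" concrete, here is a minimal sketch of one naive ordering rule for detected line boxes. It is an illustration only, not NDLOCR-Lite's actual algorithm: the `LineBox` type, the `reading_order` function, and the "taller than wide means vertical" heuristic are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineBox:
    """Hypothetical detected text line: top-left corner plus size, in pixels."""
    text: str
    x: int
    y: int
    w: int
    h: int


def is_vertical(box: LineBox) -> bool:
    # Assumption: a line that is taller than it is wide is vertical text.
    return box.h > box.w


def reading_order(boxes: List[LineBox]) -> List[LineBox]:
    """Order lines for reading.

    Mostly-vertical pages: columns right to left, then top to bottom.
    Mostly-horizontal pages: top to bottom, then left to right.
    """
    vertical_count = sum(1 for b in boxes if is_vertical(b))
    if vertical_count >= len(boxes) - vertical_count:
        # Sort by right edge, descending, so the rightmost column comes first.
        return sorted(boxes, key=lambda b: (-(b.x + b.w), b.y))
    return sorted(boxes, key=lambda b: (b.y, b.x))


# Three vertical columns given in scrambled order.
page = [
    LineBox("三", 100, 50, 40, 600),
    LineBox("一", 300, 50, 40, 600),
    LineBox("二", 200, 50, 40, 600),
]
print([b.text for b in reading_order(page)])  # ['一', '二', '三']
```

Real pages mix columns, captions, and ruby annotations, so a production system needs more than a single sort key; the sketch only shows why vertical text cannot be read in plain top-left-first order.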

Why is it gaining traction?

It stands out with ONNX models tuned for Japanese typography (ruby, warichu, mixed vertical text), plus a cross-platform GUI for drag-and-drop processing, screen capture, and previews. There are no heavy dependencies beyond standard libraries, and it includes a CLI for batch jobs and a crop-OCR mode for quick tests. Devs dig the multi-format exports and the cascade recognition, which boosts accuracy on both short and long lines without setup hassle.
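
The review does not say how the cascade works, so the following is only a sketch of one plausible scheme: try a small recognizer first and escalate to a larger one when the output fills the smaller model's capacity. The stage sizes, the escalation rule, and the names `cascade_recognize` and `make_stub_recognizer` are all assumptions, and the stub stands in for a real ONNX model call.

```python
from typing import Callable, List, Tuple

# A recognizer maps a line image to text. Here a plain string stands in
# for the image so the example runs without any model files.
Recognizer = Callable[[str], str]


def make_stub_recognizer(max_chars: int) -> Recognizer:
    """Hypothetical stand-in for a model that can emit at most max_chars."""
    return lambda line_image: line_image[:max_chars]


def cascade_recognize(line_image: str, stages: List[Tuple[int, Recognizer]]) -> str:
    """Run stages from smallest to largest capacity.

    If a result is shorter than the stage's capacity, accept it. If it fills
    the capacity, it may have been cut off, so try the next, larger stage.
    """
    result = ""
    for max_chars, recognize in stages:
        result = recognize(line_image)
        if len(result) < max_chars:
            return result
    return result  # The largest stage's answer, even if possibly truncated.


# Assumed capacities, chosen only for the demonstration.
STAGES = [(n, make_stub_recognizer(n)) for n in (5, 10, 20)]

print(cascade_recognize("abc", STAGES))       # abc
print(cascade_recognize("abcdefgh", STAGES))  # abcdefgh
```

The appeal of this design is that short lines, which are the common case, only pay for the cheapest model, while long lines still get a model large enough to hold them.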

Who should use this?

It suits document AI builders handling Japanese archives, researchers parsing historical newspapers, and librarians batch-converting scans to TEI XML. It is also a good fit for devs prototyping RAG pipelines that need layout-aware OCR, or anyone tired of PaddleOCR/Tesseract failing on vertical text.
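
For the RAG use case, layout-aware output can be turned into retrieval chunks that keep their page and bounding box. The sketch below assumes a made-up JSON shape (`page`, `blocks`, `type`, `bbox`, `lines`); NDLOCR-Lite's real JSON schema may differ, so treat the field names and the `chunks_for_retrieval` helper as hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical OCR result for one page. The schema is invented for this sketch.
SAMPLE_PAGE = json.dumps({
    "page": 1,
    "blocks": [
        {"type": "text", "bbox": [820, 60, 900, 700],
         "lines": ["国立国会図書館は", "資料を収集する。"]},
        {"type": "figure", "bbox": [100, 60, 700, 500], "lines": []},
    ],
}, ensure_ascii=False)


def chunks_for_retrieval(ocr_json: str, max_chars: int = 200) -> List[Dict]:
    """Split each text block into chunks of at most max_chars characters.

    Every chunk keeps the page number and the block's bounding box so a
    retrieved passage can be traced back to its place on the scan.
    """
    doc = json.loads(ocr_json)
    chunks = []
    for block in doc["blocks"]:
        if block["type"] != "text":
            continue  # Figures and tables carry no recognized lines here.
        # Japanese text has no spaces between words, so join lines directly.
        text = "".join(block["lines"])
        for start in range(0, len(text), max_chars):
            chunks.append({
                "page": doc["page"],
                "bbox": block["bbox"],
                "text": text[start:start + max_chars],
            })
    return chunks


for chunk in chunks_for_retrieval(SAMPLE_PAGE, max_chars=10):
    print(chunk["page"], chunk["bbox"], chunk["text"])
```

Chunking per block rather than per page keeps unrelated columns or captions from being merged into one passage, which is the main reason to prefer layout-aware OCR for retrieval.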

Verdict

Grab it for Japanese OCR niches: it is solid work from the NDL lab, though the project is still at an early stage of maturity. Docs cover GUI builds and CLI flags, but expect tweaks for production; test on your docs first.
