dreamyoungs / trex

🦖 Lightweight Rust engine for extracting tables from PDFs — zero external dependencies, single binary.

17 stars · 100% credibility
Found Mar 01, 2026 at 16 stars.
Language: Rust
AI Summary

TREX extracts tables from PDF documents into structured JSON or CSV output using lightweight lattice (ruled-grid) and stream (borderless) detection. You can run it from the command line, call it from Node.js or Python, or deploy it as a web service.
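This page does not show TREX's actual output schema. As a rough illustration of what "structured CSV output" involves, here is a minimal sketch that assumes an extracted table is just rows of cell strings (`Vec<Vec<String>>`, a hypothetical representation, not trex's real type) and serializes it with RFC 4180-style quoting. The function names `csv_field` and `table_to_csv` are hypothetical.

```rust
/// Quote one CSV field if it contains a comma, double quote, or line break
/// (RFC 4180 style). Embedded double quotes are escaped by doubling them.
fn csv_field(cell: &str) -> String {
    if cell.contains(|c: char| c == ',' || c == '"' || c == '\n' || c == '\r') {
        format!("\"{}\"", cell.replace('"', "\"\""))
    } else {
        cell.to_string()
    }
}

/// Serialize a table (hypothetical representation: rows of cell strings)
/// to CSV text, one record per line, each terminated with CRLF as in RFC 4180.
fn table_to_csv(rows: &[Vec<String>]) -> String {
    let mut out = String::new();
    for row in rows {
        // Quote each cell as needed, then join the record with commas.
        let fields: Vec<String> = row.iter().map(|c| csv_field(c)).collect();
        out.push_str(&fields.join(","));
        out.push_str("\r\n");
    }
    out
}
```

A cell such as `Widget, large` comes out as `"Widget, large"`, so spreadsheet tools keep it in one column instead of splitting it at the comma.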

How It Works

1. 📄 You have PDFs with tables

You're looking at reports or invoices packed with data tables you need to use elsewhere.

2. 🔍 Discover TREX

You find TREX, a handy tool that pulls tables out of PDFs cleanly and quickly.

3. ⬇️ Get TREX ready

Grab the tool and set it up on your computer in moments, with no fuss.

4. Pick your way to use it

- Quick command: Point it at your PDF and get the tables back as JSON or CSV.
- 💻 In your program: Call it from your scripts so your apps handle PDFs automatically.
- 🌐 Web helper: Launch a service that accepts uploaded files from anywhere and processes them.

5. Magic extraction

Feed in your PDF, choose a detection mode (or let the smart router choose one), and get clean rows and headers back.

6. Tables ready to go

Your data is now in structured lists you can paste into spreadsheets, analyze, or build on, saving hours of manual copying.

AI-Generated Review

What is trex?

Trex is a lightweight Rust binary that extracts tables from PDFs into JSON or CSV. It handles both gridline-based (lattice) and borderless (stream) layouts, with an optional deep-learning (DL) router that picks the mode for you. It compiles to a single executable with zero external dependencies and runs in about 30MB of RAM, which makes it a good fit for serverless. You can use it via the CLI (`trex extract invoice.pdf`), Node.js npm packages (a CLI wrapper or native NAPI bindings), Python bindings, or a Dockerized REST API on port 8080.
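Stream mode has no ruling lines to follow, so table structure has to be inferred from where the text sits. The page does not describe trex's actual algorithm. A common generic approach, sketched minimally below, is to cluster text fragments whose baselines fall within a vertical tolerance into one row, then order each row left to right. The `TextItem` type, `group_rows` function, and `y_tol` parameter are hypothetical and are not trex's API.

```rust
use std::cmp::Ordering;

/// A positioned text fragment, as a PDF text extractor might yield it.
/// Hypothetical type for illustration; not trex's actual API.
#[derive(Debug, Clone)]
struct TextItem {
    x: f64, // left edge, in PDF points
    y: f64, // baseline, in PDF points (PDF y grows upward)
    text: String,
}

/// Group fragments into rows: a fragment joins the current row if its baseline
/// is within `y_tol` of that row's first baseline; otherwise it starts a new row.
/// Rows are returned top to bottom, each sorted left to right by x.
fn group_rows(mut items: Vec<TextItem>, y_tol: f64) -> Vec<Vec<String>> {
    // Sort top-to-bottom (larger y first), breaking ties left-to-right.
    items.sort_by(|a, b| {
        b.y.partial_cmp(&a.y)
            .unwrap_or(Ordering::Equal)
            .then(a.x.partial_cmp(&b.x).unwrap_or(Ordering::Equal))
    });

    let mut rows: Vec<Vec<TextItem>> = Vec::new();
    for item in items {
        let same_row = rows
            .last()
            .map_or(false, |row| (row[0].y - item.y).abs() <= y_tol);
        if same_row {
            rows.last_mut().unwrap().push(item);
        } else {
            rows.push(vec![item]);
        }
    }

    // Within a row, near-equal baselines may have arrived out of x order.
    rows.into_iter()
        .map(|mut row| {
            row.sort_by(|a, b| a.x.partial_cmp(&b.x).unwrap_or(Ordering::Equal));
            row.into_iter().map(|i| i.text).collect()
        })
        .collect()
}
```

A real stream detector would also need to infer column boundaries from the horizontal gaps between fragments; this sketch covers only the row-grouping step.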

Why is it gaining traction?

Unlike Python-ecosystem stalwarts like Camelot or Tabula, whose deployments balloon to 500MB+ once OpenCV and Ghostscript (or, for Tabula, a Java runtime) are pulled in, trex stays lean enough for Lambda or Cloud Run without OOM kills. Multi-runtime support (one Rust core powers the Node, Python, and Docker builds) and a feedback loop (log extraction events, then retrain the router model via Python scripts) let accuracy improve over time. Devs dig the native Node option, which avoids spawning a subprocess, and the event telemetry for production monitoring.
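The review does not document the router's input features, model, or event log format. To picture how routing and the feedback loop fit together, here is a deliberately simplified stand-in: a hand-written heuristic in place of trex's DL model, plus a feedback event serialized as one JSON line. `PageFeatures`, `choose_mode`, `min_rules`, `FeedbackEvent`, and the JSON field names are all hypothetical.

```rust
/// The two extraction modes described in the review.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Mode {
    Lattice,
    Stream,
}

/// Hypothetical per-page features a router might look at.
struct PageFeatures {
    horizontal_rules: usize, // count of long horizontal line segments
    vertical_rules: usize,   // count of long vertical line segments
}

/// Heuristic stand-in for a learned router: a ruled grid needs lines in both
/// directions, so require at least `min_rules` of each before choosing lattice.
fn choose_mode(f: &PageFeatures, min_rules: usize) -> Mode {
    if f.horizontal_rules >= min_rules && f.vertical_rules >= min_rules {
        Mode::Lattice
    } else {
        Mode::Stream
    }
}

/// Hypothetical feedback record: what the router predicted versus the mode
/// that actually produced the correct table. Logs like this are the kind of
/// data a retraining script could consume.
struct FeedbackEvent {
    features: PageFeatures,
    predicted: Mode,
    correct: Mode,
}

impl FeedbackEvent {
    /// Serialize as a single JSON line (hand-formatted to avoid dependencies).
    fn to_json_line(&self) -> String {
        format!(
            "{{\"h_rules\":{},\"v_rules\":{},\"predicted\":\"{:?}\",\"correct\":\"{:?}\"}}",
            self.features.horizontal_rules, self.features.vertical_rules, self.predicted, self.correct
        )
    }
}
```

Mismatched events (predicted `Stream`, correct `Lattice`) are exactly the examples a retrained model would learn from. A learned router replaces the fixed threshold with a decision fitted to that data.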

Who should use this?

Backend engineers parsing invoices, reports, or financial PDFs in Node or Python services. Serverless teams that need a lightweight Rust HTTP server behind their API endpoints. Data pipelines where Python alternatives such as Camelot or Tabula run out of memory.

Verdict

Grab it if you need a lightweight PDF table extractor: the docs are solid and multilingual, and the repo ships benchmarks and a ready ML training pipeline. With fewer than 20 stars it's still early, so benchmark it against your own documents before going to production, but the single-binary hook makes it worth a spin.
