bcherb2 / pdfiles

Public

in case you need to search visually through a very large PDF set

49
3
100% credibility
Found Feb 19, 2026 at 43 stars.
AI Analysis
Python
AI Summary

PDfiles enables visual searching of large PDF collections by converting page images into searchable vectors for text descriptions and image similarity matching.
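The idea of matching a query against page vectors can be illustrated with a minimal sketch. It assumes each page is represented by a single vector and ranks pages by cosine similarity; the page IDs and three-dimensional vectors are made up for illustration, and PDfiles itself may score pages differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_pages(query_vector, page_vectors, top_k=3):
    """Return up to top_k (page_id, score) pairs, most similar first."""
    scored = [
        (page_id, cosine_similarity(query_vector, vec))
        for page_id, vec in page_vectors.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: real page embeddings have hundreds of dimensions.
PAGE_VECTORS = {
    "report.pdf#1": [0.9, 0.1, 0.0],
    "notes.pdf#4": [0.1, 0.9, 0.2],
    "plans.pdf#2": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    # A query vector pointing mostly along the second axis.
    for page_id, score in rank_pages([0.0, 1.0, 0.1], PAGE_VECTORS):
        print(f"{page_id}: {score:.3f}")
```

The same ranking function covers both text-to-page and page-to-page ("similar image") search, as long as queries and pages live in the same vector space.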

How It Works

1
📚 Gather your PDFs

Put all your PDF files from work, books, or scans into one folder on your computer.

2
💾 Get PDfiles

Download the free PDfiles tool designed to help you search through stacks of PDFs by describing what you want to find.

3
📁 Point to your folder

Simply tell PDfiles where your PDF folder is so it knows what documents to explore.

4
🚀 Start it up

Hit the start button: PDfiles reads your PDFs, prepares a smart index, and gets ready in a few minutes.

5
🌐 Open your search page

Visit the web address that appears, and your personal PDF search engine is live.

6
🔍 Describe and discover

Type everyday words like 'handwritten notes' or 'blueprint diagram', and see matching pages pop up instantly.

✅ Master your documents

Effortlessly find similar photos, browse organized groups of pages, and unlock everything hidden in your PDFs.
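As a rough illustration of the "gather" and "point to your folder" steps, the sketch below lists every PDF under a folder with Python's standard library. The function name `find_pdfs` is hypothetical and not part of PDfiles; the tool's own discovery logic is not shown in this listing.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return a sorted list of PDF paths under `folder`, including subfolders.

    The extension check is case-insensitive, so both `.pdf` and `.PDF` match.
    """
    root = Path(folder)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # Scans the current directory; replace "." with your own PDF folder.
    for pdf_path in find_pdfs("."):
        print(pdf_path)
```

Running a check like this before indexing is a quick way to confirm the folder contains what you expect.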


Star Growth

This repo grew from 43 to 49 stars.
AI-Generated Review

What is pdfiles?

PDfiles indexes large PDF collections for visual search with text queries like "handwritten notes" or "surveillance photos," embedding page images directly without OCR. Point it at your document folder via Docker, and it serves a web UI for semantic search, similar-image matching, and auto-clustered "shelves" of pages. A Python backend built on FastAPI serves a React frontend, with Qdrant storing the vectors and NVIDIA GPUs providing the speed.
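To make the idea of auto-clustered "shelves" concrete, here is a minimal sketch of one possible grouping strategy: greedy clustering, where each page joins the first shelf whose seed page is similar enough. This is an assumption for illustration only; the review does not say which clustering method PDfiles uses, and the function name, threshold, and vectors are hypothetical.

```python
import math


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_shelves(page_vectors, threshold=0.8):
    """Group page IDs into shelves by similarity to each shelf's first (seed) page.

    page_vectors: dict mapping page_id -> vector, processed in insertion order.
    Returns a list of shelves, each a list of page IDs.
    """
    shelves = []  # list of (seed_vector, member_ids)
    for page_id, vec in page_vectors.items():
        for seed, members in shelves:
            if _cosine(seed, vec) >= threshold:
                members.append(page_id)
                break
        else:
            # No existing shelf was close enough, so this page starts a new one.
            shelves.append((vec, [page_id]))
    return [members for _, members in shelves]


if __name__ == "__main__":
    # Hypothetical two-dimensional vectors: two "photo-like" and two "text-like" pages.
    pages = {
        "a#1": [1.0, 0.0],
        "a#2": [0.95, 0.1],
        "b#1": [0.0, 1.0],
        "b#2": [0.1, 0.9],
    }
    print(build_shelves(pages))
```

Greedy clustering is order-dependent and simplistic; a real system would more likely use a method such as k-means or a density-based algorithm over the stored vectors.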

Why is it gaining traction?

A streaming indexing pipeline makes pages searchable as soon as they are processed, rather than after a full batch run over the whole collection. CLI commands like `./pdfiles.sh up`, `down`, `backup`, and `restore` handle Docker orchestration, and admin exports allow indexes to be shared. It stands out for GPU-accelerated ColPali embeddings and shelf browsing, which appeal to developers who have outgrown keyword or regex search over extracted text.
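The two ideas in this paragraph, streaming indexing and ColPali-style embeddings, can be sketched together. ColPali-style models represent each page as many patch vectors and typically score a query with late interaction (often called MaxSim): each query vector takes its best-matching patch, and the per-vector maxima are summed. The `StreamingIndex` class and `embed_pages` generator below are hypothetical stand-ins, not PDfiles code, and the tiny vectors are invented for the example.

```python
def maxsim_score(query_vectors, patch_vectors):
    """Late-interaction score: sum over query vectors of the best dot product with any patch."""
    if not patch_vectors:
        return 0.0
    return sum(
        max(sum(q * p for q, p in zip(qv, pv)) for pv in patch_vectors)
        for qv in query_vectors
    )


class StreamingIndex:
    """In-memory index where each page is searchable as soon as it is added."""

    def __init__(self):
        self._pages = {}  # page_id -> list of patch vectors

    def add(self, page_id, patch_vectors):
        self._pages[page_id] = patch_vectors

    def search(self, query_vectors, top_k=3):
        scored = [
            (page_id, maxsim_score(query_vectors, patches))
            for page_id, patches in self._pages.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


def embed_pages(pages):
    """Stand-in for the GPU embedding step: yields one page at a time, not a full batch."""
    for page_id, patch_vectors in pages:
        yield page_id, patch_vectors


if __name__ == "__main__":
    index = StreamingIndex()
    stream = embed_pages([
        ("a.pdf#1", [[1.0, 0.0], [0.5, 0.5]]),
        ("b.pdf#1", [[0.0, 1.0]]),
    ])
    for page_id, patches in stream:
        index.add(page_id, patches)
        # The index can be queried after every page, before the stream is exhausted.
        print(page_id, "indexed ->", index.search([[0.0, 1.0]]))
```

In a real deployment the dictionary would be replaced by a vector database, and the generator by a model running on the GPU; the point of the sketch is only that search does not have to wait for the whole collection.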

Who should use this?

Legal tech teams sifting through case files, researchers with scanned archives, or data engineers building document-processing pipelines. Suits anyone with 12GB+ of NVIDIA VRAM and Docker; querying works on CPU once the index is built.

Verdict

Solid pick for visual PDF search if you have the GPU: the Docker deployment shines, and features like shelves deliver real value. 43 stars and a 100% credibility score signal early maturity; test thoroughly, but it looks production-ready for targeted use cases.
