GiovanniPasq / chunky

Public

Validate, visualize, edit, and export chunks for RAG pipelines.

17 stars
1
100% credibility
Found Mar 10, 2026 at 14 stars.
AI Analysis
Python
AI Summary

Chunky is an open-source application for converting PDFs to Markdown using multiple methods, splitting the text into chunks with various strategies, visualizing and editing them alongside the source PDF, and exporting for AI retrieval pipelines.

How It Works

1
Find Chunky

You discover this free tool that helps make sure your documents are perfectly split into pieces for AI chat systems.

2
💻 Start on your computer

Download and launch the app, which opens a simple web page in your browser ready to use.

3
📤 Upload your document

Drag in a PDF or text file from your computer into the side list to begin working.

4
Convert to editable text

If it's a PDF, choose a reading method to turn pages into clean text you can see side-by-side with the original.

5
🎨 Break into colored chunks

Pick a splitting style and watch the text divide into numbered, colorful sections matching the PDF view.

6
✏️ Edit bad splits

Click any chunk that looks wrong and fix the text right there without starting over.

7
💾 Export perfect pieces

Save the cleaned-up chunks as a simple file, ready to power your AI questions and answers.
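
As a rough illustration of the splitting step above, here is a minimal sketch of one possible "Markdown" strategy: cutting the converted text at heading lines. The function name `split_on_headings` is hypothetical and is not taken from Chunky's code. The sketch also assumes ATX-style headings (`#` to `######`) and does not handle headings that appear inside fenced code blocks.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "### Section".
HEADING = re.compile(r"^#{1,6} ")


def split_on_headings(markdown: str) -> list[str]:
    """Split Markdown text into sections, starting a new one at each heading."""
    sections: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A heading closes the section collected so far (if any).
        if HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    # Drop sections that contained only whitespace.
    return [section for section in sections if section]
```

Each returned string could then be shown as one numbered chunk. Text that precedes the first heading becomes its own section.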

Star Growth

This repo grew from 14 to 17 stars.

AI-Generated Review

What is chunky?

Chunky is a local web app with a Python FastAPI backend and a React TypeScript frontend. You upload PDFs or Markdown files, convert PDFs to Markdown with one of four engines (fast PyMuPDF, layout-aware Docling, reliable MarkItDown, or a VLM served locally through Ollama), then split the text into chunks using a token, recursive, character, or Markdown strategy. You get side-by-side PDF and Markdown views with synced scrolling, color-coded chunk visualization, direct in-browser editing, and JSON export ready for RAG indexing. It addresses the "blind chunking" problem, where poor splits degrade retrieval without you noticing.
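
To make the character strategy and the JSON export concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is not Chunky's implementation. The names `chunk_by_characters` and `export_chunks`, the parameters `chunk_size` and `overlap`, and the record fields (`index`, `start`, `end`, `text`) are all assumptions chosen for illustration.

```python
import json


def chunk_by_characters(text: str, chunk_size: int = 200, overlap: int = 20) -> list[dict]:
    """Split text into fixed-size character windows that overlap slightly."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[dict] = []
    for index, start in enumerate(range(0, len(text), step)):
        end = min(start + chunk_size, len(text))
        chunks.append({"index": index, "start": start, "end": end, "text": text[start:end]})
        if end == len(text):
            break  # the last window already reaches the end of the text
    return chunks


def export_chunks(chunks: list[dict]) -> str:
    """Serialize chunk records to a JSON string (hypothetical export layout)."""
    return json.dumps(chunks, ensure_ascii=False, indent=2)
```

Keeping the `start` and `end` offsets with each chunk is one way to map a chunk back to its position in the Markdown, which is what a visual boundary check needs.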

Why is it gaining traction?

Unlike basic splitters, Chunky gives visual feedback on every chunk boundary before you commit anything to your vector store, with switchable converters and on-the-fly re-chunking. Developers like the pluggable architecture for custom converters, VLM support without API keys (via Ollama), and the one-click Docker setup. It is a tool for iterating quickly on RAG quality, and it stands apart from unrelated, similarly named projects (such as the Minecraft Chunky mod) by focusing on pipeline validation.
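
A pluggable converter design is often built around a small registry that maps a name to a conversion function. The sketch below shows that general pattern only. It is not Chunky's actual plugin API, and `register_converter`, `convert`, and the `"plain-text"` converter are hypothetical names.

```python
from typing import Callable, Dict

# A converter takes raw file bytes and returns Markdown text.
Converter = Callable[[bytes], str]

_CONVERTERS: Dict[str, Converter] = {}


def register_converter(name: str) -> Callable[[Converter], Converter]:
    """Decorator that adds a converter function to the registry under `name`."""
    def decorator(func: Converter) -> Converter:
        if name in _CONVERTERS:
            raise ValueError(f"converter {name!r} is already registered")
        _CONVERTERS[name] = func
        return func
    return decorator


@register_converter("plain-text")
def plain_text_converter(data: bytes) -> str:
    # Trivial example converter: treat the upload as UTF-8 text.
    return data.decode("utf-8", errors="replace")


def convert(name: str, data: bytes) -> str:
    """Look up a converter by name and run it on the uploaded bytes."""
    try:
        converter = _CONVERTERS[name]
    except KeyError:
        available = ", ".join(sorted(_CONVERTERS))
        raise ValueError(f"unknown converter {name!r}; available: {available}") from None
    return converter(data)
```

With this shape, adding another engine means writing one function and decorating it, while the code that calls `convert` stays unchanged.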

Who should use this?

AI engineers building RAG apps who waste time debugging retrieval failures caused by bad chunks. Docs teams processing PDFs for LLM ingestion who need to validate complex layouts. Indie devs prototyping agentic RAG who are tired of trial-and-error indexing.

Verdict

Try it for RAG prototyping: the solid UX and extensibility punch above its 14 stars and 1.0% credibility score. It is an early alpha with rough edges (no tests, evolving APIs), but a great README and Docker setup make it low-risk for local validation workflows.
