HaroldConley / chunk-norris

Public

Evaluate and compare chunking strategies for RAG pipelines

100% credibility

Found Apr 15, 2026 at 19 stars

AI Analysis

Python

AI Summary

chunk-norris is a Python tool that evaluates multiple document chunking strategies for RAG systems by scoring retrieval performance against user-provided questions and answers, then returns the best chunker for immediate use.

How It Works

📖 Discover the tool

You learn about chunk-norris, a helpful friend that tests different ways to break your long document into bite-sized pieces so AI can find answers faster and better.

📝 Gather your document and questions

You collect your full document text and write down 15-30 real questions people might ask, along with their exact answers from the text.

🧪 Set up the tester

You create a simple tester using the built-in embedding model, which runs locally and compares text by meaning without needing extra services.

⚗️ Run the comparison

You hand over your document, questions, and a few splitting styles like by size, sentences, or paragraphs, and it automatically tests each one to see which grabs the right info best.

📊 Review the scores

A clear table appears showing scores for completeness and focus, highlighting the winning splitter that works best for your document.

🏆 Grab the winner

You instantly get the top splitter ready to use, feeling confident you've picked the perfect method without guessing.

🎉 Perfect AI answers

Your document is now split with the strategy that scored best on your own questions, so your AI question-answering setup retrieves more relevant passages and searches become more reliable.
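
The steps above can be sketched in plain Python. This is only an illustration of the inputs, not chunk-norris's actual API, which this page does not show: the function names `chunk_by_size`, `chunk_by_sentence`, and `chunk_by_paragraph`, the sample document, and the shape of `qa_pairs` are all assumptions. Only two questions are shown for brevity, where the walkthrough recommends 15-30.

```python
import re

def chunk_by_size(text: str, size: int = 40) -> list[str]:
    """Split into consecutive windows of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def chunk_by_sentence(text: str) -> list[str]:
    """Split after sentence-ending punctuation that is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def chunk_by_paragraph(text: str) -> list[str]:
    """Split on blank lines."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]

# Hypothetical sample document (not from the repository).
document = (
    "Refunds are issued within 14 days. Contact support to start one.\n\n"
    "Shipping takes 3 to 5 business days. Express shipping costs extra."
)

# Each expected answer is copied verbatim from the document text.
qa_pairs = [
    {"question": "How long do refunds take?", "answer": "within 14 days"},
    {"question": "How long does shipping take?", "answer": "3 to 5 business days"},
]

# The candidate splitting styles to compare.
strategies = {
    "fixed_size": chunk_by_size,
    "sentence": chunk_by_sentence,
    "paragraph": chunk_by_paragraph,
}

for name, chunker in strategies.items():
    print(name, len(chunker(document)), "chunks")
```

Keeping the strategies in a dictionary makes it easy to loop over them and score each one against the same questions.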

Star Growth

This repo grew from 19 to 24 stars.

AI-Generated Review

What is chunk-norris?

Chunk-norris is a Python tool that empirically tests chunking strategies for your RAG pipelines, turning guesswork into data-driven decisions. Feed it a document, questions with expected answers, and multiple chunkers such as fixed-size, recursive, or sentence-based. It retrieves the top chunks for each question via semantic search, scores them on token recall and BERT similarity, then ranks the chunkers in a comparison table. You get the winning chunker instance ready to plug into your pipeline, and the evaluation itself needs no LLM.
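
A minimal sketch of that evaluate-and-rank loop follows, under stated assumptions. It is not the library's implementation: a bag-of-words cosine stands in for the semantic search, the BERT similarity metric is omitted because it needs a model download, and every name here (`retrieve`, `token_recall`, `evaluate`, the sample document) is hypothetical.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question (stand-in for semantic search)."""
    q = Counter(tokens(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokens(c))), reverse=True)
    return ranked[:k]

def token_recall(answer: str, retrieved: list[str]) -> float:
    """Fraction of the expected answer's tokens found in the retrieved chunks."""
    expected = set(tokens(answer))
    found = set(tokens(" ".join(retrieved)))
    return len(expected & found) / len(expected) if expected else 0.0

def evaluate(chunkers: dict, document: str, qa_pairs: list, k: int = 1) -> list:
    """Score every chunker by mean token recall and return them best first."""
    scores = {}
    for name, chunker in chunkers.items():
        chunks = chunker(document)
        recalls = [token_recall(a, retrieve(q, chunks, k)) for q, a in qa_pairs]
        scores[name] = sum(recalls) / len(recalls)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample data (not from the repository).
document = "The warranty lasts two years. Returns are accepted within 30 days of delivery."
qa_pairs = [
    ("How long does the warranty last?", "two years"),
    ("When are returns accepted?", "within 30 days of delivery"),
]
chunkers = {
    "fixed_20_chars": lambda text: [text[i:i + 20] for i in range(0, len(text), 20)],
    "sentence": lambda text: re.split(r"(?<=[.!?])\s+", text.strip()),
}

ranking = evaluate(chunkers, document, qa_pairs)
for name, score in ranking:
    print(f"{name:15s} {score:.2f}")
best_chunker = chunkers[ranking[0][0]]  # the winner, ready to reuse
```

Because the scoring uses only token overlap and a fixed similarity function, repeated runs on the same inputs give the same ranking.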

Why is it gaining traction?

Unlike tutorials that default to arbitrary 512-token chunks, it measures retrieval quality against your actual docs and queries, spotting optimal sizes per document type. Standout hooks include deterministic scoring (reproducible, free), Excel/JSON exports for detailed breakdowns, and pluggable embedders starting with a local BERT model. It also hands off the best chunker directly, which skips boilerplate.

Who should use this?

RAG engineers tuning pipelines for legal contracts, technical manuals, or customer FAQs where chunking impacts recall most. ML ops folks evaluating GitHub repos for production RAG, or indie devs iterating on doc-specific strategies before vector store indexing.

Verdict

Grab it for quick RAG experiments: install via pip from GitHub and run evals in minutes. The docs are solid for a project with only 19 stars at discovery, which signals an early alpha stage. Pair it with your questions.json for immediate wins, but watch for v2 multi-doc support.
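
The review mentions a questions.json but does not show its schema. The sketch below assumes a JSON list of objects with `question` and `answer` keys; that layout, and the helper names `load_qa_pairs` and `size_warning`, are hypothetical. It keeps only pairs whose answer appears verbatim in the document and flags a set outside the 15-30 range suggested in the walkthrough.

```python
import json
from typing import Optional

def load_qa_pairs(raw_json: str, document: str) -> list[dict]:
    """Parse QA pairs and keep those whose answer appears verbatim in the document."""
    pairs = json.loads(raw_json)
    valid = []
    for pair in pairs:
        question = pair.get("question", "").strip()
        answer = pair.get("answer", "").strip()
        if question and answer and answer in document:
            valid.append({"question": question, "answer": answer})
    return valid

def size_warning(pairs: list, low: int = 15, high: int = 30) -> Optional[str]:
    """Return a warning if the number of pairs is outside the suggested range."""
    if low <= len(pairs) <= high:
        return None
    return f"{len(pairs)} question(s) provided; {low}-{high} are suggested"

# Hypothetical sample data (not from the repository).
sample_document = "Refunds are issued within 14 days."
sample_json = """[
  {"question": "How long do refunds take?", "answer": "within 14 days"},
  {"question": "What does shipping cost?", "answer": "ten dollars"}
]"""

sample_pairs = load_qa_pairs(sample_json, sample_document)
print(sample_pairs)
print(size_warning(sample_pairs))
```

Filtering out answers that are not in the text up front avoids scoring a chunker against a question it could never answer.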

Base stars: 24 stars

Bonus: AI verified quality (100%)

Account age: 1,427 days

Repo age: 8 days

License: MIT

Updated: Apr 19, 2026