cognica-io/bayesian-bm25

Bayesian probability transforms for BM25 retrieval scores

Python · Found Feb 18, 2026 at 15 stars (now 40)
AI Summary

A library that transforms standard search ranking scores into reliable probability estimates of relevance, enabling better fusion of search signals and adaptation from user feedback.

How It Works

1. 🔍 Discover Smarter Search

You hear about a helpful tool that turns regular search matches into clear confidence scores, making it easier to know which results really matter.

2. 📦 Set It Up Quickly

With a simple download, you add this tool to your collection of helpers on your computer, ready to use right away.

3. 📚 Add Your Documents

You share your list of texts or articles with the tool, so it knows what to search through.

4. Search with Confidence

You type a question, and instead of vague numbers, it gives back matches with easy-to-understand percentages of how relevant each one is.

5. Improve Over Time

👍 Quick Use: jump straight to reliable searches without extra steps.

📈 Learn from Feedback: show it examples of perfect matches to make future searches even sharper.

Perfect Search Ready

Now your searches feel trustworthy and smart, helping you find exactly what you need faster every time.
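The walkthrough above can be sketched end to end. The snippet below is a minimal, self-contained toy, not the library's real API: it implements plain BM25 over a tiny corpus and uses a simple sigmoid calibration as a stand-in for the library's Bayesian update (the corpus, query, and calibration parameters are illustrative assumptions).

```python
import math

# Toy corpus and query -- illustrative only; bayesian-bm25 exposes
# its own indexing/retrieval API.
corpus = [
    "bayesian probability for search ranking",
    "bm25 is a classic lexical retrieval function",
    "cooking pasta with tomato sauce",
]
query = ["bm25", "probability", "ranking"]

def bm25_scores(corpus, query, k1=1.5, b=0.75):
    """Plain BM25 over whitespace-tokenized documents."""
    docs = [d.split() for d in corpus]
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    scores = []
    for d in docs:
        s = 0.0
        for t in query:
            df = sum(1 for doc in docs if t in doc)
            if df == 0:
                continue
            idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
            tf = d.count(t)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def to_probabilities(scores, prior=0.5, scale=1.0):
    """Sigmoid calibration around the mean score -- a stand-in for the
    library's Bayesian update (assumed behavior, not its actual code)."""
    mean = sum(scores) / len(scores)
    logit_prior = math.log(prior / (1 - prior))
    return [1 / (1 + math.exp(-(scale * (s - mean) + logit_prior)))
            for s in scores]

probs = to_probabilities(bm25_scores(corpus, query))
for text, p in zip(corpus, probs):
    print(f"{p:.2f}  {text}")
```

The on-topic documents come back with higher confidences than the off-topic one, and every value lands in [0, 1] instead of an unbounded score.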

AI-Generated Review

What is bayesian-bm25?

Bayesian BM25 is a Python library that transforms raw BM25 search scores into calibrated [0, 1] relevance probabilities using Bayesian probability updates. It addresses the problem that BM25 scores are unbounded and query-dependent, which makes thresholding them or fusing them with other signals (such as embedding similarities) unreliable; calibrated probabilities give you consistent values for hybrid search without needing relevance labels. Install it with pip install bayesian-bm25, index your corpus, and retrieve probabilities instead of raw scores.
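The core idea can be illustrated with Bayes' rule in odds form: start from a corpus base-rate prior and update it with evidence derived from the BM25 score. The base rate, the normalization, and the score-to-likelihood-ratio mapping below are assumptions for illustration, not the library's actual formulas.

```python
import math

def bayes_update(prob_prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    odds = prob_prior / (1 - prob_prior) * likelihood_ratio
    return odds / (1 + odds)

# Assumed mechanics (not the library's actual code): treat a
# normalized BM25 score as evidence against a corpus base-rate prior.
base_rate = 0.05   # e.g. roughly 5% of docs relevant to a typical query
norm_score = 0.8   # BM25 score min-max normalized to [0, 1]
lr = math.exp(4.0 * (norm_score - 0.5))  # map score to a likelihood ratio

p = bayes_update(base_rate, lr)
print(round(p, 3))
```

A strong score raises the posterior well above the 5% base rate, but the prior keeps it from jumping straight to certainty, which is what makes the outputs comparable across queries.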

Why is it gaining traction?

It stands out by auto-calibrating via corpus base rates (a reported 68-77% drop in expected calibration error) and composite priors built from term frequency and document length, plus online SGD learning from user clicks. Developers also get probabilistic fusion functions that correct the shrinkage of naive AND/OR combination in multi-signal ranking, and integration with bm25s for end-to-end retrieval. Benchmarks on BEIR datasets show it matches or beats raw BM25 on NDCG while adding meaningful probabilities.
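Why naive AND fusion shrinks, and how combining in odds space avoids it, can be shown in a few lines. This is standard naive-Bayes-style fusion under an independence assumption, sketched here as a plausible interpretation of the repo's fusion fix rather than its exact code.

```python
def fuse(probs, prior):
    """Combine independent per-signal probabilities in odds space,
    dividing out the shared prior -- naive-Bayes fusion (an assumed
    interpretation of the repo's approach, not its exact code)."""
    prior_odds = prior / (1 - prior)
    odds = prior_odds
    for p in probs:
        odds *= (p / (1 - p)) / prior_odds
    return odds / (1 + odds)

p_bm25, p_vec, prior = 0.7, 0.6, 0.5

naive_and = p_bm25 * p_vec           # multiplying shrinks below both inputs
fused = fuse([p_bm25, p_vec], prior)
print(round(naive_and, 3), round(fused, 3))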

Who should use this?

Search engineers tuning RAG pipelines for LLMs, where BM25 needs to be fused with vector scores. IR researchers exploring Bayesian probability in retrieval. Developers building semantic search who want online adaptation from feedback without retraining models.
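Online adaptation from clicks can be as simple as SGD on a logistic calibration of the score. The link function, learning rate, and simulated feedback below are assumptions for illustration; the library's actual update rule may differ.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Assumed logistic calibration p = sigmoid(w * score + b), updated
# one click at a time -- a sketch, not the library's implementation.
w, b, lr = 1.0, 0.0, 0.1

def update(score, clicked, w, b, lr):
    """One SGD step on log-loss for a (score, click) observation."""
    p = sigmoid(w * score + b)
    grad = p - (1.0 if clicked else 0.0)   # d(logloss)/d(logit)
    return w - lr * grad * score, b - lr * grad

# Simulated feedback: high-scoring results get clicked, low ones don't.
events = [(2.0, True), (0.2, False), (1.8, True), (0.1, False)] * 50
for score, clicked in events:
    w, b = update(score, clicked, w, b, lr)

print(round(sigmoid(w * 2.0 + b), 2), round(sigmoid(w * 0.1 + b), 2))
```

After a few hundred observations the calibration sharpens: high scores map to confident probabilities and low scores to near-zero ones, with no batch retraining.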

Verdict

Worth prototyping for BM25-heavy apps that need calibrated scores: solid docs, quickstart examples, and an Apache 2.0 license make it easy to try. At 15 stars and 1.0% credibility, it's early-stage with benchmarks but no broad adoption yet; pair it with production monitoring.
