sciguy-code

fast full-text search engine in c++, inverted index, bm25 ranking, sub-millisecond query latency

100% credibility
Found May 12, 2026 at 11 stars.
AI Analysis
C++
AI Summary

A lightweight, from-scratch search engine for indexing and querying a personal collection of text documents with support for advanced queries and relevance ranking.

How It Works

1. 📚 Discover the search tool

You hear about a handy tool that lets you quickly search through your own collection of articles and notes, like a personal Google for your files.

2. 📝 Gather your texts

Collect sample writings on topics like science, history, or music, or use the built-in generator to create a ready-to-go set of 50,000 short articles.

3. 🔍 Prepare your collection

Feed your texts into the tool to create a smart search foundation that understands words, ignores common ones, and prepares everything for fast lookups.

4. 🚀 Launch the searcher

Start the interactive mode, and your collection is ready to explore right from your command line.

5. 💬 Ask natural questions

Type queries like 'black hole AND gravity' or 'shakespeare poetry', using AND, NOT, or quotes for exact phrases, and watch it find matches instantly.

🎉 Find treasures fast

Get ranked lists of the best matching articles with titles and relevance scores, making it easy to dive into exactly what you need.
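
The "prepare your collection" and "ask" steps above can be pictured with a minimal inverted-index sketch. This is an illustration under stated assumptions, not the project's code: the names `tokenize`, `InvertedIndex`, and `search_all` and the tiny stop-word list are hypothetical, and the sketch treats every query term as an implicit AND (OR, NOT, and phrase matching are left out).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stop-word list; the project's real list is not shown in the source.
static const std::set<std::string> kStopWords = {"a", "an", "and", "of", "the", "is", "in"};

// Lowercase the text, split on non-alphanumeric characters, drop stop words.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    auto flush = [&] {
        if (!current.empty() && kStopWords.count(current) == 0) tokens.push_back(current);
        current.clear();
    };
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) current.push_back(static_cast<char>(std::tolower(ch)));
        else flush();
    }
    flush();
    return tokens;
}

class InvertedIndex {
public:
    // Assumption: documents are added in non-decreasing id order,
    // which keeps every postings list sorted.
    void add(int doc_id, const std::string& text) {
        for (const auto& term : tokenize(text)) {
            auto& postings = index_[term];
            assert(postings.empty() || postings.back() <= doc_id);
            if (postings.empty() || postings.back() != doc_id) postings.push_back(doc_id);
        }
    }

    // Ids of documents that contain every non-stop-word term of the query.
    std::vector<int> search_all(const std::string& query) const {
        std::vector<int> result;
        bool first = true;
        for (const auto& term : tokenize(query)) {
            auto it = index_.find(term);
            if (it == index_.end()) return {};  // a missing term empties the AND
            if (first) { result = it->second; first = false; continue; }
            std::vector<int> merged;
            std::set_intersection(result.begin(), result.end(),
                                  it->second.begin(), it->second.end(),
                                  std::back_inserter(merged));
            result = std::move(merged);
        }
        return result;
    }

private:
    std::map<std::string, std::vector<int>> index_;  // term -> sorted doc ids
};
```

Keeping each postings list sorted lets the AND step run as a linear merge with `std::set_intersection`, which is one common reason inverted-index lookups stay fast as a collection grows.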

AI-Generated Review

What is mini-search-engine?

This C++20 project delivers a lightweight, standalone full-text search engine for text corpora, handling indexing and BM25-ranked queries with sub-millisecond latency on modest hardware. Feed it JSONL files or directories of text docs via the CLI to build a persistent binary index, then query interactively or one-shot, with support for Boolean operators (AND/OR/NOT, as in "black hole AND gravity") and quoted phrase searches. It uses only the standard library plus POSIX, with no external dependencies, which suits fast full-text search in embedded or local apps.
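
To make the BM25 ranking concrete, here is a small sketch of one widely used form of the BM25 per-term score. It is an assumption that the project uses this variant: the source does not show its exact formula, and the names `Bm25Params`, `bm25_idf`, and `bm25_term_score` as well as the defaults `k1 = 1.2` and `b = 0.75` are illustrative conventions rather than values taken from the repo.

```cpp
#include <cassert>
#include <cmath>

// Conventional BM25 tuning knobs (assumed defaults, not taken from the project).
struct Bm25Params {
    double k1 = 1.2;   // term-frequency saturation
    double b = 0.75;   // strength of document-length normalization
};

// Inverse document frequency in the non-negative "+1" form:
// ln(1 + (N - n + 0.5) / (n + 0.5)), with N documents and n containing the term.
double bm25_idf(double num_docs, double doc_freq) {
    assert(doc_freq >= 0.0 && doc_freq <= num_docs);
    return std::log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5));
}

// Contribution of one query term to one document's score.
// tf: occurrences of the term in the document; doc_len and avg_doc_len in tokens.
double bm25_term_score(double tf, double doc_len, double avg_doc_len,
                       double num_docs, double doc_freq,
                       const Bm25Params& p = Bm25Params{}) {
    assert(avg_doc_len > 0.0);
    const double norm = 1.0 - p.b + p.b * (doc_len / avg_doc_len);
    return bm25_idf(num_docs, doc_freq) * tf * (p.k1 + 1.0) / (tf + p.k1 * norm);
}
```

A document's score for a query is the sum of `bm25_term_score` over the query terms. Raising `k1` lets repeated terms count for more before saturating, and setting `b` to 0 switches off length normalization.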

Why is it gaining traction?

It reports very low latency (p50 of 13 µs and a peak of 57k QPS on an M2) while staying simple: one Python script generates a 50k-doc corpus across domains such as physics and CS, indexing takes seconds, and queries return instantly. Developers like the no-frills CLI, tunable BM25 parameters, and thread-pool scaling without bloat, which puts it a step above toy search engines for real workloads.
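
For readers unfamiliar with figures like a p50 latency, the sketch below shows one way such a number could be measured. It is not the project's benchmark harness: `percentile` (nearest-rank method) and `time_queries` are hypothetical helpers, and `run_query` stands in for whatever call executes a single search.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile for p in (0, 100]; sorts a copy of the samples.
double percentile(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const auto rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(samples.size())));
    return samples[rank - 1];
}

// Times `iterations` calls of run_query(i) and returns each latency in microseconds.
template <typename Query>
std::vector<double> time_queries(Query&& run_query, std::size_t iterations) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to interval timing
    std::vector<double> micros;
    micros.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        run_query(i);
        const auto stop = clock::now();
        micros.push_back(std::chrono::duration<double, std::micro>(stop - start).count());
    }
    return micros;
}
```

Calling `percentile(time_queries(run_query, 100000), 50.0)` would give a median latency in microseconds. Throughput (QPS) is a separate measurement, usually taken over many threads, so it does not follow directly from the p50.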

Who should use this?

C++ backend devs building local tools such as log analyzers or document search in CLI apps. It also suits prototyping IR systems or adding fast search to scripts without spinning up a server, and teams that need offline, incremental indexing for full-text retrieval on mini PCs or edge devices.

Verdict

Grab it for learning or light production use: solid benchmarks and tests make it credible despite 11 stars and a 1.0% score. It is at an early stage of maturity, so watch for compression and updates. A strong start for projects that need sub-millisecond search.
