juanceresa / sift-kg

Turn any collection of documents into a knowledge graph. Extract entities and relationships via LLM, deduplicate with your approval, and explore the result in your browser — all from the CLI.

227
23
100% credibility
Found Feb 13, 2026 at 43 stars (5x since).
AI Analysis
Python
AI Summary

sift-kg is a command-line tool that turns folders of documents into interactive, browsable knowledge graphs by using AI to extract entities, relationships, and narratives.

How It Works

1
📚 Gather your documents

You collect a folder of reports, articles, PDFs, or records to uncover hidden people, companies, and connections.

2
🛠️ Get the tool ready

Download the simple program and connect it to an AI service so it can read and think about your files.

3
📁 Point to your folder

Choose your document folder and pick a ready-made focus like everyday analysis or detective work.

4
Watch it discover links

The tool scans every page, pulls out names, places, events, and relationships, building a web of insights.

5
🔍 Review smart suggestions

It flags possible same-name matches for you to approve or skip, keeping everything accurate.

6
🌐 Explore your map

Open a colorful, clickable graph in your web browser to search, zoom, and trace connections easily.

🎉 Share your discoveries

Export charts, lists, or stories to use in reports, investigations, or presentations with confidence.


Star Growth

This repo grew from 43 to 227 stars.
AI-Generated Review

What is sift-kg?

sift-kg is a Python CLI that turns a collection of documents (PDFs, text, HTML) into a browsable knowledge graph. Point it at a folder with `sift extract ./docs/`, and it extracts entities and relations using an LLM of your choice (OpenAI, Anthropic, or a local model through Ollama) and builds a NetworkX graph. `sift review` handles human-in-the-loop deduplication, `sift narrate` generates narratives, and the result can be opened in an interactive browser view or exported to GraphML for Gephi. Custom domains defined in YAML let you adjust the entity and relation types; the bundled OSINT mode adds shell companies and sanctions links.
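To make the graph-building and export step concrete, here is a minimal sketch of how extracted triples could be loaded into a NetworkX graph and serialized as GraphML. It is not sift-kg's actual code: the `build_graph` helper, the triple format, the attribute names `entity_type` and `relation`, and the sample entities are all assumptions made up for illustration. Only the NetworkX calls themselves are real library API.

```python
import networkx as nx

# Hypothetical extraction output: (subject, relation, object) triples.
# The names and relation labels are invented for this sketch.
triples = [
    ("Alice Example", "DIRECTOR_OF", "Example Holdings Ltd"),
    ("Example Holdings Ltd", "OWNS", "Example Trading LLC"),
    ("Alice Example", "ASSOCIATED_WITH", "Bob Sample"),
]
entity_types = {
    "Alice Example": "PERSON",
    "Bob Sample": "PERSON",
    "Example Holdings Ltd": "ORGANIZATION",
    "Example Trading LLC": "ORGANIZATION",
}


def build_graph(triples, entity_types):
    """Build a directed multigraph: one node per entity, one edge per relation."""
    graph = nx.MultiDiGraph()
    for name, entity_type in entity_types.items():
        graph.add_node(name, entity_type=entity_type)
    for subject, relation, obj in triples:
        # A multigraph keeps parallel edges, so two entities can be linked
        # by more than one relation type.
        graph.add_edge(subject, obj, relation=relation)
    return graph


graph = build_graph(triples, entity_types)

# Serialize to GraphML text, a format Gephi can open.
graphml_text = "\n".join(nx.generate_graphml(graph))
```

Writing `graphml_text` to a `.graphml` file gives something Gephi can import directly; `nx.write_graphml(graph, path)` does the same in one call.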

Why is it gaining traction?

No databases, no servers: just pip install and run, with budget caps and local LLM support to keep costs down. The main draw is interactive merge review in the terminal. The LLM proposes duplicates such as "Bankman-Fried" variants, and you approve or reject each one before anything merges, which suits work where accuracy matters. Exports load into Gephi and Cytoscape, and the live FTX collapse demo graph shows the result at a glance. The CLI turns messy documents into diagrams faster than manual ETL.
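The propose-then-approve pattern behind the merge review can be sketched in a few lines. This is an illustration under stated assumptions, not sift-kg's implementation: the tool uses an LLM to propose duplicates, whereas this sketch swaps in plain string similarity from the standard library's `difflib` as a stand-in. The function names `propose_merges` and `apply_reviewed_merges`, the `threshold` value, and the `decide` callback are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def propose_merges(names, threshold=0.6):
    """Return (name_a, name_b, score) for pairs that look like duplicates.

    String similarity is a stand-in here for the LLM-based proposals.
    """
    proposals = []
    for a, b in combinations(sorted(names), 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            proposals.append((a, b, score))
    return proposals


def apply_reviewed_merges(proposals, decide):
    """Map each approved duplicate to a canonical name.

    `decide(a, b, score)` stands for the human reviewer and returns
    True to approve the merge or False to reject it. Nothing is merged
    without approval.
    """
    canonical = {}
    for a, b, score in proposals:
        if decide(a, b, score):
            canonical[b] = canonical.get(a, a)
    return canonical


names = [
    "Sam Bankman-Fried",
    "Samuel Bankman-Fried",
    "Bankman-Fried",
    "Caroline Ellison",
]
proposals = propose_merges(names)

# A reviewer who rejects every proposal leaves all entities untouched.
rejected_all = apply_reviewed_merges(proposals, lambda a, b, score: False)

# A reviewer who approves every proposal collapses the three variants.
approved_all = apply_reviewed_merges(proposals, lambda a, b, score: True)
```

In an interactive tool, `decide` would prompt in the terminal for each pair. Keeping the proposal step separate from the apply step is what makes the review auditable: the proposals can be saved, inspected, and re-run without touching the graph.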

Who should use this?

OSINT analysts mapping leaks or FOIA dumps, investigative journalists tracing networks, legal teams reviewing filings, genealogists linking records, or academics structuring archives. It also fits anyone prototyping document pipelines from CLI scripts without standing up infrastructure.

Verdict

Worth trying for OSINT or document-intelligence prototypes: the human review step and the exports are the strong points, and the examples are good. At 15 stars and 1.0% credibility, it is alpha: the docs are solid, but tests are light and stability is unproven, so fork it if needed.
