benzsevern / goldenmatch
Entity resolution toolkit: deduplicate records, match across sources, create golden records. 97% F1 on structured data, LLM scoring for products. Polars-native, 7800 rec/s, zero-config CLI.
GoldenMatch is an open-source Python toolkit for deduplicating records, matching entities across datasets, and generating golden records using fuzzy matching, blocking strategies, and optional AI enhancements.
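Blocking and fuzzy matching are what keep deduplication tractable. The sketch below is a minimal, stdlib-only illustration of the general technique, not GoldenMatch's API. The blocking rule (first three letters of the last name plus ZIP code), the field names, the 0.85 threshold, and the helper names `block_key`, `similarity`, and `candidate_matches` are all hypothetical. `difflib.SequenceMatcher` stands in for whatever scorers GoldenMatch actually uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record):
    # Hypothetical blocking rule: only records sharing the first three letters
    # of the last name and the same ZIP code are ever compared.
    return (record["last"].lower()[:3], record["zip"])


def similarity(a, b):
    # Fuzzy similarity in [0, 1] between lowercased full names.
    name_a = f"{a['first']} {a['last']}".lower()
    name_b = f"{b['first']} {b['last']}".lower()
    return SequenceMatcher(None, name_a, name_b).ratio()


def candidate_matches(records, threshold=0.85):
    # Group record indices by blocking key instead of comparing all n^2 pairs.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    matches = []
    for ids in blocks.values():
        # Score every pair within a block; keep those at or above the threshold.
        for i, j in combinations(ids, 2):
            score = similarity(records[i], records[j])
            if score >= threshold:
                matches.append((i, j, round(score, 3)))
    return matches


records = [
    {"first": "Jon", "last": "Smith", "zip": "10001"},
    {"first": "John", "last": "Smith", "zip": "10001"},
    {"first": "Jane", "last": "Doe", "zip": "94105"},
    {"first": "John", "last": "Smyth", "zip": "10001"},
]
print(candidate_matches(records))  # [(0, 1, 0.947)]
```

With these sample records, only the Jon/John Smith pair is compared and kept. "John Smyth" lands in a different block and is never scored, which is the usual trade-off of blocking: far fewer comparisons than checking every pair, at the risk of missing matches that straddle block boundaries.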
How It Works
GoldenMatch cleans up messy lists by finding duplicate records and merging them automatically.
Install the tool with a single command, then run the setup helper to connect optional integrations such as LLM providers for AI-assisted scoring.
Point it at a messy customer or product list, and it infers a matching configuration for you.
A review screen shows groups of matching records, with sliders for tuning match settings.
Check borderline matches, adjust confidence thresholds, and confirm which records to keep.
Export the resulting golden records, one per real-world entity, ready for downstream use without manual cleanup.
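The review and export steps come down to grouping pairwise matches into clusters at a chosen confidence threshold and collapsing each cluster into a single record. The following is a minimal sketch, assuming matched pairs arrive as `(i, j, score)` tuples. Union-find clustering and the "most frequent non-empty value wins" survivorship rule are illustrative choices, not necessarily what GoldenMatch does, and `cluster` and `golden_record` are hypothetical names.

```python
from collections import Counter


def cluster(n, pairs, threshold):
    # Union-find over record indices 0..n-1; only pairs scoring >= threshold merge.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j, score in pairs:
        if score >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for idx in range(n):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())


def golden_record(records, member_ids):
    # Hypothetical survivorship rule: per field, keep the most frequent non-empty
    # value; ties go to the value encountered first.
    golden = {}
    fields = {key for i in member_ids for key in records[i]}
    for field in sorted(fields):
        values = [records[i][field] for i in member_ids if records[i].get(field)]
        golden[field] = Counter(values).most_common(1)[0][0] if values else None
    return golden


records = [
    {"name": "Acme Corp", "phone": "555-0100", "email": ""},
    {"name": "ACME Corp.", "phone": "555-0100", "email": "info@acme.example"},
    {"name": "Acme Corp", "phone": "", "email": "info@acme.example"},
    {"name": "Globex", "phone": "555-0199", "email": "hq@globex.example"},
]
pairs = [(0, 1, 0.91), (1, 2, 0.88), (0, 3, 0.42)]  # (i, j, match score)

clusters = cluster(len(records), pairs, threshold=0.85)
print(clusters)  # [[0, 1, 2], [3]]
print([golden_record(records, ids) for ids in clusters])
```

Raising the threshold to 0.9, the equivalent of moving a confidence slider up, drops the 0.88 link: records 0 and 1 stay merged while record 2 becomes its own cluster.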