AbhijeetP21

A multi-agent data wrangler project for data profiling, transformation, and quality scoring

12
0
100% credibility
Found Feb 20, 2026 at 11 stars.
AI Analysis
Python
AI Summary

An open-source Python framework that automates data cleaning through profiling, transformation generation, validation, quality scoring, and ranking, accessible via command line or a Streamlit web app.

How It Works

1
🕵️ Discover the tool

You find a helpful data cleaning assistant online that promises to fix messy spreadsheets automatically.

2
🌐 Open the web page

Visit the simple web interface in your browser to get started without any hassle.

3
📁 Upload your file

Drag and drop your CSV file to let it peek inside and understand your data.

4
Run the cleanup

Hit the button and let it profile issues, suggest fixes, check them for safety, score the improvements, and pick the best ones.

5
📊 Review improvements

See a clear summary of what was cleaned, quality boosts, and top recommended changes with easy previews.

🎉 Download perfect data

Grab your polished, high-quality dataset ready for analysis or sharing, saving you hours of manual work.


Star Growth

This repo grew from 11 to 12 stars.
AI-Generated Review

What is multi-agent-data-wrangler?

This Python project automates data wrangling with a multi-agent system that profiles messy datasets, generates candidate transformations such as filling missing values or normalizing columns, validates them for safety, scores the resulting quality improvements on completeness and uniqueness, and ranks the best fixes. Upload a CSV through its Streamlit web app to preview issues, adjust pipeline settings such as iteration count or quality thresholds, and download the cleaned data plus JSON reports. For batch jobs, run it from the CLI with commands like `data-wrangler run --config pipeline.yaml`. It is an evaluation-driven multi-agent system aimed at the tedium of manual ETL on real-world data.
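The profile, generate, validate, score, and rank loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: every function name, the equal weighting of completeness and uniqueness, and the two candidate transforms are hypothetical.

```python
from statistics import mean

# Hypothetical sketch: rows are dicts mapping column name -> value,
# and None marks a missing value. None of these names come from the repo.


def completeness(rows):
    """Fraction of cells that are not missing."""
    cells = [v for row in rows for v in row.values()]
    return sum(v is not None for v in cells) / len(cells) if cells else 1.0


def uniqueness(rows):
    """Fraction of rows that are distinct."""
    if not rows:
        return 1.0
    return len({tuple(sorted(r.items())) for r in rows}) / len(rows)


def quality(rows):
    """Assumed score: unweighted average of the two metrics."""
    return (completeness(rows) + uniqueness(rows)) / 2


def profile(rows):
    """Profiling step: list the columns that contain missing values."""
    columns = rows[0].keys() if rows else []
    return [c for c in columns if any(r[c] is None for r in rows)]


def fill_missing_with_mean(column):
    """Build a transform that fills missing values with the column mean."""
    def transform(rows):
        known = [r[column] for r in rows if r[column] is not None]
        fill = mean(known) if known else None
        return [{**r, column: fill if r[column] is None else r[column]} for r in rows]
    return transform


def drop_duplicates(rows):
    """Keep the first occurrence of each distinct row."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def generate_candidates(rows):
    """Generation step: one fill candidate per incomplete column, plus dedup."""
    candidates = {f"fill_mean({c})": fill_missing_with_mean(c) for c in profile(rows)}
    candidates["drop_duplicates"] = drop_duplicates
    return candidates


def is_safe(before, after):
    """Validation step: the result must be non-empty and keep the same columns."""
    return bool(after) and all(r.keys() == before[0].keys() for r in after)


def rank(rows):
    """Score each safe candidate by its quality gain and sort best first."""
    base = quality(rows)
    scored = []
    for name, transform in generate_candidates(rows).items():
        result = transform(rows)  # the input rows are never mutated
        if is_safe(rows, result):
            scored.append((quality(result) - base, name))
    return sorted(scored, reverse=True)


sample = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 1, "age": 30},
    {"id": 3, "age": 40},
]
for gain, name in rank(sample):
    print(f"{name}: quality gain {gain:+.3f}")
```

Because each transform returns a new list instead of editing rows in place, the original data stays available, which is one simple way to get the kind of reversibility the review mentions.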

Why is it gaining traction?

Unlike basic pandas scripts or one-off cleaners, it orchestrates agents from profiling through ranking in configurable pipelines with failure recovery (skip, retry, or abort), and it uses sampling to handle datasets of up to 200k rows efficiently. The Streamlit UI makes it dead simple to visualize progress and metrics, while the CLI and programmatic APIs fit into ML workflows. Think of it as a smarter alternative to LangGraph multi-agent setups for data tasks. Developers dig the pluggable ranking policies and the reversible transforms, which let you iterate without data loss.
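To make the skip/retry/abort idea concrete, here is a small sketch of how a per-stage failure policy could work. It assumes a stage is any callable that takes data and returns data; the `OnFailure` enum, `run_stage`, and the `max_retries` parameter are hypothetical names for illustration and are not taken from the project.

```python
from enum import Enum


class OnFailure(Enum):
    """Hypothetical failure policies mirroring skip/retry/abort."""
    SKIP = "skip"
    RETRY = "retry"
    ABORT = "abort"


def run_stage(stage, data, policy, max_retries=2):
    """Run one pipeline stage and apply the policy if it raises.

    Returns (data, status). SKIP passes the input through unchanged,
    RETRY re-runs the stage up to max_retries extra times and re-raises
    if every attempt fails, and ABORT re-raises immediately.
    """
    attempts = max_retries + 1 if policy is OnFailure.RETRY else 1
    for attempt in range(1, attempts + 1):
        try:
            return stage(data), "ok"
        except Exception:
            if policy is OnFailure.SKIP:
                return data, "skipped"
            if policy is OnFailure.ABORT or attempt == attempts:
                raise
```

A skipped stage hands its input to the next stage untouched, so one failing agent does not have to sink the whole run.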

Who should use this?

Data engineers prepping raw CSVs before model training. Analysts automating quality checks on customer datasets with many missing values or outliers. Teams building multi-agent data pipelines who want a drop-in orchestrator instead of writing custom scikit-learn preprocessors.

Verdict

Worth a test drive for small-to-medium data cleaning if you need agent-based automation: solid docs, a CLI and UI, and tests make it usable despite the 11 stars and 1.0% credibility that signal an early beta. Fork and contribute if it fits; don't bet production on it yet.
