Berdyanskov / CargoDash_preview
A Python library for building simple, modular, multifunctional, and efficient large model training data synthesis/augmentation pipelines.
CargoDash is a tool that helps you build pipelines for preparing training data for AI models. It comes with a visual web editor where you can drag and connect building blocks to design how your data should flow—reading raw text, cleaning it up, asking AI models to rewrite or evaluate it, and saving the results. You can also write pipelines directly in Python if you prefer. The tool supports running AI models locally on your own machine or connecting to cloud AI services, and it handles the complex work of processing data in parallel while keeping everything organized.
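To make the idea of a pipeline written directly in Python concrete, here is a minimal sketch in plain Python. It does not use the CargoDash API, which this page does not show. Every name in it (`clean`, `make_rewriter`, `run_pipeline`, `fake_model`) is hypothetical, and `fake_model` is a stand-in for a call to a local or cloud model.

```python
# Illustrative sketch only: plain Python, NOT the CargoDash API.
from typing import Callable, List

# A step takes a list of texts and returns a new list of texts.
Step = Callable[[List[str]], List[str]]


def clean(texts: List[str]) -> List[str]:
    """Collapse runs of whitespace and drop texts that end up empty."""
    cleaned = (" ".join(t.split()) for t in texts)
    return [t for t in cleaned if t]


def make_rewriter(model: Callable[[str], str], template: str) -> Step:
    """Build a step that sends each text to `model` through a prompt template."""
    def rewrite(texts: List[str]) -> List[str]:
        return [model(template.format(text=t)) for t in texts]
    return rewrite


def run_pipeline(texts: List[str], steps: List[Step]) -> List[str]:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        texts = step(texts)
    return texts


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call: it just upper-cases the prompt."""
    return prompt.upper()


raw = ["  hello   world ", "", "second\tline"]
result = run_pipeline(raw, [clean, make_rewriter(fake_model, "Rewrite: {text}")])
print(result)
```

Treating each step as a function from a list of texts to a list of texts keeps the steps interchangeable, which is the same property the visual editor relies on when blocks are wired together.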
How It Works
You're building an AI assistant and realize your training data needs cleaning, filtering, or augmentation before it can teach your model anything useful.
Instead of wrestling with code, you drag building blocks onto a canvas—each one represents a step in your data pipeline.
You wire together nodes: raw text goes in, gets cleaned, then branches—one path for high-quality data, another for samples that need improvement.
Connect an LLM node that rewrites each sentence based on a template you provide.
Set up a voting committee where multiple models decide which samples are good enough to keep.
One click turns your visual diagram into a runnable script. You can save it, share it, or tweak it further.
Your data flows through each step automatically. The system handles parallel processing and keeps everything organized.
Clean, high-quality, AI-augmented training data flows out the other end—exactly what your model needs to learn.
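The branching and voting steps above can be sketched in plain Python. This is an illustration under stated assumptions, not CargoDash code: the judge functions are hypothetical stand-ins for model calls, the committee uses a strict majority rule (the page does not say which rule the tool applies), and a thread pool stands in for whatever parallel execution the system actually uses.

```python
# Illustrative sketch only: plain Python, NOT the CargoDash API.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# A judge returns True when it considers a sample good enough to keep.
Judge = Callable[[str], bool]


def committee_accepts(sample: str, judges: List[Judge]) -> bool:
    """Accept a sample when a strict majority of judges approve it (assumed rule)."""
    votes = sum(1 for judge in judges if judge(sample))
    return votes * 2 > len(judges)


def split_by_vote(
    samples: List[str], judges: List[Judge], max_workers: int = 4
) -> Tuple[List[str], List[str]]:
    """Vote on samples in parallel, then branch into kept and needs-work lists."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so each decision
        # lines up with the sample it belongs to.
        decisions = list(pool.map(lambda s: committee_accepts(s, judges), samples))
    kept = [s for s, ok in zip(samples, decisions) if ok]
    needs_work = [s for s, ok in zip(samples, decisions) if not ok]
    return kept, needs_work


# Hypothetical stand-in judges; in a real pipeline each would call a model.
def long_enough(s: str) -> bool:
    return len(s.split()) >= 3


def ends_with_period(s: str) -> bool:
    return s.endswith(".")


def no_shouting(s: str) -> bool:
    return not s.isupper()


samples = ["The cat sat on the mat.", "BUY NOW", "short text"]
kept, needs_work = split_by_vote(samples, [long_enough, ends_with_period, no_shouting])
print(kept)
print(needs_work)
```

The `needs_work` list corresponds to the second branch described above, the samples that would be routed to a rewriting step rather than discarded.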