A research tool that automatically reconstructs the origins and relationships of datasets used in training large language models by analyzing their documentation from Hugging Face, papers, blogs, and GitHub.
How It Works
The tool uncovers the origins of the data used to train AI models, much like a family tree for datasets.
Open the ready-to-use web version to trace a single dataset and view its connections on screen.
Enter a dataset name and the results appear right away.
To analyze several datasets at once, prepare a simple list and launch the analysis.
List the names of the datasets you want to explore, one per line, in a text file.
Start the run; the tool searches descriptions, papers, and blogs for connections between datasets.
It skips datasets that have already been processed and builds the full history step by step.
The output is a set of files containing the complete lineage graph and its details, ready to explore or share.
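The batch steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `read_dataset_names`, `build_lineage`, and the `find_sources` callback are hypothetical names, and `find_sources` stands in for whatever documentation analysis the tool performs to discover a dataset's parents.

```python
from collections import deque


def read_dataset_names(text):
    """Parse one dataset name per line, dropping blank lines and duplicates."""
    seen = set()
    names = []
    for line in text.splitlines():
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def build_lineage(names, find_sources, done=None):
    """Expand each dataset into its sources, skipping ones already processed.

    find_sources is a hypothetical callback: it takes a dataset name and
    returns the names of the datasets it was derived from.
    Returns (edges, done), where each edge is a (source, derived) pair.
    """
    done = set(done or ())
    edges = []
    queue = deque(names)
    while queue:
        name = queue.popleft()
        if name in done:
            continue  # already analyzed in this run or a previous one
        done.add(name)
        for source in find_sources(name):
            edges.append((source, name))
            queue.append(source)  # trace the source's own origins next
    return edges, done


# Toy lookup table used in place of real documentation analysis.
toy_sources = {"c": ["b"], "b": ["a"], "a": []}
names = read_dataset_names("c\n\nc\nb\n")
edges, done = build_lineage(names, lambda n: toy_sources.get(n, []))
```

Passing a previously saved `done` set back into `build_lineage` is one way to resume a run without repeating earlier work; how the actual tool stores that state is not described here.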