LemonTea03 / XTF
[ICLR 2026] Repository of "Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets"
This repository provides tools for filtering noisy tokens out of LLM fine-tuning datasets, with the aim of improving how language models learn domain-specific skills in math, coding, medicine, and finance.
How It Works
You come across this helpful tool while reading about smarter ways to train AI on real-world data like math problems or medical questions.
You set up a local environment with the repository's dependencies.
Pick a fine-tuning dataset from math, coding, medicine, finance, or a similar domain, and choose a base LLM to fine-tune.
The tool scans your data token by token, flags noisy tokens, and filters them out so that training focuses on the informative parts.
Fine-tune the model on the filtered data.
Evaluate the fine-tuned model on held-out test questions and compare it with a model fine-tuned on the original, unfiltered data.
The aim is a model that solves problems and answers questions more accurately and reliably than the unfiltered baseline.
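The filtering step above can be pictured with a small, generic sketch. It assumes each token already has a noise score (how XTF actually computes and explains its scores is not described here, so `noise_scores` is a placeholder) and that "filtering" means excluding flagged tokens from the training loss by setting their labels to -100, the value PyTorch's `CrossEntropyLoss` ignores by default. The function names `mask_noisy_tokens` and `kept_fraction` and the threshold rule are hypothetical, not the repository's API.

```python
from typing import List, Sequence

# PyTorch's CrossEntropyLoss skips targets equal to its ignore_index,
# which defaults to -100, so labels set to this value add nothing to the loss.
IGNORE_INDEX = -100


def mask_noisy_tokens(
    token_ids: Sequence[int],
    noise_scores: Sequence[float],
    threshold: float,
) -> List[int]:
    """Return training labels in which tokens judged noisy are ignored.

    token_ids    -- token ids of one training example
    noise_scores -- one score per token; higher means "more likely noise".
                    Hypothetical placeholder, not XTF's scoring method.
    threshold    -- tokens scoring strictly above this value are masked
    """
    if len(token_ids) != len(noise_scores):
        raise ValueError("need exactly one noise score per token")
    return [
        IGNORE_INDEX if score > threshold else tok
        for tok, score in zip(token_ids, noise_scores)
    ]


def kept_fraction(labels: Sequence[int]) -> float:
    """Fraction of tokens that still contribute to the training loss."""
    if not labels:
        return 0.0
    return sum(1 for x in labels if x != IGNORE_INDEX) / len(labels)


if __name__ == "__main__":
    ids = [101, 2054, 2003, 1016, 1009, 1016, 102]
    scores = [0.1, 0.2, 0.9, 0.3, 0.8, 0.2, 0.1]
    labels = mask_noisy_tokens(ids, scores, threshold=0.5)
    print(labels)                           # [101, 2054, -100, 1016, -100, 1016, 102]
    print(round(kept_fraction(labels), 3))  # 0.714
```

Masking labels rather than deleting tokens keeps every token in the model's input, so the surrounding context stays intact while the flagged tokens simply stop being training targets. Dropping tokens outright is the other obvious option; which one the repository uses is not stated here.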