Official repository for the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe"
A research project providing code and scripts to train large language models using on-policy distillation techniques on math datasets.
How It Works
You start from the paper's idea: a student model gets better at math by learning from a stronger teacher model.
You set up an environment with the required dependencies so the code and scripts run on your machine.
You collect math problems and their answers to use as training data.
You have a strong teacher model generate responses to those math problems.
You launch on-policy distillation training, in which the student model learns directly from the teacher's feedback.
You evaluate the trained student model on challenging math benchmarks.
If training went well, the student model now solves harder math problems than it did before.
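The training step above can be illustrated with a small sketch. It assumes the teacher's feedback takes the form commonly used in on-policy distillation: the student generates its own response, the teacher scores each position, and the loss is the per-token reverse KL divergence from the student's distribution to the teacher's. This is not the repository's code, and the paper's actual recipe may differ; the function names and toy logits below are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position."""
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    # Expectation under the student's distribution of log(student / teacher).
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))

def sequence_loss(student_steps, teacher_steps):
    """Mean per-token reverse KL over a response sampled from the student.

    Assumption: both arguments hold one list of logits per generated token,
    computed on the same student-sampled prefix.
    """
    per_token = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(per_token) / len(per_token)

# Toy example with a 3-token vocabulary and a 2-token response (made-up numbers).
student_steps = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher_steps = [[2.0, 0.5, -1.0], [1.5, -0.5, 0.0]]
loss = sequence_loss(student_steps, teacher_steps)
```

In this toy case the first position contributes zero because the student already matches the teacher there, so only the second position pulls the loss above zero. In a real training loop the logits would come from the two models and the loss would be minimized with respect to the student's parameters.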