PyTorch-based open-source code for the paper "SOD: Step-wise On-policy Distillation for Small Language Model Agents"
SOD is a research project providing code and methods to distill advanced reasoning abilities from large teacher models into smaller language model agents using step-wise on-policy distillation.
How It Works
SOD distills the reasoning abilities of large teacher models into small language model agents for math, science, and coding tasks.
Set up a local environment for building and training the student agent.
Collect problem–solution examples to serve as training data.
Prepare a sandboxed execution environment in which the agent can run generated code safely.
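The sandboxed execution step can be approximated with a subprocess that runs candidate code under a hard time limit. This is a minimal sketch under that assumption; the repo's actual sandbox is likely more isolated (e.g. containerized), and the helper name `run_in_sandbox` is hypothetical:

```python
import subprocess
import sys

def run_in_sandbox(code: str, timeout_s: float = 5.0) -> dict:
    """Run a candidate code snippet in a separate Python process.

    A fresh subprocess with a timeout keeps runaway or crashing code
    from affecting the training loop. (A production sandbox would add
    stronger isolation: containers, seccomp, no network, etc.)
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return {"ok": proc.returncode == 0,
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timeout"}

result = run_in_sandbox("print(2 + 2)")
print(result["ok"], result["stdout"].strip())  # True 4
```

Running each snippet in its own interpreter process also guarantees that state from one candidate solution cannot leak into the next.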
Train in rounds: begin with supervised fine-tuning, then apply step-wise on-policy distillation so the teacher guides each reasoning step the student generates.
Evaluate the agent on math-competition problems, science questions, and coding tasks.
The resulting small model handles complex reasoning problems competitively and is ready for real-world use.
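The step-wise on-policy phase can be sketched as a per-step distillation loss: the student samples a reasoning step, and a KL divergence pulls the student's token distributions toward the teacher's on those same tokens. This is a minimal sketch under assumed interfaces (per-token logits from both models); the paper's exact loss and step segmentation may differ, and the shapes below are illustrative:

```python
import torch
import torch.nn.functional as F

def stepwise_onpolicy_kl(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor) -> torch.Tensor:
    """Distillation loss on one student-generated reasoning step.

    student_logits, teacher_logits: (step_len, vocab) logits scored on
    the tokens of a step that the student itself sampled (on-policy).
    Minimizing KL(student || teacher) moves the student's next-token
    distribution toward the teacher's at every position in the step.
    """
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    # KL(student || teacher), averaged over positions in the step
    kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1)
    return kl.mean()

# Toy demo with random logits (hypothetical: 8-token step, 100-token vocab)
torch.manual_seed(0)
student = torch.randn(8, 100, requires_grad=True)
teacher = torch.randn(8, 100)
loss = stepwise_onpolicy_kl(student, teacher)
loss.backward()  # gradients flow into the student only
```

Because the loss is computed on steps the student generated itself, training stays on-policy: the teacher corrects the states the student actually reaches, rather than states drawn from a fixed dataset.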