thunlp / OPD (Public)

Official repository for the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe"

46 stars Β· 0 forks Β· 100% credibility
Found Apr 16, 2026 at 50 stars
AI Analysis
Python
AI Summary

A research project providing code and scripts to train large language models using on-policy distillation techniques on math datasets.

How It Works

1. πŸ” Discover better AI training

You hear about a smart way to make AI models smarter at math by learning from a teacher model.

2. πŸ› οΈ Prepare your workspace

You set up a simple environment with the needed tools so everything runs smoothly on your computer.

3. πŸ“š Gather math examples

You collect math problems and answers to use as learning material for your AI.

4. πŸŽ“ Generate teacher answers

You ask a strong AI teacher to create helpful responses to your math problems.

5. πŸš€ Start smart training

You launch the special training where the student AI learns directly from the teacher's feedback.

6. πŸ“Š Test on tough problems

You check how well your improved AI handles challenging math tests.

πŸ† AI masters math!

Your AI now solves harder problems better, ready for real use.
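The steps above can be sketched as a toy on-policy loop: the student samples tokens from its own distribution, and each sampled token is scored against the teacher's log-probability. Everything here (the tiny vocabulary, the fixed distributions, the function names) is illustrative and not the repository's actual API.

```python
import math
import random

random.seed(0)

VOCAB = ["2", "4", "6", "8"]

def sample(dist):
    """Sample an index from a probability distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1

# Toy student and teacher distributions over the tiny vocabulary.
student = [0.4, 0.3, 0.2, 0.1]
teacher = [0.1, 0.6, 0.2, 0.1]

def distill_step(n_tokens=5):
    """One on-policy step: the student generates its own tokens
    (its rollout), and each token receives the per-token reward
    log p_teacher - log p_student, a single-sample estimate of
    the negative reverse KL between student and teacher."""
    rewards = []
    for _ in range(n_tokens):
        t = sample(student)  # on-policy: sampled from the student
        rewards.append(math.log(teacher[t]) - math.log(student[t]))
    return rewards

rewards = distill_step()
print([round(r, 3) for r in rewards])
```

In a real run the rewards would feed a policy-gradient or KL-minimizing update of the student; this sketch stops at computing them.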

AI-Generated Review

What is OPD?

OPD is the official repository for the paper "Rethinking On-Policy Distillation of Large Language Models," providing scripts to train student models on teacher-provided token rewards while addressing common failure modes such as incompatible thinking patterns between teacher and student. Developers run bash scripts for distillation, SFT rollouts, or GRPO RL on math datasets, with Docker images for CUDA, NPU, and ROCm to ease Linux/Ubuntu setup. The codebase is Python-based and yields stronger small models via validated recipes such as cold starts and prompt alignment.
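As a rough illustration of what a token-top-K strategy can mean, the sketch below restricts the teacher's signal to its K most likely tokens and renormalizes. The function name, the K=2 default, and the "drop tokens outside top-K" rule are assumptions made for this example, not the repo's actual interface.

```python
import math

def topk_teacher_logprob(teacher_probs, token, k=2):
    """Hypothetical token-top-K rule: keep only the teacher's K most
    likely tokens, renormalize their mass, and return the renormalized
    log-probability of the student's token -- or None when the token
    falls outside the top-K set and thus gets no teacher signal."""
    ranked = sorted(range(len(teacher_probs)), key=lambda i: -teacher_probs[i])
    top = set(ranked[:k])
    if token not in top:
        return None
    z = sum(teacher_probs[i] for i in top)
    return math.log(teacher_probs[token] / z)

teacher = [0.1, 0.6, 0.2, 0.1]
print(topk_teacher_logprob(teacher, 1))  # token inside the top-2 set
print(topk_teacher_logprob(teacher, 3))  # token outside the top-2 set
```

Restricting to top-K filters out noisy rewards from low-probability tokens, which is one plausible reading of why such a knob helps recover failing runs.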

Why is it gaining traction?

Unlike generic RL tooling, OPD targets distillation-specific pitfalls with token-top-K strategies and weighting modes that users tweak via environment variables, enabling quick recovery of failing runs. Reproducible SLURM/bash workflows and verl/LlamaFactory integration cut experimentation time, while the evaluation pipeline reuses JustRL grading. Docker images handle hardware quirks, drawing developers chasing distillation gains without custom hacks.
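A minimal sketch of what env-var-driven configuration might look like; the variable names `OPD_TOP_K` and `OPD_WEIGHT_MODE` are hypothetical stand-ins invented for this example, so check the repository's scripts for the names it actually reads.

```python
import os

def load_distill_config():
    """Read hypothetical distillation knobs from environment variables,
    falling back to defaults when unset."""
    return {
        "top_k": int(os.environ.get("OPD_TOP_K", "20")),
        "weight_mode": os.environ.get("OPD_WEIGHT_MODE", "uniform"),
    }

os.environ["OPD_TOP_K"] = "50"
cfg = load_distill_config()
print(cfg)  # {'top_k': 50, 'weight_mode': 'uniform'}
```

Driving knobs through env vars lets a failing SLURM job be relaunched with different settings without editing any script.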

Who should use this?

LLM researchers distilling reasoning from 7B+ teachers to 1-4B students on benchmarks like DAPO-Math or AIME. Suited for academic teams probing weak-to-strong gaps or industry folks optimizing math solvers via on-policy signals, skipping heavy RLHF.

Verdict

Grab it for cutting-edge on-policy distillation if you're in distillation research: 100% credibility and 46 stars signal early maturity with paper-led docs, while the bash-script simplicity and Docker images make it low-risk to prototype. Polish tests before production.

