Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"
An academic evaluation toolkit for testing AI model improvements in math reasoning and code generation using established benchmarks.
How It Works
You find this repository on GitHub while looking for a fair way to evaluate AI models on math problems and code generation.
The instructions describe how the researchers evaluated their improved models on math and coding benchmarks.
You measure how well your model solves math problems using the included benchmark tests.
You run checks on the code your model generates to see how it performs on the coding problems.
You end up with benchmark scores for your model in math and coding, which give you a baseline to improve on.
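The two kinds of score mentioned above can be illustrated with a minimal sketch. This is not the repository's code: the function names `normalize`, `exact_match_accuracy`, and `pass_rate` are hypothetical, and the sketch assumes math answers are compared as normalized strings and generated code is scored by the fraction of test cases it passes. Real benchmarks typically use stricter answer parsing and sandboxed execution.

```python
import re


def normalize(answer: str) -> str:
    """Lowercase, remove all whitespace, and drop trailing periods."""
    text = re.sub(r"\s+", "", answer.strip().lower())
    return text.rstrip(".")


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return correct / len(references)


def pass_rate(candidate, test_cases):
    """Fraction of (args, expected) test cases that a candidate function passes.

    An exception raised by the candidate counts as a failed case.
    """
    if not test_cases:
        return 0.0
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash is treated the same as a wrong answer
    return passed / len(test_cases)


# Hypothetical model outputs, for illustration only.
math_score = exact_match_accuracy(["42", " 3/4 ", "x = 2."], ["42", "3/4", "x=3"])
code_score = pass_rate(lambda a, b: a + b, [((1, 2), 3), ((2, 2), 5)])
```

Here `math_score` counts two of three answers as correct, and `code_score` counts one of two test cases as passed. String matching is the simplest possible comparison; it treats equivalent forms such as `0.75` and `3/4` as different answers.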