nancui0000 / adaptive-mogrpo
Adaptive Weight Scheduling for Multi-Objective GRPO in Code Generation. Fixed multi-objective rewards cause reward hacking (short but broken code). Our curriculum approach, correctness first with efficiency/brevity added gradually, preserves 81.7% HumanEval while generating 11% shorter code.
This repository implements a training framework for fine-tuning AI models to generate Python code that balances correctness, execution efficiency, and brevity using multi-objective reinforcement learning.
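To make "multi-objective" concrete: each sampled program gets a correctness signal from unit tests plus normalized efficiency and brevity signals, which are scalarized into one reward using the current weights. Below is a minimal sketch of that idea; the names (`RewardWeights`, `combined_reward`), the budget-based normalization, and all constants are illustrative assumptions, not the repository's actual API.

```python
from dataclasses import dataclass

@dataclass
class RewardWeights:
    correctness: float = 1.0
    efficiency: float = 0.0
    brevity: float = 0.0

def combined_reward(passed: bool, runtime_s: float, n_tokens: int,
                    w: RewardWeights,
                    runtime_budget_s: float = 1.0,
                    token_budget: int = 512) -> float:
    """Scalarize the three objectives into a single scalar reward.

    Correctness is binary (all unit tests pass); efficiency and brevity
    are normalized against budgets so staying under budget scores near 1.
    """
    r_correct = 1.0 if passed else 0.0
    r_speed = max(0.0, 1.0 - runtime_s / runtime_budget_s)
    r_short = max(0.0, 1.0 - n_tokens / token_budget)
    return (w.correctness * r_correct
            + w.efficiency * r_speed
            + w.brevity * r_short)
```

With fixed nonzero efficiency/brevity weights, a policy can score well by emitting short code that fails the tests, which is the reward-hacking failure mode the curriculum schedule is designed to avoid.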
How It Works
You find a helpful project that trains AI models to write programs that solve problems correctly, quickly, and concisely.
You collect coding challenges with unit tests so the model can practice and learn from real examples.
You choose how much to weight correctness, speed, and brevity, or let the curriculum adjust the mix for you (see the weight-schedule sketch after this list).
You launch training and the model repeatedly generates code, improving with each attempt based on the rewards you configured.
You check charts and logs to watch the model solve problems more accurately and efficiently over time.
You run the fine-tuned model on fresh challenges to confirm the gains hold on unseen problems.
Your model now reliably produces correct, fast, and concise programs, ready for your needs.
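The "adaptive" part is the curriculum over the reward weights: train on correctness alone first, then ramp the secondary objectives in once the model is mostly correct. Here is a minimal sketch assuming a linear ramp after a warmup phase; the step counts, target weights, and ramp shape are assumptions, not the repository's actual schedule.

```python
def curriculum_weights(step: int,
                       warmup_steps: int = 500,
                       ramp_steps: int = 1500,
                       max_efficiency: float = 0.3,
                       max_brevity: float = 0.2) -> tuple[float, float, float]:
    """Return (w_correctness, w_efficiency, w_brevity) at a training step.

    Correctness-first curriculum: the secondary objectives stay off
    during warmup, then ramp linearly to their target weights, so the
    model cannot trade correctness for brevity early in training.
    """
    if step < warmup_steps:
        frac = 0.0
    else:
        frac = min(1.0, (step - warmup_steps) / ramp_steps)
    return (1.0, frac * max_efficiency, frac * max_brevity)
```

Keeping the correctness weight at full strength throughout is what guards against the reward hacking described above: in a schedule like this, shorter or faster code only earns extra reward once it also passes the tests.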