nancui0000

Adaptive Weight Scheduling for Multi-Objective GRPO in Code Generation. Fixed multi-objective rewards cause reward hacking (short but broken code). Our curriculum approach—correctness first, then gradually adding efficiency/brevity—preserves 81.7% HumanEval pass@1 while generating 11% shorter code.

49 stars · 69% credibility · Found Apr 23, 2026
AI Analysis
Python
AI Summary

This repository implements a training framework for fine-tuning AI models to generate Python code that balances correctness, execution efficiency, and brevity using multi-objective reinforcement learning.
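To make that summary concrete, here is a minimal sketch of what a multi-objective reward could look like: per-objective scores combined by a weighted sum. This is my assumption of the reward shape, not the repo's actual code; the `solve`-style test harness, the `ref_len` normalizer, and the default weights are all hypothetical (efficiency is omitted for clarity).

```python
def correctness_reward(code: str, test_cases) -> float:
    """Fraction of test cases passed by the candidate's `solve` function.

    Hypothetical harness: assumes each completion defines `solve(x)`.
    Any crash (syntax error, runtime error, missing function) scores 0.
    """
    ns: dict = {}
    try:
        exec(code, ns)  # run the generated code in a fresh namespace
        solve = ns["solve"]
        passed = sum(1 for x, expected in test_cases if solve(x) == expected)
        return passed / len(test_cases)
    except Exception:
        return 0.0


def brevity_reward(code: str, ref_len: int = 200) -> float:
    """Shorter code scores higher, clipped to [0, 1]; `ref_len` is illustrative."""
    return max(0.0, 1.0 - len(code) / ref_len)


def total_reward(code: str, test_cases, w=(0.8, 0.2)) -> float:
    """Weighted sum of objectives: w = (w_correctness, w_brevity)."""
    return w[0] * correctness_reward(code, test_cases) + w[1] * brevity_reward(code)
```

With `w = (1.0, 0.0)` this reduces to the correctness-only baseline the repo starts training from.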

How It Works

1
🔍 Discover the tool

You find a helpful project that trains AI to write better computer programs that solve problems correctly, quickly, and concisely.

2
📚 Gather practice problems

You collect simple coding challenges with tests so the AI can practice and learn from real examples.

3
🎯 Pick your priorities

You choose to focus on getting answers right, running fast, being short, or a mix to guide the AI's learning.

4
🚀 Start the training

You launch the session and the AI practices generating code over and over, improving with each try based on your goals.

5
📈 Watch it improve

You check charts and logs to see the AI getting better at solving problems accurately and efficiently.

6
🧪 Test the results

You run the finished AI on fresh challenges to confirm it now produces superior code.

🎉 Enjoy a smarter code writer

Your AI now reliably creates correct, speedy, and concise programs, ready for your needs.
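The training loop in the steps above hinges on the curriculum: correctness first, secondary objectives ramped in later. A minimal sketch of one plausible schedule follows; the linear ramp shape, warmup fraction, and final mix are my assumptions, not the repo's actual defaults.

```python
def scheduled_weights(step: int, total_steps: int = 2000,
                      warmup_frac: float = 0.25,
                      final_eff: float = 0.2, final_brev: float = 0.2):
    """Return (w_correctness, w_efficiency, w_brevity) for a training step.

    Correctness-only during warmup, then a linear ramp toward the final
    mix, so the policy learns to produce working code before brevity or
    speed can be gamed.
    """
    warmup = int(total_steps * warmup_frac)
    if step < warmup:
        return (1.0, 0.0, 0.0)
    t = min(1.0, (step - warmup) / (total_steps - warmup))
    return (1.0 - t * (final_eff + final_brev), t * final_eff, t * final_brev)
```

The weights always sum to 1, so the overall reward scale stays constant while the emphasis shifts.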


AI-Generated Review

What is adaptive-mogrpo?

This Python repo lets you fine-tune code generation LLMs like Qwen2.5-Coder-7B using GRPO reinforcement learning with multi-objective rewards for correctness, runtime efficiency, and brevity. Fixed reward weights often lead to reward hacking—models spit out short but broken code—so it uses adaptive weight scheduling: start with correctness only, then gradually ramp in efficiency and brevity via a curriculum approach. You get CLI scripts to train LoRA adapters (`python train.py --preset adaptive_balanced --adaptive`), evaluate on HumanEval and MBPP (`python evaluate.py --model_dir outputs/...`), and plot Pareto frontiers (`python pareto.py`).
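For context on the "group" in GRPO: each prompt gets a group of sampled completions, and each completion's advantage is its reward normalized against that group's statistics, so no value network is needed. A minimal sketch (population std used here; whether implementations use sample or population std varies):

```python
import statistics

def grpo_advantages(group_rewards):
    """Group-relative advantages: (r - mean) / std within one prompt's
    sampled group. A zero-variance group gives zero advantage for all,
    since no completion stood out."""
    mu = statistics.fmean(group_rewards)
    sd = statistics.pstdev(group_rewards)
    if sd == 0:
        return [0.0] * len(group_rewards)
    return [(r - mu) / sd for r in group_rewards]
```

These advantages then weight the policy-gradient update on each completion's tokens.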

Why is it gaining traction?

It tackles a real pain in multi-objective RLHF for code: static reward weights let models game brevity at correctness's expense, while this adaptive weight schedule preserves 81.7% HumanEval pass@1 and cuts code length by 11%. Presets like "adaptive_eff_heavy" make experiments dead simple, and built-in eval + viz tools reveal tradeoffs instantly; no more manual benchmarking. Optimized for A100 GPUs, it trains in 2000 steps with wandb logging.
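A tiny worked example of the failure mode the schedule is meant to prevent (the weights and scores here are illustrative, not the repo's):

```python
def weighted_reward(correct: float, brevity: float, w) -> float:
    """Scalarized reward with w = (w_correctness, w_brevity)."""
    return w[0] * correct + w[1] * brevity

# With fixed, brevity-heavy weights, a short-but-broken completion
# (correct=0.0, brevity=1.0) outscores a correct-but-longer one:
fixed = (0.4, 0.6)
hacked = weighted_reward(0.0, 1.0, fixed)  # 0.6 * 1.0 = 0.6
honest = weighted_reward(1.0, 0.1, fixed)  # ≈ 0.46
assert hacked > honest

# Under the curriculum's early, correctness-only weights the ranking
# flips, so the policy must learn to pass tests first:
early = (1.0, 0.0)
assert weighted_reward(0.0, 1.0, early) < weighted_reward(1.0, 0.1, early)
```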

Who should use this?

RLHF researchers tuning code LLMs for benchmarks like HumanEval. Teams at AI startups building code agents that need correct, efficient output without verbose bloat. Fine-tuners frustrated by single-objective baselines like correctness-only GRPO.

Verdict

Grab it if you're prototyping adaptive weight schedules for code-gen RL; the results hold up on standard evals. At 49 stars and 69% credibility, it's experimental, with a solid README but light tests, so fork and validate locally before using it in production.


