GithubX-F / DynaMO-RL


Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization (DynaMO) - Official Implementation

Found Mar 06, 2026 at 43 stars.
AI Summary

DynaMO-RL is a research toolkit that improves AI language models' math reasoning by allocating training rollouts dynamically and modulating advantages, evaluated on benchmarks like AIME and MATH.

How It Works

1. πŸ” Discover DynaMO — Learn about a tool that trains AI models to solve tough math problems.

2. πŸ“₯ Get the kit — Download the free package and set it up on your machine; it's quick and easy.

3. πŸ“š Pick math challenges — Choose math problems for your model to practice and improve on.

4. πŸš€ Start training — Launch a run and let the model explore different ways to solve the problems.

5. πŸ“ˆ Watch it grow — Track the scores as your model masters harder problems step by step.

πŸ† AI math whiz! Celebrate: your model now solves advanced math like an expert.

AI-Generated Review

What is DynaMO-RL?

DynaMO-RL delivers dynamic rollout allocation and advantage modulation to boost policy optimization in reinforcement learning for LLM reasoning tasks. It tackles inefficient uniform rollout distribution and gradient instability by prioritizing high-variance prompts at the sequence level and adjusting advantages via entropy signals at the token level, yielding better scores on math benchmarks like AIME and MATH500. Built in Python on the verl RLHF framework, it ships plug-and-play scripts for training models from 1.5B to 32B parameters with Ray-distributed support.
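The sequence-level half of this can be sketched in a few lines. The hedged illustration below assumes binary (verifiable) rewards, so each prompt's reward variance is p * (1 - p); the function name, the proportional-share rule, and the one-rollout floor are my assumptions, not the repo's actual allocation scheme.

```python
def allocate_rollouts(success_rates, total_budget):
    """Split a rollout budget across prompts in proportion to the variance
    of their binary rewards, p * (1 - p), which peaks at p = 0.5."""
    n = len(success_rates)
    variances = [p * (1 - p) for p in success_rates]
    total_var = sum(variances)
    if total_var == 0:
        # Every prompt is always solved or never solved: no signal to
        # prioritize, so fall back to a uniform split.
        return [total_budget // n] * n
    # Give every prompt at least one rollout, then hand out the rest
    # by variance share (rounding can drift the sum by +/- 1).
    alloc = [1] * n
    remaining = total_budget - n
    for i, v in enumerate(variances):
        alloc[i] += round(remaining * v / total_var)
    return alloc
```

Under this rule a prompt the model solves half the time receives the largest share of the budget, while prompts it always (or never) solves get the minimum.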

Why is it gaining traction?

It consistently beats baselines like GRPO and KL-COV by 2-6% on average across six reasoning benchmarks and multiple model scales, thanks to variance-minimizing rollout allocation and gradient-aware advantage tweaks that prevent training collapse. Developers appreciate the quick-start bash scripts for Qwen2.5 models and the seamless verl integration, which enable dynamic rollout allocation without custom hacks. Theoretical proofs back the empirical gains, making it a smart pick over static RL setups.
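The token-level half of the method can be illustrated similarly. The sketch below scales each token's advantage by its normalized policy entropy, so confident, near-deterministic tokens contribute less gradient; the exact scaling rule and the `alpha` knob are illustrative assumptions, not DynaMO-RL's actual formula.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def modulate_advantages(advantages, entropies, alpha=0.5):
    """Scale per-token advantages by normalized entropy: low-entropy
    (near-deterministic) tokens keep only a (1 - alpha) share of their
    advantage, while the highest-entropy token keeps the full signal."""
    max_h = max(entropies) or 1.0  # avoid divide-by-zero when all are 0
    return [a * (1 - alpha + alpha * h / max_h)
            for a, h in zip(advantages, entropies)]
```

The intent is to keep gradient pressure on tokens where the policy is genuinely uncertain (branch points in a solution) while damping updates on tokens it already emits deterministically, which is one plausible way to avoid the entropy collapse the review alludes to.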

Who should use this?

RL engineers fine-tuning LLMs for math or coding reasoning, especially on tasks with verifiable rewards. Teams scaling 7B-32B models on Ray clusters via FSDP or Megatron will value the dynamic rollout allocation for efficient use of the sampling budget. It's also a good fit for researchers replicating the DynaMO-RL experiments on AIME-style data.

Verdict

Try it if you're iterating on RLVR: the benchmarks show real uplift, and the verl recipes make setup straightforward. At 43 stars it's early-stage but paper-backed; expect solid docs, but watch for edge cases in large-scale deployments.


