GithubX-F / ProxMO-RL (Public)

Proximity-based Multi-turn Optimization (ProxMO) - Official Implementation

36 stars · 100% credibility · found Mar 06, 2026
AI Summary (Python)

ProxMO is a lightweight framework that improves the training of language-model agents on multi-turn tasks by sharing reward signal across steps based on their semantic proximity.

How It Works

1
🔍 Discover ProxMO

You stumble upon a clever tool that trains AI helpers to master tricky, step-by-step chores like tidying rooms (ALFWorld) or shopping online (WebShop).

2
📦 Prepare your playground

Set up fun virtual worlds where your AI can practice real-life tasks safely and endlessly.
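The repo ships its own environment-setup scripts; as a rough illustration of the reset/step loop that a multi-turn agent trains against, here is a toy text environment. Everything here (the class, the task, the interface details) is a made-up stand-in, not the repo's actual environment code:

```python
# Hypothetical stand-in for a text-based task environment such as ALFWorld.
# It only illustrates the reset/step interface a multi-turn agent loop
# expects: text observations in, text actions out, sparse reward at the end.
class ToyTextEnv:
    """A short 'find the item' task: the goal is to issue 'open drawer'."""

    def __init__(self, max_turns=3):
        self.max_turns = max_turns
        self.turn = 0

    def reset(self):
        self.turn = 0
        return "You are in a room. There is a drawer."  # initial observation

    def step(self, action):
        self.turn += 1
        success = action == "open drawer"
        done = success or self.turn >= self.max_turns
        reward = 1.0 if success else 0.0  # sparse, episode-level reward
        obs = "You found the item!" if success else "Nothing happens."
        return obs, reward, done
```

The sparse terminal reward is exactly the setting where step-level credit assignment gets hard, which is the problem ProxMO targets.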

3
🎮 Pick a challenge

Choose everyday adventures like cleaning, heating food, or finding items to train your AI on.

4
🚀 Launch the learning magic

Hit start and watch your AI get smarter with each try, nailing tough sequences others struggle with.
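The actual training is launched from the repo's bash scripts; conceptually, a GRPO-style run samples a group of episodes and centers each episode's reward on the group mean. A minimal sketch with a random stand-in policy (all names here are illustrative, not the repo's API):

```python
import random

def rollout(policy, max_turns=3):
    """Run one toy episode; return its total (sparse) reward."""
    return 1.0 if policy() else 0.0

def group_advantages(rewards):
    """GRPO-style baseline: subtract the group-mean reward from each
    episode's reward, so advantages are relative within the group."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

random.seed(0)
policy = lambda: random.random() < 0.5          # stand-in for the LLM agent
rewards = [rollout(policy) for _ in range(8)]   # one GRPO group of episodes
advs = group_advantages(rewards)                # advantages sum to zero
```

ProxMO's contribution sits on top of a loop like this, refining how the per-step baselines inside each episode are formed.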

5
📊 Check the progress

Review scores showing gains of up to +28.9% on ALFWorld and wins over big-name models like GPT-4o on key measures.
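Progress on these benchmarks is usually tracked as a success rate over evaluation episodes. A trivial helper (hypothetical, not from the repo) shows the metric being compared:

```python
def success_rate(episode_rewards):
    """Fraction of evaluation episodes whose sparse final reward
    indicates success (reward > 0)."""
    return sum(r > 0 for r in episode_rewards) / len(episode_rewards)

# e.g. 3 successes out of 4 evaluation episodes -> 0.75
rate = success_rate([1.0, 0.0, 1.0, 1.0])
```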

🎉 Super-smart agents ready

Celebrate as your AI aces multi-step tasks, outperforming rivals and ready for action!

AI-Generated Review

What is ProxMO-RL?

ProxMO-RL is the official Python implementation of Proximity-based Multi-turn Optimization (ProxMO), a lightweight framework for training LLM agents with multi-turn reinforcement learning. It tackles context-dependent credit assignment in long-horizon tasks by modulating gradients based on episode success rates and aggregating step-level baselines via semantic proximity, and it plugs directly into GRPO pipelines. Training runs are launched through simple bash scripts, with reported success-rate gains on benchmarks like ALFWorld and WebShop.
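The review names two ideas: aggregating step-level baselines over semantically *similar* steps, and scaling gradients by the episode success rate. The exact formulas are not given here, so the following is a minimal sketch under stated assumptions: cosine similarity over step embeddings as the proximity measure, and a simple thresholded average as the aggregation. None of these function names come from the repo:

```python
import math

def _cos(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def proximity_baselines(step_embs, step_returns, tau=0.5):
    """For each step, average the returns of all steps whose embedding
    similarity exceeds tau (each step is always proximal to itself).
    This smooths the per-step baseline across semantically close states."""
    baselines = []
    for e in step_embs:
        near = [r for f, r in zip(step_embs, step_returns) if _cos(e, f) > tau]
        baselines.append(sum(near) / len(near))
    return baselines

def modulated_advantages(step_returns, baselines, success_rate):
    """Advantage = return minus proximity baseline, scaled by the group's
    episode success rate (the gradient-modulation idea)."""
    return [success_rate * (r - b) for r, b in zip(step_returns, baselines)]
```

With orthogonal embeddings each step keeps its own return as baseline; with identical embeddings the baseline collapses to the group mean, recovering GRPO-like behavior. The real implementation is vectorized (the claimed ~1% overhead), whereas this loop is written for readability.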

Why is it gaining traction?

It delivers tangible gains (up to +28.9% on ALFWorld with 1.5B models, plus wins over GPT-4o on key metrics) while adding just 1.09% training overhead via vectorized operations, with no extra networks required. Hyperparameter robustness across scales (1.5B to 7B) and plug-and-play setup make it a low-risk upgrade over group-based methods like GiGPO. Developers are drawn to the full ablation studies and production-ready efficiency for real multi-turn agent workflows.

Who should use this?

RL engineers tuning LLMs for embodied AI tasks in ALFWorld or WebShop, where sparse rewards kill standard RL. Researchers iterating on multi-turn optimization for text-based or hybrid environments needing better credit assignment without retraining from scratch.

Verdict

Early but promising official ProxMO implementation (36 stars): a solid README, environment setup scripts, and reported results make it worth a spin for LLM-RL experiments despite its low maturity. Fork it and benchmark against GRPO if proximity-based tweaks fit your multi-turn pipeline.


