
WillDreamer / T2PO

Public

[ICML2026 Spotlight] T2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

18 stars · 0 forks
100% credibility
Found May 11, 2026 at 18 stars.
AI Summary

T²PO is an open-source framework for training AI agents in multi-turn reinforcement learning tasks across embodied, web, search, and game environments using uncertainty-guided exploration.

How It Works

1
🔍 Discover T²PO

You stumble upon T²PO while searching for ways to train smart AI helpers that can handle long conversations and tasks like room navigation or online shopping.

2
📱 Get set up

Run a few simple preparation steps to ready your computer for building virtual worlds where your AI can practice.

3
Choose your adventure
🏠
Room explorer

Train an agent to pick up objects and complete household chores.

🛒
Web shopper

Teach an agent to find and buy items on websites.

🔎
Info searcher

Build an agent that researches answers across documents.

🎮
Game player

Create a multi-step game solver with vision.
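The four tracks above correspond to different benchmark environments. As a hedged sketch of how that choice might be expressed (these identifiers are illustrative, not T²PO's actual API), mapping a friendly track name to an environment id could look like:

```python
# Hypothetical mapping from the tracks above to environment ids.
# All names here are illustrative; check the repo's scripts for the
# real configuration keys.
ENVIRONMENTS = {
    "room_explorer": "alfworld",   # embodied household tasks
    "web_shopper": "webshop",      # shopping on simulated websites
    "info_searcher": "searchqa",   # document research / QA
    "game_player": "game",         # vision-based multi-step games
}

def pick_environment(track: str) -> str:
    """Resolve a friendly track name to an environment id."""
    try:
        return ENVIRONMENTS[track]
    except KeyError:
        raise ValueError(
            f"Unknown track: {track!r}; choose one of {sorted(ENVIRONMENTS)}"
        )

print(pick_environment("web_shopper"))  # -> webshop
```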

4
🚀 Start training

Launch the training with one click, and watch your AI learn smarter exploration over many practice turns.
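Under the hood, a training run like this repeatedly plays out multi-turn episodes. A minimal sketch of such a rollout loop, with toy stand-ins for the environment and agent (none of these names come from the repo):

```python
# Minimal sketch of a multi-turn rollout loop, the kind of process a
# one-click training launch would drive. Illustrative only.
def run_episode(env, agent, max_turns: int = 10) -> float:
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_turns):
        action = agent.act(obs)           # agent proposes the next action
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

class DummyEnv:
    """Toy stand-in: episode ends after 3 steps, reward 1 per step."""
    def reset(self):
        self.t = 0
        return "start"
    def step(self, action):
        self.t += 1
        return f"obs{self.t}", 1.0, self.t >= 3

class DummyAgent:
    def act(self, obs):
        return "noop"

print(run_episode(DummyEnv(), DummyAgent()))  # -> 3.0
```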

5
📊 Check progress

Review charts and scores to see your agent getting better at tasks without getting stuck.
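Raw per-episode scores in multi-turn RL are noisy, so progress is easier to read from a smoothed curve. A tiny illustrative helper (not part of T²PO) for a moving-average success rate:

```python
# Moving average over per-episode success flags; early entries average
# over whatever history exists so far. Illustrative helper only.
def moving_average(values, window: int = 5):
    out = []
    for i in range(len(values)):
        chunk = values[i - window + 1 : i + 1] if i + 1 >= window else values[: i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

successes = [0, 0, 1, 0, 1, 1, 1, 1]
print(moving_average(successes, window=4)[-1])  # -> 1.0 (last 4 episodes all succeed)
```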

🎉 Smart agent ready

Your AI now handles complex, multi-step challenges reliably, thanks to clever uncertainty-guided decisions.

AI-Generated Review

What is T2PO?

T²PO is a Python framework for stable multi-turn agentic reinforcement learning, tackling the poor exploration that stalls LLM agents on long-horizon tasks. It applies uncertainty-guided control at both the token and turn level to improve sample efficiency on benchmarks such as ALFWorld, WebShop, and SearchQA. Users get scripts to prepare environments (embodied, web, search) and to train with PyTorch and vLLM, plus evaluation support for commercial LLMs.
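At the token level, one common way to operationalize "uncertainty-guided" control is to gate on the model's predictive entropy. The sketch below is one reading of that idea, not T²PO's actual implementation:

```python
import math

# Illustrative entropy gate: resample a token when the model's
# predictive distribution is too uncertain. Threshold is arbitrary here.
def entropy(probs):
    """Shannon entropy (nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_resample(probs, threshold: float = 1.0) -> bool:
    """High entropy = the model is uncertain; trigger extra exploration."""
    return entropy(probs) > threshold

print(should_resample([0.97, 0.01, 0.01, 0.01]))  # confident -> False
print(should_resample([0.25, 0.25, 0.25, 0.25]))  # uniform, ln 4 ≈ 1.39 -> True
```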

Why is it gaining traction?

The code behind an ICML 2026 Spotlight paper, it delivers reliable multi-turn RL where baselines hesitate or loop: token resampling and turn-level interventions cut wasted rollouts without custom hacks. An extensible design lets you plug in new environments or training recipes quickly, and it handles multi-modal games alongside text, web, and search tasks, making agentic exploration control practical for real experiments.
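A turn-level intervention can be sketched just as simply: detect when the agent keeps repeating the same action and force exploration on that turn (again illustrative, not the repo's code):

```python
# Illustrative loop detector: if the last few actions are identical,
# the agent is likely stuck and a turn-level intervention should fire.
def is_looping(recent_actions, window: int = 3) -> bool:
    """True if the last `window` actions are all identical."""
    if len(recent_actions) < window:
        return False
    tail = recent_actions[-window:]
    return len(set(tail)) == 1

history = ["go north", "open door", "open door", "open door"]
print(is_looping(history))  # -> True: intervene on this turn
```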

Who should use this?

RL researchers replicating the ICML 2026 agentic learning results on embodied tasks like ALFWorld, or on web agents in WebShop. Also suited to engineers prototyping multi-turn LLM agents that need stable exploration in search or navigation, especially if you're tired of unstable rollouts in RL setups.

Verdict

Grab it if multi-turn agentic RL is your focus; the ICML 2026 Spotlight validates the approach. But with only 18 stars, it's raw: expect setup tweaks and thin docs. A solid starting point for forking into production.

