WillDreamer / T2PO
[ICML2026 Spotlight] T2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning
T²PO is an open-source framework for training AI agents in multi-turn reinforcement learning tasks across embodied, web, search, and game environments using uncertainty-guided exploration.
How It Works
You come across T²PO while looking for a way to train AI agents that can handle long, multi-turn tasks such as room navigation or online shopping.
Run a few setup steps to prepare your machine for the simulated environments where your agent will practice.
Train an agent to pick up objects and complete household chores.
Teach an agent to find and buy items on websites.
Build an agent that researches answers across documents.
Create a multi-step game solver with vision.
Launch the training with one click, and watch your AI learn smarter exploration over many practice turns.
Review charts and scores to see your agent getting better at tasks without getting stuck.
Your AI now handles complex, multi-step challenges reliably, thanks to clever uncertainty-guided decisions.