microsoft / experiential_rl

The official codebase for "Experiential Reinforcement Learning" - https://arxiv.org/pdf/2602.13949v1

Found Mar 12, 2026 at 44 stars; 49 stars at the time of this review.
AI Summary

A framework implementing Experiential Reinforcement Learning to train language agents through experience-reflection-consolidation loops on tasks like puzzles and search.

How It Works

1. 🔍 Discover ERL

You hear about a smart way to teach AI helpers to solve puzzles by learning from their own tries and reflections.

2. 📥 Get the trainer

Download the free puzzle trainer tools that make AI learning easy and automatic.

3. 🎮 Pick a puzzle

Choose a fun challenge like moving boxes in Sokoban or navigating a frozen lake.

4. 🧠 Watch it learn

See your AI try a solution, think about mistakes, improve, and get better each time.

5. ⚙️ Adjust settings

Tweak how much it practices or which puzzles to focus on for best results.

6. ✅ AI solves puzzles

Celebrate as your trained helper masters tough tasks perfectly on its own!
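The loop in these steps can be sketched in miniature. Everything below is a hypothetical toy (a lookup-table "policy" on a three-move combination puzzle), not the repo's actual API, which trains LLM agents via rLLM rollouts and verl:

```python
# Toy experience-reflection-consolidation loop (hypothetical example;
# the real repo trains LLM agents with rLLM rollouts and verl, not a
# lookup-table policy like this).

SECRET = ("up", "left", "up")            # the puzzle's hidden solution
ACTIONS = ("up", "down", "left", "right")

def attempt(plan):
    """Environment feedback: index of the first wrong move, or None if solved."""
    for i, (a, s) in enumerate(zip(plan, SECRET)):
        if a != s:
            return i
    return None

def train(max_episodes=20):
    ruled_out = [set() for _ in SECRET]  # consolidated lessons per step
    for episode in range(1, max_episodes + 1):
        # Experience: act greedily given what has been ruled out so far.
        plan = [next(a for a in ACTIONS if a not in ruled_out[i])
                for i in range(len(SECRET))]
        failed_at = attempt(plan)
        if failed_at is None:
            return plan, episode         # solved
        # Reflection: pin the failure to the decision that caused it.
        bad_action = plan[failed_at]
        # Consolidation: fold the lesson back into the policy.
        ruled_out[failed_at].add(bad_action)
    return None, max_episodes

plan, episodes = train()
```

Each episode the agent acts (experience), pins the failure to a specific decision (reflection), and folds that lesson back into its policy (consolidation), so every later attempt is better-informed.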


AI-Generated Review

What is experiential_rl?

This official GitHub repository is the Python codebase for Experiential Reinforcement Learning (ERL), detailed in the paper at https://arxiv.org/pdf/2602.13949v1. It lets you train language agents through an experience-reflection-consolidation loop: agents attempt tasks, reflect on feedback from failures, refine their approach, and distill improvements directly into the model. Users get adaptive agents for complex tasks like puzzles or QA, built on rLLM for rollouts and verl for training, with Docker support and examples for gridworlds or search workflows.
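For a concrete feel of the gridworld environments mentioned above, here is a minimal FrozenLake-style environment with the usual `reset`/`step` interface. This is an illustrative sketch, not the repo's environment code:

```python
# Minimal FrozenLake-style environment (illustrative sketch; the repo
# ships its own environments and SDK, and this is not its code).

class FrozenLake:
    """4x4 grid: S start, F frozen, H hole, G goal. Deterministic moves."""

    GRID = ["SFFF",
            "FHFH",
            "FFFH",
            "HFFG"]
    MOVES = {"left": (0, -1), "down": (1, 0), "right": (0, 1), "up": (-1, 0)}

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        dr, dc = self.MOVES[action]
        r = min(max(self.pos[0] + dr, 0), 3)   # clip to the grid
        c = min(max(self.pos[1] + dc, 0), 3)
        self.pos = (r, c)
        cell = self.GRID[r][c]
        done = cell in "HG"                    # hole or goal ends the episode
        reward = 1.0 if cell == "G" else 0.0
        return self.pos, reward, done

env = FrozenLake()
state = env.reset()
total = 0.0
# One safe path on this map: down, down, right, right, down, right.
for action in ["down", "down", "right", "right", "down", "right"]:
    state, reward, done = env.step(action)
    total += reward
    if done:
        break
```

An agent in the ERL loop would roll out episodes against an interface like this, reflect on the trajectories that ended in holes, and consolidate the fix.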

Why is it gaining traction?

Unlike pure reward-maximizing RL, ERL's explicit reflection step turns environmental feedback into lasting policy changes via policy gradients and supervised distillation, yielding more robust agents without endless trial-and-error optimization. Developers report fewer failure loops in training, plus easy extension to custom agents via SDK engines or workflows. Releases and CI workflows on the official GitHub streamline setup, which appeals to RL practitioners tired of brittle language-model fine-tuning.
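The "policy gradients and supervised distillation" combination can be sketched on a two-action softmax policy. The mixing weight `beta`, the learning rate, and the rest of this example are illustrative assumptions, not values from the paper:

```python
import math

# Toy consolidation step combining a policy-gradient term (from reward)
# with a supervised distillation term (toward the reflection-improved
# action). Two-action softmax policy; `beta` and `lr` are made-up values.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consolidate(logits, taken, reward, reflected, beta=0.5, lr=0.1):
    """One gradient-ascent step on the logits.

    policy gradient: reward * d/dz log pi(taken)
    distillation:    cross-entropy pull toward the `reflected` action
    """
    probs = softmax(logits)
    updated = []
    for a, z in enumerate(logits):
        pg = reward * ((1.0 if a == taken else 0.0) - probs[a])
        kd = (1.0 if a == reflected else 0.0) - probs[a]
        updated.append(z + lr * ((1 - beta) * pg + beta * kd))
    return updated

logits = [0.0, 0.0]
# Failed episode: the agent took action 0, earned no reward, and its
# reflection concluded that action 1 was the right move.
for _ in range(200):
    logits = consolidate(logits, taken=0, reward=0.0, reflected=1)
probs = softmax(logits)
```

With zero reward the policy-gradient term vanishes, and the distillation term alone steadily shifts probability mass toward the reflection-preferred action.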

Who should use this?

AI researchers fine-tuning LLMs for planning or tool-use tasks, such as reinforcement-learning practitioners tackling Sokoban-style puzzles or retrieval-augmented QA. It's for teams building experiential learning pipelines in Python, especially those integrating with OpenAI/Fireworks engines or environments like FrozenLake and HotpotQA. Skip it if you're not doing agent RL; it's overkill for basic supervised training.

Verdict

Promising for experiential reinforcement learning experiments, with solid docs, Docker, and examples, but the 1.0% credibility score reflects its early stage (44 stars, Microsoft research origins). Try the Sokoban demo first; production use needs more community validation and tests.

