OliverLeeXZ / SERL
Official implementation of 'What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents'
SERL is a reinforcement learning toolkit for training AI assistants (LLMs) to complete complex, multi-step tasks. It works with two simulation environments: ALFWorld (household tasks like picking up objects and placing them in correct locations) and WebShop (online shopping tasks like finding products that match specific criteria). The key innovation is "selective hindsight distillation" - the system learns from past task attempts but only updates the AI's action decisions, preserving its reasoning capabilities. Built on top of the veRL framework, it provides recipes and scripts for training AI agents using multi-source feedback including immediate results, future trajectories, and successful example paths.
How It Works
You start with SERL, a method for training AI assistants to handle complex multi-step tasks like organizing a home or shopping online.
You select one of two training environments: ALFWorld for household tasks like picking up and placing objects, or WebShop for online shopping challenges.
Your AI assistant attempts tasks, making decisions step by step. Whether it succeeds or fails, the system uses the outcome of each attempt as a training signal.
The system learns from past experiences but preserves the AI's reasoning process - it only updates what actions to take, not how to think.
You track how well your AI improves at completing tasks, watching success rates climb as training progresses.
Your trained AI assistant can now successfully handle complex, multi-step tasks in homes and online stores - tasks that would have been too difficult before.
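The "only update actions, not reasoning" idea from the steps above can be illustrated with a token mask. The following is a minimal sketch, not SERL's actual code. It assumes each response wraps its reasoning in `<think>` tags and its decision in `<action>` tags, and it uses a simple per-token log-probability gap between a hindsight-informed teacher and the student as a stand-in for whatever distillation objective the paper defines. The function names `action_mask` and `masked_distill_loss` are hypothetical.

```python
def action_mask(tokens, open_tag="<action>", close_tag="</action>"):
    """Return 1 for tokens strictly inside an action span, else 0.

    The tag names are an assumption made for this illustration.
    """
    mask = []
    inside = False
    for tok in tokens:
        if tok == open_tag:
            inside = True
            mask.append(0)
        elif tok == close_tag:
            inside = False
            mask.append(0)
        else:
            mask.append(1 if inside else 0)
    return mask


def masked_distill_loss(student_logprobs, teacher_logprobs, mask):
    """Average (teacher - student) log-prob gap over masked tokens only.

    Tokens with mask 0 (the reasoning) contribute nothing, so a gradient
    step on this quantity would leave them untouched.
    """
    selected = [
        t - s
        for s, t, m in zip(student_logprobs, teacher_logprobs, mask)
        if m == 1
    ]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)


# Hypothetical ALFWorld-style response, already split into tokens.
tokens = ["<think>", "the", "mug", "is", "dirty", "</think>",
          "<action>", "clean", "mug", "</action>"]
mask = action_mask(tokens)

# Made-up log-probabilities, purely for illustration.
student_lp = [-1.0] * len(tokens)
teacher_lp = [-0.5] * 7 + [-0.2, -0.4] + [-0.5]

loss = masked_distill_loss(student_lp, teacher_lp, mask)
```

Here only the two tokens between the action tags enter the loss, so the teacher's disagreement with the student's reasoning tokens is ignored. A real implementation would apply the same kind of mask to tensors of token-level losses inside the training framework.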