This is a Stanford research project that compares different ways to train AI assistants to complete complex real-world tasks like online shopping and household chores. The project runs experiments on two environments (WebShop and ALFWorld) using six different training methods, then measures which approach helps the AI learn best. Users set up cloud computing infrastructure, optionally connect an AI evaluation service, prepare shared materials with their team, launch experiments comparing the methods, and analyze results through automated charts and tables. The research found that one method (TurnRDV2) achieves significantly better results than the others, improving task success rates by 12-18 percentage points.
How It Works
You find a Stanford research project that teaches AI assistants to tackle complex multi-step tasks like shopping and household chores.
You install some basic tools and connect your computer to a powerful cloud computer that can run the experiments for you.
For one of the comparison methods, you optionally connect an AI service that can evaluate how well the assistant handles each step of a task.
Your team shares pre-trained assistants and datasets on the cloud: you check what's already there and grab anything missing for your experiments.
Use an AI judge or learned decomposer to give detailed feedback on each step
Use simpler approaches that automatically measure progress without external help
Test the original pre-trained assistant with no learning at all
The cloud computer trains the assistant through practice: it tries tasks, learns from what worked and what didn't, and improves round by round.
You pull the training logs and run scripts that automatically create charts comparing how well each method performed.
You find that one method dramatically outperforms the others, boosting success rates by 12-18 percentage points over the baseline!
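As a rough illustration of the comparison described in the last two steps, here is a minimal sketch of how per-episode outcomes could be turned into success rates and percentage-point gains over a baseline. The function names, the `(method, succeeded)` record format, and the toy episodes are all assumptions made for this example; they are not the project's actual scripts, log format, or results.

```python
from collections import defaultdict


def success_rates(episodes):
    """Return {method: success rate in percent} from (method, succeeded) pairs."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for method, succeeded in episodes:
        totals[method] += 1
        wins[method] += int(succeeded)
    return {m: 100.0 * wins[m] / totals[m] for m in totals}


def gain_over_baseline(rates, baseline):
    """Return each method's percentage-point difference versus the baseline."""
    base = rates[baseline]
    return {m: r - base for m, r in rates.items() if m != baseline}


# Made-up toy episodes for illustration only; NOT results from the project.
toy_episodes = [
    ("baseline", True), ("baseline", False),
    ("baseline", False), ("baseline", False),
    ("method_a", True), ("method_a", True),
    ("method_a", False), ("method_a", False),
]

rates = success_rates(toy_episodes)
gains = gain_over_baseline(rates, "baseline")
print(rates)
print(gains)
```

A percentage-point gain is the plain difference between two success rates, so a move from 25% to 50% is a gain of 25 percentage points, not 100%.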