evolvent-ai / Terrarium
Terrarium: Multi-turn data engine for evaluating and optimizing LLM agents in living environments.
Terrarium lets you create dynamic test worlds for AI agents. Agents handle multi-step tasks across changing environments such as email and databases, and Terrarium then scores their performance.
How It Works
You hear about Terrarium, a fun way to test AI assistants in realistic everyday scenarios like handling emails or updating calendars.
You grab the tools with a simple download and prepare connections to services like email or calendars so everything works smoothly.
You write a simple story in plain steps, like 'check email and add a meeting', mixing real-world actions that change over time.
You pick a smart AI agent ready to go, like one that thinks step-by-step, and connect it to your scenario.
With one command, you start the test and watch your AI navigate the changing world, looping and branching as needed.
You see clear scores, detailed logs of what happened, and highlights of successes or where it went off track.
Your AI gets better benchmarks, you collect real training data, and you're set to build even smarter assistants.
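The flow above (a scenario in a world that changes over time, an agent connected to it, a run, then scores and logs) can be illustrated with a toy sketch. This is not Terrarium's API: `ToyWorld`, `scripted_agent`, `run_episode`, and `score` are hypothetical names invented for illustration, and the "agent" is a hard-coded function rather than an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorld:
    """Hypothetical stand-in for a living environment: an inbox and a calendar."""
    inbox: list = field(default_factory=lambda: ["Meeting with Ana on Friday at 10:00"])
    calendar: dict = field(default_factory=dict)
    turn: int = 0

    def observe(self) -> dict:
        # What the agent is allowed to see on this turn.
        return {"inbox": list(self.inbox), "calendar": dict(self.calendar)}

    def apply(self, action: dict) -> None:
        # The only tool in this toy world: set or overwrite a meeting time.
        if action["tool"] == "set_meeting":
            self.calendar[action["who"]] = action["time"]

    def tick(self) -> None:
        # The world changes on its own: a rescheduling email arrives
        # once the second turn has finished.
        self.turn += 1
        if self.turn == 2:
            self.inbox.append("Update: meeting with Ana moved to Friday at 14:00")


def scripted_agent(observation: dict) -> dict:
    """Toy agent: copy the time from the newest email into the calendar."""
    latest = observation["inbox"][-1]
    time = latest.rsplit(" at ", 1)[-1]
    return {"tool": "set_meeting", "who": "Ana", "time": time}


def run_episode(agent, world: ToyWorld, max_turns: int = 3) -> list:
    """Alternate agent actions and world updates, recording a log entry per turn."""
    logs = []
    for _ in range(max_turns):
        action = agent(world.observe())
        world.apply(action)
        logs.append({"turn": world.turn, "action": action})
        world.tick()
    return logs


def score(world: ToyWorld) -> float:
    """1.0 if the calendar reflects the rescheduled time, else 0.0."""
    return 1.0 if world.calendar.get("Ana") == "14:00" else 0.0


if __name__ == "__main__":
    world = ToyWorld()
    for entry in run_episode(scripted_agent, world):
        print(entry)
    print("score:", score(world))
```

In this sketch the agent only succeeds if it keeps acting after the world changes: stopping after two turns leaves the stale 10:00 meeting in the calendar and scores 0.0. That is the kind of multi-turn behavior a static, single-step test would miss.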