Can AI Agents Complete Everyday Online Tasks? 153 tasks, 144 live websites, 15 categories.
ClawBench is a benchmark that tests AI agents on 153 everyday web tasks across 144 live sites in 15 life categories. Each session runs in an isolated environment and is recorded at multiple layers (video, screenshots, and logs) for evaluation.
How It Works
You find ClawBench, a way to test whether AI agents can handle everyday online chores like ordering food or applying for jobs.
You prepare by connecting the AI models you want to test and a simple email service, so tasks can use realistic-looking info.
You choose from 153 real-life tasks, like booking a trip or writing a review, and select which AI to try.
With one click, your AI launches in a safe, isolated browser and tackles the task just like you would.
You get videos, screenshots, and logs showing every click, form fill, and what happened step by step.
You learn whether the AI succeeded, compare results on leaderboards, and share them to help improve everyone's AI agents.
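The last step, turning per-task outcomes into a leaderboard, can be sketched roughly as follows. This is a minimal illustration only: the `RunResult` record, its field names, and the `success_rates` and `leaderboard` helpers are hypothetical and are not taken from ClawBench itself. It assumes each run boils down to a pass/fail outcome per model and task.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunResult:
    # Hypothetical record; these fields are assumptions, not ClawBench's schema.
    model: str
    task_id: str
    category: str
    succeeded: bool


def success_rates(results):
    """Return {model: fraction of its recorded tasks that succeeded}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        totals[r.model] += 1
        passes[r.model] += int(r.succeeded)
    return {model: passes[model] / totals[model] for model in totals}


def leaderboard(results):
    """Rank models by success rate, best first."""
    rates = success_rates(results)
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

Given a list of `RunResult` records, `leaderboard(results)` returns `(model, rate)` pairs sorted from highest to lowest success rate. A real evaluation would likely also break results down by category, which the `category` field here is meant to allow.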