evolvent-ai / ClawMark
🦞 ClawMark: A Living-World Benchmark for Multi-Day, Multimodal Coworker Agents
ClawMark is a benchmark that tests AI agents on realistic multi-day professional tasks using simulated tools like email, calendars, spreadsheets, and files across domains such as healthcare and sales.
How It Works
You find ClawMark on GitHub or a blog: a way to test AI agents on realistic office work, such as supporting doctors or handling HR tasks.
You browse leaderboards and examples, seeing how different AIs handle multi-day tasks with emails, plans, and spreadsheets.
You connect an AI model along with your planning pages, spreadsheets, and calendar so the tests run against realistic tools.
With one click, you launch an AI coworker on a job such as reviewing patient medications over three days. It reasons, checks files, and sends emails.
You try single jobs or full suites across clinics, sales, or events to compare AI performance.
You open folders to see scores, chat histories, workspaces, and exactly what the AI did right or wrong.
You now know which AIs shine as reliable coworkers for tough, ongoing office work.
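The steps above can be sketched as a small multi-day evaluation loop. This is only an illustration of the idea, not ClawMark's actual code or API: the names `Day`, `Task`, `RunResult`, `run_task`, and `toy_agent` are all hypothetical, and the workspace is assumed to be a plain dictionary of file names to contents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical workspace model: file name -> file contents.
Workspace = Dict[str, str]


@dataclass
class Day:
    instruction: str                     # what the agent is asked to do that day
    check: Callable[[Workspace], bool]   # did the workspace end up in the right state?


@dataclass
class Task:
    name: str
    days: List[Day]


@dataclass
class RunResult:
    task: str
    passed: List[bool] = field(default_factory=list)

    @property
    def score(self) -> float:
        # Fraction of daily checks that passed (0.0 if nothing was checked).
        return sum(self.passed) / len(self.passed) if self.passed else 0.0


def run_task(task: Task, agent: Callable[[str, Workspace], None]) -> RunResult:
    """Run the agent day by day; the workspace persists across days."""
    workspace: Workspace = {}
    result = RunResult(task.name)
    for day in task.days:
        agent(day.instruction, workspace)
        result.passed.append(day.check(workspace))
    return result


def toy_agent(instruction: str, workspace: Workspace) -> None:
    # Stand-in agent: it only appends to a notes file and never sends an email.
    workspace["notes.txt"] = workspace.get("notes.txt", "") + instruction + "\n"


task = Task("med_review", [
    Day("List current medications", lambda ws: "notes.txt" in ws),
    Day("Flag interactions", lambda ws: "Flag interactions" in ws.get("notes.txt", "")),
    Day("Email summary to the doctor", lambda ws: "outbox.txt" in ws),
])

result = run_task(task, toy_agent)
print(result.passed, round(result.score, 2))
```

Because the workspace carries over from one day to the next, a check on day two can depend on what the agent did on day one, which is the property that makes a multi-day task harder than three independent single-day tasks.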