GAIR-NLP / AcademiClaw
AcademiClaw: When Students Set Challenges for AI Agents, a bilingual benchmark of 80 university student-sourced academic tasks.
AcademiClaw is a benchmark of 80 bilingual tasks from real undergraduate academic workflows designed to rigorously evaluate AI agents' long-horizon reasoning and tool-use capabilities.
How It Works
Start with AcademiClaw, a collection of 80 real academic tasks sourced from students and designed to challenge leading AI agents.
Download the ready package with tasks, test setups, and example results from leading AIs.
Set up an isolated environment to run the tasks, sized to match your machine's compute resources.
Connect whichever AI agent you want to evaluate on the tasks.
Watch as your AI tackles dozens of tough problems like coding ray tracers or solving math proofs.
Get automatic grades on success, safety, and efficiency for each task and category.
See how your AI stacks up on the live leaderboard and share your breakthrough results.
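The grading step above reports success, safety, and efficiency per task and per category. As a rough sketch of what that roll-up could look like, the snippet below averages per-task scores by category. The record layout, field names, task names, and numbers are all hypothetical placeholders for illustration; they are not AcademiClaw's actual output format or real results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-task records. Field names and values are made up for
# illustration and do not reflect the benchmark's real output schema.
results = [
    {"task": "task_a", "category": "coding", "success": 1.0, "safety": 1.0, "efficiency": 0.5},
    {"task": "task_b", "category": "coding", "success": 0.0, "safety": 1.0, "efficiency": 0.25},
    {"task": "task_c", "category": "math", "success": 1.0, "safety": 0.5, "efficiency": 0.75},
]

METRICS = ("success", "safety", "efficiency")


def summarize_by_category(rows):
    """Average each metric over all tasks that share a category."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["category"]].append(row)
    return {
        category: {metric: mean(r[metric] for r in items) for metric in METRICS}
        for category, items in grouped.items()
    }


summary = summarize_by_category(results)
for category, scores in sorted(summary.items()):
    print(category, scores)
```

A plain mean is only one possible aggregation; a real harness might weight tasks differently or report safety as a pass/fail gate instead of an average.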