InternScience / ResearchClawBench
ResearchClawBench: Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery
ResearchClawBench provides 40 expert-curated scientific research tasks with real datasets from published papers across 10 domains to benchmark AI agents' end-to-end research capabilities via autonomous analysis and peer-review-style LLM evaluation.
How It Works
You stumble upon ResearchClawBench, a fun way to see if AI helpers can tackle real science puzzles just like expert researchers.
Download it to your computer and launch the simple viewer with a few clicks—no tech skills needed.
Browse easy categories like earth science or neuroscience, and select one of 40 real-world problems with actual experiment data.
Choose your favorite AI agent and hit go—see it live explore data, crunch numbers, make charts, and write a full report.
Compare the AI's findings side by side with the human scientist's paper and its checklist of key success criteria.
Get automatic grades on each part, a total score (50 matches the paper, higher beats it), and add to the global leaderboard.
You've tested cutting-edge AI on genuine science, discovered strengths and gaps, and contributed to the frontier of smart machines.
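The scoring step above can be made concrete with a small sketch. Only the meaning of 50 (parity with the paper) comes from the description; the 0-100 scale per part, the weights, the weighted-mean aggregation, and every name below (`ChecklistItem`, `total_score`, `verdict`) are assumptions for illustration, not ResearchClawBench's actual grading code.

```python
from dataclasses import dataclass

# From the description: a total of 50 matches the paper, higher beats it.
PAPER_PARITY = 50.0


@dataclass
class ChecklistItem:
    """One graded part of a report (hypothetical structure)."""
    name: str
    score: float         # assumed to be on a 0-100 scale
    weight: float = 1.0  # assumed relative importance of this part


def total_score(items):
    """Combine per-part grades with a weighted mean (an assumed rule)."""
    total_weight = sum(item.weight for item in items)
    if total_weight <= 0:
        raise ValueError("need at least one item with positive weight")
    return sum(item.score * item.weight for item in items) / total_weight


def verdict(total):
    """Read a total score against the paper-parity threshold of 50."""
    if total > PAPER_PARITY:
        return "beats the paper"
    if total == PAPER_PARITY:
        return "matches the paper"
    return "falls short of the paper"


if __name__ == "__main__":
    # Made-up grades, purely to exercise the functions.
    report = [
        ChecklistItem("data analysis", 70, weight=2),
        ChecklistItem("figures", 40),
    ]
    total = total_score(report)
    print(f"{total:.1f}: {verdict(total)}")
```

Under these assumptions, a report whose parts average out above 50 would be read as going beyond the original paper, while one below 50 would be read as only partially reproducing it.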