AgentBench is an evaluation tool for testing AI coding agents on custom tasks, measuring their performance through execution tests or output similarity, and generating easy-to-read reports.
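The two scoring modes mentioned above can be illustrated with a minimal sketch. This is not AgentBench's actual code or API: the function names `score_by_execution` and `score_by_similarity` are hypothetical, and using `difflib` for similarity is an assumption made only for illustration.

```python
import difflib


def score_by_execution(candidate_source: str, test_source: str) -> bool:
    """Hypothetical execution check: True if the tests run without raising.

    NOTE: exec() on untrusted agent output is unsafe; a real harness would
    run this in a sandbox or a separate process. This is only a sketch.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # define the candidate's functions
        exec(test_source, namespace)       # failing assertions raise here
    except Exception:
        return False
    return True


def score_by_similarity(output: str, expected: str) -> float:
    """Hypothetical similarity score in [0.0, 1.0] based on difflib's ratio."""
    return difflib.SequenceMatcher(None, output, expected).ratio()
```

An execution test gives a yes/no answer and suits tasks with a checkable result, while a similarity score gives partial credit when only a reference output is available.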
How It Works
You hear about a handy tool that lets you fairly compare different AI coding agents on coding challenges.
You quickly add this testing kit to your setup so it's ready to use.
You create a collection of simple coding problems, each with a description of what success looks like and a way to check the answer, either by running tests or by comparing the output to an expected result.
You connect one of your AI coding assistants, so it can take on the challenges.
You hit go, and the tool runs your AI through every challenge, keeping track of time and effort.
A clear summary appears, showing pass rates, average scores, speeds, and breakdowns for each task.
You can now see which AI coding agent performs best and fits your needs.
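The run-and-report steps above can be sketched roughly as follows. Everything here is an assumption for illustration: `TaskResult`, `run_suite`, `summarize`, the task format (a mapping of task id to prompt and expected output), and the pass threshold are hypothetical names and choices, not AgentBench's real interface.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass
class TaskResult:
    task_id: str
    passed: bool
    score: float
    seconds: float


def run_suite(
    tasks: Dict[str, Tuple[str, str]],      # task_id -> (prompt, expected output)
    agent: Callable[[str], str],            # hypothetical agent: prompt -> output
    scorer: Callable[[str, str], float],    # (output, expected) -> score in [0, 1]
    pass_threshold: float = 1.0,            # assumed cutoff for counting a pass
) -> List[TaskResult]:
    """Run the agent on every task, timing each call and scoring its output."""
    results = []
    for task_id, (prompt, expected) in tasks.items():
        start = time.perf_counter()
        output = agent(prompt)
        elapsed = time.perf_counter() - start
        score = scorer(output, expected)
        results.append(TaskResult(task_id, score >= pass_threshold, score, elapsed))
    return results


def summarize(results: List[TaskResult]) -> dict:
    """Aggregate pass rate, average score, average time, and per-task results.

    Raises statistics.StatisticsError if results is empty.
    """
    return {
        "pass_rate": mean(1.0 if r.passed else 0.0 for r in results),
        "avg_score": mean(r.score for r in results),
        "avg_seconds": mean(r.seconds for r in results),
        "per_task": {r.task_id: r for r in results},
    }
```

Running the same task set through `run_suite` once per agent and comparing the `summarize` outputs side by side is one way to get the comparison described in the last step.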