HumphreySun98 / repoagentbench
SWE-bench for your codebase: mine your merged PRs into local, contamination-free coding-agent benchmarks. Adapters: claude-code, aider (Opus 4.7 / GPT-5.5 / Sonnet 4.6 / Gemini 3.1 Pro).
RepoAgentBench turns merged pull requests from a codebase into reproducible benchmarks to evaluate AI coding agents against the project's own tests and constraints.
How It Works
RepoAgentBench lets you check which AI coding agents actually fix bugs in your own project, using real changes from its history.
You install it on your own computer, so the benchmarks run locally.
You pick a merged pull request from your project's history and turn it into a task: the code as it stood before the fix, plus the checks the fix must pass.
The tool prepares that exact broken starting point and the success tests automatically, ready for an agent to attempt.
A simple stand-in applies the known correct fix to confirm that the task works end to end.
AI coding agents then attempt the fix on their own, starting from the same task.
You get a chart showing which agents passed the tests, with details of what each one changed.
You can then see which agent handles your codebase's real bugs best.
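The steps above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not RepoAgentBench's actual API: the names `Task`, `run`, `oracle`, `no_op`, `validate`, and `scoreboard` are hypothetical, an in-memory dictionary stands in for a git checkout, and a small `exec`-based check stands in for the project's test suite.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Files = Dict[str, str]            # path -> contents (stand-in for a git checkout)
Agent = Callable[[Files], Files]  # takes the broken snapshot, returns edited files


@dataclass
class Task:
    """Hypothetical task record mined from one merged PR."""
    name: str
    broken: Files                   # code as it stood before the fix
    reference_fix: Files            # files as changed by the merged PR
    check: Callable[[Files], bool]  # stand-in for the project's tests


def run(task: Task, agent: Agent) -> bool:
    """Give the agent a private copy of the broken snapshot, then run the check."""
    try:
        return bool(task.check(agent(dict(task.broken))))
    except Exception:
        return False  # a crashing agent or test run counts as a failure


def oracle(task: Task) -> Agent:
    """Stand-in agent that applies the known correct fix."""
    return lambda files: {**files, **task.reference_fix}


def no_op(files: Files) -> Files:
    """Baseline agent that changes nothing."""
    return files


def validate(task: Task) -> bool:
    """Usable only if the task fails untouched and passes with the reference fix."""
    return (not run(task, no_op)) and run(task, oracle(task))


def scoreboard(task: Task, agents: Dict[str, Agent]) -> Dict[str, str]:
    """Map each agent name to PASS or FAIL for this task."""
    return {name: "PASS" if run(task, agent) else "FAIL"
            for name, agent in agents.items()}


# Toy example: a one-file "repository" with a bug in add().
def check_add(files: Files) -> bool:
    namespace: dict = {}
    exec(files["calc.py"], namespace)  # load the candidate source
    return namespace["add"](2, 3) == 5


task = Task(
    name="fix-add",
    broken={"calc.py": "def add(a, b):\n    return a - b\n"},
    reference_fix={"calc.py": "def add(a, b):\n    return a + b\n"},
    check=check_add,
)


def wrong_agent(files: Files) -> Files:
    """Hypothetical agent that submits an incorrect fix."""
    return {**files, "calc.py": "def add(a, b):\n    return a * b\n"}


board = scoreboard(task, {"oracle": oracle(task), "no_op": no_op, "wrong": wrong_agent})
print(validate(task), board)
```

Checking that the untouched snapshot fails and the reference fix passes is what makes a task meaningful in this sketch: if the tests already passed on the broken code, or failed on the known fix, an agent's result on that task would say nothing about the agent.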