Genentech / compbiobench-runner
CompBioBench: A benchmark of 100 diverse, verifiable questions for agents for computational biology (scripts for running agents)
A benchmarking tool for testing AI coding assistants on computational biology problems using provided data files, generating detailed traces, performance metrics, and cost reports.
How It Works
You learn about this tool from Genentech, which lets you check how well AI coding assistants solve real-world computational biology problems using provided data files.
You write a simple list of biology challenges, like finding gene positions or analyzing gene expression, and note where the data files for each one live.
You set up the shared biology toolkit and connect your chosen AI helper, like Claude or Codex.
You launch the full set of questions in a single run. The AI tackles them one by one, each in its own isolated workspace, while time and cost are tracked automatically.
You open easy-to-read reports showing what the AI reasoned, the tools it used, the answers it gave, and any mistakes it made.
You pull together scores, costs, and performance metrics from multiple AI helpers into one clear summary table.
You now know which AI excels at your biology tasks, with full details to guide your research choices.
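The steps above can be sketched in a few lines of Python. This is a minimal illustration of the workflow, not the runner's actual code: the `Question` and `Result` schemas, the `run_benchmark` and `summarize` helpers, the stub agent, the file paths, and the exact-match scoring rule are all hypothetical stand-ins. A real agent would call Claude, Codex, or another assistant where the stub returns a fixed answer.

```python
import tempfile
import time
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Tuple


@dataclass
class Question:
    # Hypothetical schema; the real runner's question format may differ.
    qid: str
    prompt: str
    data_path: str  # where the data files for this question live
    expected: str   # verifiable reference answer


@dataclass
class Result:
    agent: str
    qid: str
    correct: bool
    seconds: float
    cost_usd: float


# Assumption: an agent is any callable that takes a question and a scratch
# directory and returns (answer, cost in USD).
Agent = Callable[[Question, Path], Tuple[str, float]]


def run_benchmark(name: str, agent: Agent, questions: List[Question]) -> List[Result]:
    """Run questions one by one, each in a fresh temporary directory."""
    results = []
    for q in questions:
        with tempfile.TemporaryDirectory() as tmp:
            start = time.perf_counter()
            answer, cost = agent(q, Path(tmp))
            elapsed = time.perf_counter() - start
        # Exact string match is an assumed scoring rule for this sketch.
        correct = answer.strip() == q.expected
        results.append(Result(name, q.qid, correct, elapsed, cost))
    return results


def summarize(results: List[Result]) -> List[Dict[str, object]]:
    """Collapse per-question results into one summary row per agent."""
    by_agent = defaultdict(list)
    for r in results:
        by_agent[r.agent].append(r)
    rows = []
    for agent, rs in sorted(by_agent.items()):
        rows.append({
            "agent": agent,
            "accuracy": sum(r.correct for r in rs) / len(rs),
            "total_cost_usd": round(sum(r.cost_usd for r in rs), 4),
            "total_seconds": sum(r.seconds for r in rs),
        })
    return rows


def stub_agent(question: Question, workdir: Path) -> Tuple[str, float]:
    # Placeholder that always answers "42" at a fixed, made-up cost.
    return "42", 0.01


# Hypothetical questions and data paths, for illustration only.
questions = [
    Question("q1", "How many genes overlap the region?", "data/genes.tsv", "42"),
    Question("q2", "How many samples pass QC?", "data/expression.tsv", "7"),
]

results = run_benchmark("stub", stub_agent, questions)
summary = summarize(results)
for row in summary:
    print(row)
```

Calling `run_benchmark` once per agent and passing the concatenated results to `summarize` yields one row per agent, which is the shape of a cross-agent comparison table.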