Genentech

CompBioBench: a benchmark of 100 diverse, verifiable computational-biology questions for AI agents (scripts for running agents)

AI Summary

A benchmarking tool for testing AI coding assistants on computational biology problems using provided data files, generating detailed traces, performance metrics, and cost reports.

How It Works

1
🔍 Discover the biology AI tester

You hear about this Genentech tool that checks how well AI coding assistants solve real-world biology problems against provided data files.

2
📋 Gather your questions and data

You make a simple list of biology challenges, like finding gene positions or analyzing gene expression, and note where your data files live (see the sketch below).
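
If you're curious what such a question manifest might look like, here is a minimal sketch in Python. The column names (id, question, data_files, expected_answer) and the example rows are illustrative guesses, not the repo's actual schema:

```python
import csv

# Hypothetical question manifest -- values are illustrative only.
questions = [
    {
        "id": "q001",
        "question": "What is the genomic position of TP53 in GRCh38?",
        "data_files": "data/grch38_genes.gtf",
        "expected_answer": "chr17:7668402-7687550",
    },
    {
        "id": "q002",
        "question": "How many genes are differentially expressed at FDR < 0.05?",
        "data_files": "data/deseq2_results.csv",
        "expected_answer": "312",
    },
]

with open("questions.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=list(questions[0]))
    writer.writeheader()
    writer.writerows(questions)
```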

3
⚙️ Get ready to test

You set up the shared bioinformatics environment and connect your chosen agent, such as Claude Code or Codex, so everything runs smoothly (a quick pre-flight check is sketched below).
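
A few lines of Python can confirm the agent CLIs are on your PATH before you burn any tokens. This assumes the agents are driven through their standard `claude` and `codex` command-line tools:

```python
import shutil

# Verify each agent CLI is installed and reachable.
for cli in ("claude", "codex"):
    path = shutil.which(cli)
    status = path or "NOT FOUND -- install it before running the benchmark"
    print(f"{cli}: {status}")
```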

4
🚀 Launch the tests

You kick off all your questions at once and watch the AI tackle them in parallel, each in its own isolated workspace, with time and costs tracked automatically (see the pattern sketched below).
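
The pattern the tool automates might look roughly like the following sketch: copy each question's data into a scratch workspace, run the agent there under a timeout, and record wall time. Everything here (the workspace layout, the `run_question` helper, the `claude -p` invocation) is an illustrative assumption, not the repo's actual code:

```python
import csv
import shutil
import subprocess
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_question(q: dict, agent_cmd: list, timeout_s: int = 600) -> dict:
    """Run one agent on one question inside a throwaway workspace."""
    workspace = Path(tempfile.mkdtemp(prefix=f"{q['id']}-"))
    for data_file in q["data_files"].split(";"):  # copy inputs into isolation
        shutil.copy(data_file, workspace)
    start = time.monotonic()
    try:
        proc = subprocess.run(
            agent_cmd + [q["question"]],
            cwd=workspace, capture_output=True, text=True, timeout=timeout_s,
        )
        output, timed_out = proc.stdout, False
    except subprocess.TimeoutExpired:
        output, timed_out = "", True
    return {"id": q["id"], "output": output, "timed_out": timed_out,
            "seconds": round(time.monotonic() - start, 1)}

# Load the manifest written in the step-2 sketch and fan out across workers.
with open("questions.csv") as fh:
    questions = list(csv.DictReader(fh))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda q: run_question(q, ["claude", "-p"]),
                            questions))
```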

5
📖 Explore the step-by-step traces

You open easy-to-read trace reports showing exactly what the AI reasoned, which tools it called, the answers it gave, and any mistakes along the way (a quick scan is sketched below).
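
For a rough idea of how you might skim those traces programmatically, here is a tiny sketch that assumes a hypothetical `traces/<question_id>.md` layout:

```python
from pathlib import Path

# Flag any trace whose text mentions an error, and report trace length.
for trace in sorted(Path("traces").glob("*.md")):
    text = trace.read_text()
    flag = "error" if "Error" in text else "ok"
    print(f"{trace.stem}: {len(text.splitlines())} lines, {flag}")
```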

6
📊 Combine results from different AIs

You pull scores, costs, and timings from multiple AI helpers into one clear summary table (see the merge sketch below).
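
A merge like this is straightforward with pandas; the file names and columns (`model`, `correct`, `cost_usd`) below are hypothetical stand-ins for whatever the tool actually emits:

```python
import pandas as pd

# Stack per-model result CSVs, then aggregate into one comparison table.
frames = [pd.read_csv(p) for p in ["results_claude.csv", "results_codex.csv"]]
merged = pd.concat(frames, ignore_index=True)

summary = merged.groupby("model").agg(
    accuracy=("correct", "mean"),
    total_cost_usd=("cost_usd", "sum"),
    questions=("id", "count"),
)
print(summary)
```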

Unlock AI insights for biology

You now know which AI excels at your biology tasks, with full details to guide your research choices.

AI-Generated Review

What is compbiobench-runner?

CompBioBench-runner is a Python benchmarking tool that evaluates LLM agents like Claude Code and Codex on CompBioBench, a set of 100 diverse, verifiable questions in computational biology. It automates running agents in isolated conda environments per question, with data files copied into workspaces, producing structured markdown traces, JSON results, and cost breakdowns. Developers get parallel execution, resumable runs, and merged CSV outputs for easy analysis.
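
To make the "structured JSON results" concrete, here is a guess at what a single result record and a cost roll-up could look like; every field name is inferred from the features listed above, not taken from the repo:

```python
import json
from pathlib import Path

# One hypothetical per-question result record.
record = {
    "question_id": "q001",
    "model": "claude",
    "answer": "chr17:7668402-7687550",
    "correct": True,
    "input_tokens": 18250,
    "output_tokens": 2140,
    "cost_usd": 0.094,
    "wall_seconds": 87.3,
}
print(json.dumps(record, indent=2))

# Roll up total cost across a directory of such records.
total = sum(json.loads(p.read_text())["cost_usd"]
            for p in Path("results").glob("*.json"))
print(f"run cost: ${total:.2f}")
```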

Why is it gaining traction?

It stands out by handling the mess of agent evals in comp bio—think bioinformatics tools like Biopython and pysam pre-installed, timeouts, and token/cost tracking—without manual setup. Parallel runs across models, resume-from-failure, and rich logs with tool calls make iterating on agent performance fast. No more wrangling envs or parsing raw CLI output for those 100 questions.
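
The resume-from-failure idea can be sketched in a few lines: skip any question whose result file already exists, so a crashed run continues where it stopped. The `results/<id>.json` layout and the `run_question` helper are assumptions carried over from the sketches above:

```python
import csv
import json
from pathlib import Path

with open("questions.csv") as fh:
    questions = list(csv.DictReader(fh))

results_dir = Path("results")
results_dir.mkdir(exist_ok=True)

for q in questions:
    out = results_dir / f"{q['id']}.json"
    if out.exists():                     # already answered: skip on resume
        continue
    result = run_question(q, ["claude", "-p"])  # runner from the step-4 sketch
    out.write_text(json.dumps(result))
```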

Who should use this?

Comp bio researchers benchmarking LLM agents for genomics, transcriptomics, or variant analysis tasks. AI devs at pharma labs like Genentech tuning models on real biology data. Anyone needing reproducible evals with file access and structured metrics before deploying agents.

Verdict

Grab it if you're in comp bio and need agent benchmarks now: the docs are solid, the CLI is intuitive, and Genentech backing adds trust despite the low star count signaling early maturity. Test on the sample CSV first; scale up once pricing updates land.
