Handshake-AI-Research

AI Benchmark for Investment Banking Workflows

Found Apr 19, 2026 at 14 stars
AI Summary

BankerToolBench is a benchmark of 100 realistic investment banking tasks designed to evaluate AI agents' ability to produce financial models, pitch decks, and memos using real data sources.

How It Works

1
📚 Discover BankerToolBench

You stumble upon this benchmark of real junior banker tasks on GitHub, perfect for testing how well AI assistants handle financial modeling and pitch decks.

2
🛠️ Get everything ready

You install a few simple tools and connect to safe data sources for real company financials and filings so your tests feel authentic.
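The repo's own data connectors aren't shown on this page, but the idea of pulling real company filings can be sketched against SEC EDGAR's public submissions endpoint, which keys a company's filing index off its zero-padded CIK. This is illustrative only; BankerToolBench's actual tooling may fetch data differently.

```python
# Sketch: build the URL for a company's filing index on SEC EDGAR.
# The data.sec.gov/submissions endpoint expects the CIK zero-padded
# to 10 digits. (Illustrative; not BankerToolBench's own connector.)

def edgar_submissions_url(cik: int) -> str:
    """Return the EDGAR submissions JSON URL for a numeric CIK."""
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"

print(edgar_submissions_url(320193))  # Apple Inc.'s CIK is 320193
# -> https://data.sec.gov/submissions/CIK0000320193.json
```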

3
✅ Run a quick check

You try a simple test task first, watching your setup download data and confirm it's all working smoothly.

4
🎯 Prepare the challenges

You generate the full set of 100 realistic banking jobs, each with instructions, inputs, and scoring guides.
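A task bundling instructions, inputs, and a scoring guide might look like the following sketch. The field names and rubric weights here are assumptions for illustration, not BankerToolBench's actual schema.

```python
# Hypothetical shape of one generated task; field names are
# illustrative, not the repo's real schema.
from dataclasses import dataclass, field

@dataclass
class BankingTask:
    task_id: str
    instructions: str  # what the "junior banker" agent must do
    inputs: list[str] = field(default_factory=list)  # e.g. filings, data-room files
    rubric: dict[str, float] = field(default_factory=dict)  # criterion -> weight

task = BankingTask(
    task_id="mna-model-001",
    instructions="Build a 3-statement model from the attached 10-K.",
    inputs=["10-K.pdf"],
    rubric={"revenue_build": 0.4, "linkages": 0.4, "formatting": 0.2},
)
# Sanity check: rubric weights should sum to 1.0
assert abs(sum(task.rubric.values()) - 1.0) < 1e-9
```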

5
🤖 Launch your AI

You set your AI agent loose on the tasks, letting it build spreadsheets, slides, and reports using the provided tools.
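At its core, "letting the agent use the provided tools" means a harness routing each tool call to a handler. The tool names and dispatch shape below are assumptions for illustration, not the repo's actual MCP interface.

```python
# Minimal sketch of the tool-calling loop an agent harness might run.
# Tool names and the call format are assumptions, not the repo's MCP API.

def fetch_filing(ticker: str) -> str:
    return f"<10-K text for {ticker}>"  # stand-in for a real data pull

def write_workbook(name: str) -> str:
    return f"{name}.xlsx created"  # stand-in for spreadsheet output

TOOLS = {"fetch_filing": fetch_filing, "write_workbook": write_workbook}

def dispatch(call: dict) -> str:
    """Route one agent tool call {'tool': ..., 'args': {...}} to its handler."""
    return TOOLS[call["tool"]](**call["args"])

print(dispatch({"tool": "fetch_filing", "args": {"ticker": "AAPL"}}))
```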

6
🏆 Review the scores

You receive clear pass/fail results with expert feedback on every detail, seeing exactly how your AI stacks up as a banker.
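The weighted pass/fail grading described above can be sketched as summing the weights of passed criteria and comparing against a threshold. The 0.8 threshold and criterion names are illustrative assumptions, not the benchmark's actual rubric.

```python
# Sketch of weighted rubric grading: each criterion gets a pass/fail
# check, and the weighted total decides the overall result.
# Threshold and criteria are assumptions for illustration.

def grade(results: dict[str, bool], weights: dict[str, float],
          threshold: float = 0.8) -> tuple[float, bool]:
    """Return (weighted score, overall pass) for per-criterion results."""
    score = sum(weights[c] for c, passed in results.items() if passed)
    return score, score >= threshold

score, passed = grade(
    results={"revenue_build": True, "linkages": True, "formatting": False},
    weights={"revenue_build": 0.4, "linkages": 0.4, "formatting": 0.2},
)
print(score, passed)
```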


AI-Generated Review

What is bankertoolbench?

BankerToolBench delivers a Python benchmark for testing AI agents on 100 end-to-end investment banking workflows (financial models, pitch decks, and memos) that produce Excel, PowerPoint, and Word files. It supplies real SEC EDGAR filings, virtual data room financials, and company logos via MCP tools, then scores results against expert rubrics using a Dockerized Harbor setup. Developers get precise, weighted pass/fail evals on junior-banker tasks averaging 5 human hours each.

Why is it gaining traction?

Unlike generic LLM benchmarks, it mirrors Goldman Sachs and JPMorgan workflows validated by 502 bankers, with programmatic grading that inspects formulas and parses documents, so no manual review is needed. Harbor compatibility lets you plug in GitHub Copilot-style agents or OpenHands, or run the benchmark from a GitHub Action, and you can filter down to single tasks or glob patterns. The HF dataset and CLI generate ready-to-run jobs, cutting setup to minutes.
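Glob-style task filtering like the review describes can be sketched with the standard library's `fnmatch`; the task IDs below are made up for illustration, and the repo's own CLI may match patterns differently.

```python
# Sketch of glob-based task selection. fnmatch is stdlib;
# the task IDs are invented examples, not the benchmark's real IDs.
from fnmatch import fnmatch

TASK_IDS = ["mna-model-001", "mna-memo-002", "levfin-pitch-001", "ecm-deck-003"]

def select(pattern: str) -> list[str]:
    """Return the task IDs matching a glob pattern (or one exact ID)."""
    return [t for t in TASK_IDS if fnmatch(t, pattern)]

print(select("mna-*"))             # ['mna-model-001', 'mna-memo-002']
print(select("levfin-pitch-001"))  # an exact ID matches itself
```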

Who should use this?

AI teams at investment banks benchmarking agents for M&A modeling or LevFin pitches. Quant researchers at hedge funds testing SEC data pulls and analyst estimates in isolated containers. Fintech devs evaluating Copilot-style integrations against the benchmark for enterprise workflows.

Verdict

Worth a smoke test for finance AI evals: clear docs and Harbor integration make it runnable today, despite a 1.0% credibility score at 14 stars signaling early maturity. Pair it with a strong agent for full-suite benchmark runs; skip it if you need tasks beyond finance.
