BennettSchwartz

Benchmark for measuring how well large language models know Theo Browne

16 stars · 0 forks · 100% credibility
Found Apr 19, 2026 at 16 stars.
Language: TypeScript

AI Summary

TheoBench is a benchmark and leaderboard evaluating large language models' built-in knowledge of Theo Browne across 70 questions on his personal history, opinions, projects, company, and content.

How It Works

1. 🔍 Discover TheoBench

You stumble upon TheoBench, a fun leaderboard testing how well smart AIs know popular creator Theo from t3.gg.

2. 🌐 Visit the website

You open the sleek dark website to see rankings of different AIs side by side.

3. 🏆 Check the leaderboard

You browse official top scorers and unofficial progress, switching between categories like personal life or tech opinions to compare model strengths.

4. 📊 Dive into details

You spot coverage percentages, points, and color-coded scores showing how accurate each AI is.

5. 📋 Explore the questions

You scroll down to review all 70 questions grouped by topics, seeing what's being tested.

6. 😎 Become an expert

You now know which AIs best capture Theo's world and feel excited to share the insights!

AI-Generated Review

What is TheoBench?

TheoBench is a TypeScript benchmark measuring how well large language models know Theo Browne (t3.gg), using 70 fixed questions on his personal history, tech opinions, T3 stack, company, and content. Run the CLI to query models via OpenRouter, Claude, or Codex; answers are auto-judged from 0.0 to 1.0 against reference answers and exported as JSON for a React leaderboard site deployed to Cloudflare Workers. It tracks official full runs separately from unofficial partial progress, making it in effect a benchmark of an LLM's built-in knowledge.
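The run-judge-export loop described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: `Question`, `judgeAnswer`, `runBenchmark`, and the string-match judge are all hypothetical stand-ins (a real run would have an LLM grade each answer against the reference).

```typescript
// Hypothetical sketch of a TheoBench-style run/judge loop.
// All names here are illustrative, not the repo's real API.

interface Question {
  id: string;
  category: string;
  prompt: string;
  reference: string;
}

interface Result {
  questionId: string;
  score: number; // 0.0 (wrong) to 1.0 (fully correct)
}

const TOTAL_QUESTIONS = 70; // size of the full official question set

// Trivial stand-in judge: exact match after normalization.
// A real judge would ask an LLM to grade the answer against the reference.
function judgeAnswer(answer: string, reference: string): number {
  return answer.trim().toLowerCase() === reference.trim().toLowerCase()
    ? 1.0
    : 0.0;
}

function runBenchmark(
  questions: Question[],
  askModel: (prompt: string) => string,
): { results: Result[]; total: number; coverage: number } {
  const results = questions.map((q) => ({
    questionId: q.id,
    score: judgeAnswer(askModel(q.prompt), q.reference),
  }));
  // Total points earned, plus coverage: how much of the full 70-question
  // set was attempted (partial runs land on the "unofficial" board).
  const total = results.reduce((sum, r) => sum + r.score, 0);
  return { results, total, coverage: results.length / TOTAL_QUESTIONS };
}
```

The coverage figure is what would distinguish an official full run (coverage 1.0) from an unofficial partial one.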

Why is it gaining traction?

Unlike generic benchmarks, this one zeroes in on Theo Browne's world, revealing quirks in models' training data around dev influencers and exposing blind spots in their factual recall. The CLI handles concurrency across 100+ models, balanced question ordering, and one-command exports to a polished Tailwind leaderboard with category filters and pagination. Devs also get a GitHub Action-ready workflow and easy local dev with pnpm.
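Fanning a fixed question set out across 100+ models with bounded concurrency can be sketched with a small worker pool like the one below. This is an assumption about the general technique, not the repo's implementation; `mapWithLimit` is a hypothetical helper.

```typescript
// Bounded-concurrency map: run at most `limit` tasks at once while
// preserving the order of results. A sketch of how a CLI might query
// many models concurrently without flooding an API.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index.
  // Claiming (next++) is synchronous, so workers never collide.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

Here `fn` would be something like "run the full question set against one model"; the pool keeps only `limit` model runs in flight at a time.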

Who should use this?

AI eval engineers benchmarking LLMs on person-specific recall, for example testing GitHub Copilot or custom fine-tunes against influencer lore. Theo fans or T3 stack devs validating model opinions on Next.js versus alternatives. Researchers extending it into similar benchmarks that measure memorized knowledge in other niche domains.

Verdict

Fun niche benchmark with strong docs and deploy scripts, but 16 stars and 1.0% credibility signal early maturity; fork it for custom evals rather than relying on it in production. Solid starter for LLM leaderboard experiments.


