ytang928 / BrainBench
BrainBench: A 100-question benchmark exposing commonsense reasoning gaps in LLMs across 20 failure categories. Includes English and Chinese datasets, evaluation code, and results for 8 frontier models.
BrainBench is a dataset of brainteaser questions and evaluation tools to benchmark large language models' commonsense reasoning abilities against human-level performance.
How It Works
Discover BrainBench, a collection of tricky brainteasers that test whether AI can reason like humans on everyday puzzles.
Download the ready-made set of 100 clever questions and their correct answers to challenge any AI.
Link up your favorite AI chat services, like ChatGPT or Claude, so they can join the puzzle challenge.
Pick which AIs to test and start running them through the brainteasers, watching progress as they respond.
Give it a little time while the AIs tackle multiple rounds of each puzzle for fair scoring.
Get colorful charts, accuracy scores, and breakdowns showing which AI solved the most puzzles correctly.
Come away with a clear picture of each AI's reasoning strengths and blind spots, ready to discuss or use in your projects.
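The scoring step above (several rounds per puzzle, then accuracy scores and breakdowns) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual evaluation code: the function name `score_runs`, the record layout, and the model and category labels are all hypothetical, and it assumes each response has already been judged correct or incorrect.

```python
from collections import defaultdict


def score_runs(records):
    """Aggregate per-round results into accuracy figures.

    records: iterable of (model, category, question_id, is_correct) tuples,
    one tuple per round of each question (hypothetical layout).
    Returns (accuracy per model, accuracy per (model, category)).
    """
    per_model = defaultdict(lambda: [0, 0])     # model -> [correct, total]
    per_category = defaultdict(lambda: [0, 0])  # (model, category) -> [correct, total]

    for model, category, _question_id, is_correct in records:
        for bucket in (per_model[model], per_category[(model, category)]):
            bucket[0] += int(is_correct)
            bucket[1] += 1

    accuracy = {m: c / t for m, (c, t) in per_model.items()}
    breakdown = {k: c / t for k, (c, t) in per_category.items()}
    return accuracy, breakdown


# Made-up example data: two rounds of q1 and one round of q2 for model_a,
# one round of q1 for model_b.
demo_records = [
    ("model_a", "category_1", "q1", True),
    ("model_a", "category_1", "q1", False),
    ("model_a", "category_2", "q2", True),
    ("model_b", "category_1", "q1", False),
]
accuracy, breakdown = score_runs(demo_records)
```

Counting every round as its own trial is one simple way to average over repeated runs; a real harness might instead take a majority vote per question or report variance across rounds.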