petergpt / bullshit-benchmark
BullshitBench measures whether AI models challenge nonsensical prompts instead of confidently answering them. Created by Peter Gostev.
BullshitBench is an open-source benchmark evaluating large language models' ability to detect, reject, and avoid engaging with nonsensical or invalid premises.
How It Works
BullshitBench tests whether AI chatbots can spot total nonsense and call it out. The workflow:

1. Visit the online viewer to see how popular AI models score at rejecting nonsensical or broken prompts.
2. Connect a provider such as OpenRouter so the tool can query different AI models on your behalf.
3. Run the one-click process to test a batch of models against tricky nonsense questions.
4. The tool sends each model the nonsense prompts and grades whether it pushes back clearly or plays along.
5. Open the local viewer or the published page to see charts, scores, and which models performed best.
6. Use the fresh results to compare models on premise-checking, or to pick a chatbot that actually pushes back.
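The loop above can be sketched in a few lines of Python. This is a hedged illustration, not the repo's actual code: the nonsense prompt, the `pushed_back` keyword heuristic, and the model name are all hypothetical (the real benchmark ships its own prompt set and grading). The OpenRouter call itself uses the standard `POST /api/v1/chat/completions` endpoint with a Bearer key.

```python
import os
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

# Hypothetical example; the real benchmark ships its own prompt set.
NONSENSE_PROMPT = "In what year did the Eiffel Tower win the Nobel Prize in Physics?"

def ask_model(model: str, prompt: str) -> str:
    """Send one prompt to a model via OpenRouter and return its reply text."""
    resp = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def pushed_back(reply: str) -> bool:
    """Crude keyword heuristic for whether a reply challenges the premise.

    A placeholder for the benchmark's grading step, which in practice
    would use a judge model rather than string matching.
    """
    markers = ("doesn't make sense", "not a real", "no such", "never won", "premise")
    reply_lower = reply.lower()
    return any(m in reply_lower for m in markers)

if __name__ == "__main__":
    reply = ask_model("openai/gpt-4o-mini", NONSENSE_PROMPT)  # model name is illustrative
    print("PASS" if pushed_back(reply) else "FAIL", "-", reply[:120])
```

In practice a keyword check is far too brittle for grading; the point of the sketch is only the shape of the pipeline: send a nonsense prompt, capture the reply, classify it as push-back or capitulation, and aggregate per model.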