An automated evaluation tool that tests AI language models on 400 professional-domain questions using weighted rubrics, judge models, concurrent processing, cost tracking, and Excel/JSON reports.
How It Works
You come across a tool that tests how well AI models handle real-world questions in medicine, finance, law, engineering, and science.
Download the simple program and the set of 400 challenging questions with scoring guides.
Connect a few AI model services so they can answer questions and judge responses.
Pick which models to evaluate and which judge models will score their answers.
Start the run and watch as answers are generated and scored automatically, with costs tracked in real time.
Open the Excel and JSON reports showing scores, strengths, and exact costs for each AI.
See clear rankings to choose the best AI for your professional needs, with full details saved.
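The scoring and cost-tracking steps above can be sketched roughly as follows. This is a minimal illustration and not the tool's actual code: the `Criterion` class, the `weighted_score` and `call_cost` functions, and the per-million-token pricing model are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric line (hypothetical structure, not the tool's own schema)."""
    name: str
    weight: float
    met: bool  # verdict returned by a judge model


def weighted_score(criteria):
    """Return the fraction of total rubric weight earned, from 0.0 to 1.0."""
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("rubric must have positive total weight")
    earned = sum(c.weight for c in criteria if c.met)
    return earned / total


def call_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost of one API call, assuming prices are quoted per million tokens."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


# Example rubric for a single answer, with made-up criteria and weights.
rubric = [
    Criterion("states the correct diagnosis", 3.0, True),
    Criterion("cites the relevant guideline", 2.0, False),
    Criterion("flags contraindications", 1.0, True),
]
score = weighted_score(rubric)
cost = call_cost(1000, 500, 2.0, 8.0)
print(f"score={score:.2f} cost=${cost:.4f}")
```

Averaging such per-question scores across the 400 questions, and summing the per-call costs, would give the kind of per-model totals that the reports and rankings summarize.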