openai / monitorability-evals
PublicOpen-sourced evaluation suite from the Monitoring Monitorability paper
This OpenAI repository open-sources evaluation datasets, prompts, and a mock scaffold for testing AI model monitorability across intervention, process, and outcome archetypes from the Monitoring Monitorability paper.
How It Works
You find this OpenAI collection of ready-made scenarios to check if AI assistants reveal their true thinking through monitors.
You read the simple guide explaining three kinds of tests: ones with hints, step-by-step processes, and outcome checks, drawn from math puzzles and ethical dilemmas.
You unzip the bundled files to access the scenarios, prompts, and answers without needing external sources for most.
You start the easy demo runner which simulates AI responses and monitoring flags, generating sample scores in moments.
You open the output files to see clear summaries, charts, and scores like geometric means showing monitor effectiveness.
Now you can confidently apply these tests to your own AI models to measure how monitorable their inner workings are.
Star Growth
Repurpose is a Pro feature
Generate ready-to-use prompts for X threads, LinkedIn posts, blog posts, YouTube scripts, and more -- with full repo context baked in.
Unlock RepurposeSimilar repos coming soon.