mattc95 / 2026-AI-DETECTOR-BENCHMARK
Public
Benchmarking AI text detectors (GPTHumanizer, GPTZero, ZeroGPT, Sapling) across multiple datasets to evaluate accuracy, human false positive rates, and risk trade-offs.
This is a research project that compares four AI text detection tools by testing them on 1,000 text samples (500 human-written, 500 AI-generated). The benchmark measures how accurately each tool identifies AI content, but also tracks how often each tool wrongly flags real human writing as AI-generated. The project includes detailed results broken down by text length, source type, and AI model, along with full access to the test data and individual results so anyone can verify the findings. The key insight is that catching AI text and protecting human writers are different goals, and some tools excel at one while struggling with the other.
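To make the three headline numbers concrete, here is a minimal Python sketch of how accuracy, AI detection rate, and human false positive rate could be computed from a list of true labels and a detector's predictions. The function name `score_detector`, the `Metrics` class, the `"ai"`/`"human"` label strings, and the eight-sample toy data are all assumptions made for illustration; they are not taken from the repository, and the project's own scoring code may differ.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float                   # share of all samples classified correctly
    ai_detection_rate: float          # share of AI samples flagged as AI
    human_false_positive_rate: float  # share of human samples flagged as AI


def score_detector(labels, predictions):
    """Score one detector. Both arguments are sequences of 'ai' or 'human'.

    Hypothetical helper for illustration; not the benchmark's actual code.
    """
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal in length")

    pairs = list(zip(labels, predictions))
    n_ai = sum(1 for y, _ in pairs if y == "ai")
    n_human = len(pairs) - n_ai
    if n_ai == 0 or n_human == 0:
        raise ValueError("need at least one 'ai' and one 'human' sample")

    correct = sum(1 for y, p in pairs if y == p)
    caught_ai = sum(1 for y, p in pairs if y == "ai" and p == "ai")
    flagged_humans = sum(1 for y, p in pairs if y == "human" and p == "ai")

    return Metrics(
        accuracy=correct / len(pairs),
        ai_detection_rate=caught_ai / n_ai,
        human_false_positive_rate=flagged_humans / n_human,
    )


# Toy data (made up): 4 AI samples followed by 4 human samples.
toy_labels = ["ai"] * 4 + ["human"] * 4
toy_predictions = ["ai", "ai", "ai", "human", "ai", "human", "human", "human"]
toy_metrics = score_detector(toy_labels, toy_predictions)
print(toy_metrics)
```

On this toy data the detector misses one AI sample and wrongly flags one human sample, so all three numbers can be read off separately. Two tools with the same accuracy can still differ in how many human writers they wrongly flag, which is why the false positive rate is reported on its own.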
How It Works
You've heard about tools that claim to tell whether something was written by a human or by AI, and you're curious how well they actually work.
You come across a project that tested four different AI detection tools on 1,000 text samples, with half written by humans and half generated by AI.
The benchmark shows you exactly how often each tool correctly identifies AI text and, more importantly, how often it wrongly accuses real human writing of being AI.
You learn that catching AI text and protecting human writers are two different goals, and some tools are better at one than the other.
If protecting writers matters most to you, you focus on the false positive rates and learn which tool is safest for writers who might be wrongly accused.
If catching AI text matters most to you, you focus on overall accuracy and AI detection rates to find the most sensitive tool.
You download the full test data and outputs to check the findings independently.
You discover that shorter texts are harder for all tools to classify, which helps you understand when to trust the results.
You now understand the real strengths and weaknesses of AI detectors, so you can use them responsibly or decide when not to rely on them at all.
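If you download the data and want to check the text-length effect yourself, a breakdown like the following Python sketch is one way to do it. The function name `accuracy_by_length`, the `(text, label, prediction)` tuple layout, the 100-word cutoff for "short", and the toy samples are all assumptions for illustration; the repository's actual file format and length buckets may differ.

```python
from collections import defaultdict


def accuracy_by_length(samples, short_max_words=100):
    """Return accuracy per length bucket ('short' or 'long').

    `samples` is an iterable of (text, label, prediction) tuples.
    The 100-word cutoff is an assumed value, not the benchmark's.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for text, label, prediction in samples:
        # Whitespace word count decides which bucket the sample falls into.
        bucket = "short" if len(text.split()) <= short_max_words else "long"
        totals[bucket] += 1
        correct[bucket] += int(label == prediction)
    return {bucket: correct[bucket] / totals[bucket] for bucket in totals}


# Toy samples (made up): two 50-word texts and two 300-word texts.
toy_samples = [
    ("word " * 50, "human", "ai"),      # short human text, wrongly flagged
    ("word " * 50, "ai", "ai"),         # short AI text, caught
    ("word " * 300, "human", "human"),  # long human text, cleared
    ("word " * 300, "ai", "ai"),        # long AI text, caught
]
length_report = accuracy_by_length(toy_samples)
print(length_report)
```

The same grouping idea extends to source type or AI model: swap the bucket rule for whichever field you want to break the results down by.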