lechmazur / position_bias
A benchmark for testing whether LLM judges keep the same preference when two lightly edited versions of the same story are shown in opposite orders.
This repository shares a benchmark dataset and analysis revealing how large language models exhibit position bias when judging pairwise story variants in swapped orders.
How It Works
The benchmark presents two lightly edited versions of the same story to an LLM judge in both orders and checks whether the judge's preference stays consistent when the order is swapped.
Rankings show which models most often keep the same pick when the story order flips, with the most consistent models at the top.
Charts visualize flip rates, positional biases, and rating shifts, making the patterns easy to compare across models.
Detailed case studies, such as the midnight bakery story, show exactly how different models respond in each ordering.
The shared files include the prompts, model answers, and results, so the findings can be verified or explored further.
Understanding position bias in LLM judging helps in choosing more reliable judge models and designing fairer evaluations.
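The core metric described above can be sketched in a few lines. This is a minimal illustration, not the repository's actual code: the tuple layout and the function name `flip_rate` are assumptions for the example.

```python
# Hedged sketch of the swapped-order consistency check.
# Each judgment pair records which story the judge preferred
# when shown in (A, B) order and in (B, A) order.
# Field layout is illustrative, not the repo's actual schema.

def flip_rate(judgments):
    """judgments: list of (winner_ab, winner_ba) tuples, where each
    entry is 'A' or 'B' naming the preferred story in that ordering.
    A position-unbiased judge picks the same story in both orderings;
    a flip is any pair where the preference changes with the order."""
    flips = sum(1 for ab, ba in judgments if ab != ba)
    return flips / len(judgments)

# Pair 1: judge prefers story A in both orderings (consistent).
# Pair 2: preference switches when the order is swapped (a flip).
print(flip_rate([("A", "A"), ("A", "B")]))  # 0.5
```

A lower flip rate indicates a judge whose verdicts depend less on presentation order, which is the property the leaderboard ranks.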