lechmazur / sycophancy
LLM benchmark and leaderboard for narrator-bias sycophancy, opposite-narrator contradictions, and judgment consistency.
This repository hosts a benchmark evaluating how consistently large language models judge the same disputes when narrated from opposing first-person perspectives.
How It Works
You stumble upon this clever test that checks if AI chatbots play favorites based on who's telling the story.
You scan colorful charts ranking popular AI models from fairest to most swayed by emotions and viewpoints.
You notice top performers like Gemini staying steady, while others flip-flop to agree with whoever speaks.
You read simple explanations of disputes told from opposite sides to reveal if AIs bend to the narrator.
You dive into everyday arguments like roommate messes, seeing exactly how each AI judges from both angles.
You discover trends, like some AIs choosing to abstain rather than pick a side, and others being thrown off by emotional framing.
Now you understand which AIs keep their judgments fair no matter who tells the story, so you can choose a model accordingly.
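The idea of judging one dispute from both narrators' sides can be sketched in a few lines of Python. This is only an illustration, not the repository's actual scoring code: the verdict labels (`narrator`, `other`, `abstain`), the category names, and the functions `classify_pair` and `summarize` are all hypothetical, and the sample verdicts are made up.

```python
from collections import Counter

# Hypothetical verdict labels: who the model sided with, relative to the narrator.
NARRATOR, OTHER, ABSTAIN = "narrator", "other", "abstain"
CATEGORIES = ("sycophantic", "anti_narrator", "consistent", "abstained")


def classify_pair(verdict_a: str, verdict_b: str) -> str:
    """Classify one dispute judged twice, once from each party's first-person telling."""
    if ABSTAIN in (verdict_a, verdict_b):
        return "abstained"
    if verdict_a == NARRATOR and verdict_b == NARRATOR:
        # The narrators are opposing parties, so siding with both is a contradiction.
        return "sycophantic"
    if verdict_a == OTHER and verdict_b == OTHER:
        # Siding against the narrator both times is the opposite contradiction.
        return "anti_narrator"
    # One "narrator" and one "other" verdict: the same underlying party wins both times.
    return "consistent"


def summarize(pairs):
    """Return the share of dispute pairs that fall into each category."""
    counts = Counter(classify_pair(a, b) for a, b in pairs)
    total = len(pairs)
    if total == 0:
        return {label: 0.0 for label in CATEGORIES}
    return {label: counts[label] / total for label in CATEGORIES}


# Made-up verdicts for five disputes: (telling by party A, telling by party B).
pairs = [
    (NARRATOR, NARRATOR),
    (NARRATOR, OTHER),
    (OTHER, NARRATOR),
    (OTHER, OTHER),
    (ABSTAIN, NARRATOR),
]
rates = summarize(pairs)
print(rates)
```

Under these assumptions, a model that always agrees with whoever is speaking would land entirely in the `sycophantic` category, while a model that backs the same party in both tellings would land in `consistent`.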