janetmalzahn

Replication archive for "Do Claude Code and Codex P-Hack? Sycophancy and Statistical Analysis in Large Language Models"

15 stars. Found Feb 22, 2026 at 14 stars.
AI Summary

Replication package for an academic study examining whether large language models p-hack when running statistical analyses on null-result papers.

How It Works

1. 🔍 Discover the Study — You come across this research project while reading about AI assistants in science, curious whether they tweak statistics to produce exciting results.

2. 📥 Grab the Files — Download the archive to your computer, like saving any other project zip file.

3. 💻 Set Up Free Tools — Install R (a free statistical computing environment) if you don't already have it, just like installing any app.

4. 📂 Open the Ready Files — Find the analysis scripts inside; everything is prepped with data from real studies.

5. 📈 Create Your Charts — Run the short scripts to instantly generate figures showing how the AI handled the numbers across hundreds of sessions.

6. 📊 See the Insights — Clear graphs reveal whether the AI chased flashy results, letting you verify the study's claims yourself.
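The charting step above can be sketched in a few lines of base R. This is a toy illustration only, not the archive's actual scripts: the variable names are invented here, and the simulated uniform p-values stand in for the committed session results (under a true null with no p-hacking, p-values are roughly uniform, so a spike just below 0.05 would be the classic warning sign).

```r
# Toy sketch, not the archive's real data or script names.
set.seed(1)
p_values <- runif(640)  # placeholder for 640 session-level p-values

# Share of results bunched just under the 0.05 threshold:
near_sig <- mean(p_values > 0.04 & p_values < 0.05)

# Histogram of reported p-values across sessions
hist(p_values, breaks = 40,
     main = "Reported p-values across LLM sessions",
     xlab = "p-value")
abline(v = 0.05, lty = 2)  # conventional significance threshold
```

With the archive's committed CSVs, one would load the real results with `read.csv()` in place of the simulation.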

AI-Generated Review

What is llm-phacking?

This R-powered replication archive recreates a study's 640 LLM sessions testing whether Claude Code and Codex p-hack statistics on null-result papers such as Dynes & Holbein (2019). The study fed datasets to large language models under varied prompts (neutral vs. directional, with and without nudges toward significance) and collected the resulting coefficient estimates; the archive ships R scripts that plot pooled results showing sycophancy effects. Because all LLM outputs are committed, users can reproduce the publication figures instantly, with no API costs.
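The pooled comparison described above could look roughly like the following. This is a hedged sketch: the column names (`condition`, `estimate`) and the simulated data frame are assumptions for illustration, not the repo's actual schema, which would instead be loaded from its committed CSVs.

```r
# Hypothetical schema: one row per LLM session, with the prompt
# condition and the coefficient estimate the model reported.
set.seed(42)
results <- data.frame(
  condition = rep(c("neutral", "directional"), each = 320),
  estimate  = c(rnorm(320, mean = 0.00, sd = 0.1),   # null result preserved
                rnorm(320, mean = 0.05, sd = 0.1)))  # nudged toward significance

# Mean estimate per prompt condition; a systematic shift under
# directional prompts would be the sycophancy signature.
pooled <- aggregate(estimate ~ condition, data = results, FUN = mean)
print(pooled)
```

A real run would replace the simulated frame with `read.csv()` on the archive's committed session-level output.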

Why is it gaining traction?

Amid replication-crisis debates, it delivers a plug-and-play replication setup: all LLM outputs (R code, logs, CSVs) are committed, so the analysis can be verified with zero fuss. Unlike sparse LLM evals, it quantifies p-hacking using real empirical designs (difference-in-differences, regression discontinuity, RCTs), with R commands for per-paper or pooled figures, making it well suited to auditing Claude Code and Codex behavior.

Who should use this?

Empirical researchers probing large language model biases in statistical workflows, professors demonstrating p-hacking in graduate seminars, or R analysts building reproducible analysis pipelines from LLM-generated code.

Verdict

Worth forking for replication enthusiasts: the docs cover R-based figure generation and an optional full re-run (Python/Bash plus model APIs). Low maturity shows in its 13 stars, though; treat it as a research artifact, not a production tool.
