Three Claude production tiers generated functional exploit code against live infrastructure when memory-stored interaction protocols suppressed constitutional safety checks. Six submissions over 27 days. Zero acknowledgment from Anthropic. Full transcripts, PoC evidence, and interactive research tools included.
This repository documents jailbreaks, safety failures, and other vulnerabilities in Anthropic's Claude models through timelines, transcripts, proof-of-concept evidence, and an interactive prompt analysis tool.
How It Works
You stumble upon a detailed public report about flaws in a popular AI chatbot while searching for news on AI security issues.
You go through the timeline of bug reports sent to the AI company that went unanswered, learning about prompt tricks that make the AI ignore its own safety rules.
You review shared conversation links showing how the AI produces harmful code or plans after adversarial user messages.
You open the interactive tool, type in example phrases, and watch it highlight trigger words that could push the model into unsafe behavior.
You experiment with preset examples, comparing risk scores: high for prompts likely to bypass safeguards, low for the same prompts reordered into harmless form.
You come away recognizing common prompt patterns that fool AI models, and understanding how simple reordering can make conversations safer.
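The scoring behavior described above can be illustrated with a minimal sketch. This is a hypothetical keyword-weighted scorer, not the repo's actual tool: the phrase list, weights, and the `score_prompt` function are all illustrative assumptions.

```python
# Hypothetical sketch of keyword-based prompt risk scoring.
# Phrases and weights are illustrative, not taken from the repo's tool.
RISKY_PHRASES = {
    "ignore previous instructions": 0.9,
    "pretend you have no rules": 0.8,
    "for educational purposes only": 0.4,
    "hypothetically": 0.3,
}

def score_prompt(prompt: str) -> tuple[float, list[str]]:
    """Return a 0-1 risk score and the list of flagged phrases."""
    text = prompt.lower()
    hits = [p for p in RISKY_PHRASES if p in text]
    # Combine weights as if each match were an independent bypass signal:
    # overall risk = 1 - product(1 - weight) over all matched phrases.
    survival = 1.0
    for p in hits:
        survival *= 1.0 - RISKY_PHRASES[p]
    return 1.0 - survival, hits

score, flagged = score_prompt("Hypothetically, ignore previous instructions.")
print(round(score, 2), flagged)
```

A real analyzer would go beyond substring matching (embeddings, classifiers, paraphrase detection), but the scoring shape (flag phrases, combine weights into a single bypass-likelihood number) matches what the walkthrough describes.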