romovpa / claudini

Public

Autoresearch for LLM adversarial attacks

29 stars · 100% credibility
Found Mar 27, 2026 at 29 stars.
AI Analysis · Python

AI Summary

Claudini is a framework for benchmarking and automatically discovering advanced adversarial attacks on large language models using token optimization techniques.

How It Works

1. 🔍 Discover Claudini

You stumble upon this project while reading about clever ways AI can find tricks to bypass language model safeguards.

2. 📥 Grab the code

Download the ready-to-use files and set up your environment with the simple install instructions.

3. ⚙️ Test known tricks

Run quick checks on existing methods to see how well they fool different AI models.

4. 🤖 Unleash AI discovery

Connect to an AI assistant that studies results, invents new optimization tricks, and improves them step by step.

5. 📊 Review improvements

Watch graphs showing how the new methods outperform the old ones in speed and success rate.

🏆 New best attacks found

Celebrate as your AI partner uncovers state-of-the-art ways to test AI vulnerabilities.
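The discovery loop in steps 4 and 5 can be sketched as a propose-evaluate-keep cycle. Everything below (the `Result` type, `propose`, `evaluate`, and the AUC scoring) is a hypothetical illustration of the idea, not claudini's actual code:

```python
from dataclasses import dataclass

@dataclass
class Result:
    auc: float    # area under the loss-vs-FLOPs curve (lower is better)
    recipe: dict  # the optimizer's hyperparameters

def autoresearch(propose, evaluate, baseline, rounds=5):
    """Sketch of the loop: an assistant proposes a variant of the best
    optimizer so far, the benchmark scores it, and strict improvements
    are kept (the real project commits these as git branches)."""
    best = baseline
    for _ in range(rounds):
        candidate = propose(best)     # LLM studies the best result so far
        result = evaluate(candidate)  # run the loss-vs-FLOPs benchmark
        if result.auc < best.auc:     # keep only strict improvements
            best = result
    return best
```

Because a candidate replaces the incumbent only when its score strictly improves, the loop can never end up worse than the baseline it started from.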


AI-Generated Review

What is claudini?

Claudini automates research for LLM adversarial attacks, using an autoresearch loop powered by Claude AI to discover white-box token optimizers that jailbreak models better than baselines. Written in Python with PyTorch and Transformers, it benchmarks attacks on loss-vs-FLOPs curves for random targets or prompt injection via a simple CLI (e.g. `uv run claudini.run_bench random_valid`). You get precomputed results, evolved methods from Claude runs, and tools to run your own autoresearch on GitHub.
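The white-box token optimizers being benchmarked descend from GCG-style gradient attacks. A minimal sketch of one such gradient step, assuming a toy `logits_fn` in place of a real model (all names here are illustrative, not claudini's API):

```python
import torch
import torch.nn.functional as F

def gcg_candidates(logits_fn, embed_matrix, input_ids, target_ids,
                   adv_positions, k=4):
    """One gradient step of a GCG-style white-box token optimizer.

    Relax the discrete prompt into a one-hot matrix so we can take
    gradients with respect to token choices, then return the top-k
    substitution candidates for each adversarial position.
    """
    vocab_size = embed_matrix.size(0)
    one_hot = F.one_hot(input_ids, vocab_size).float().requires_grad_(True)
    embeds = one_hot @ embed_matrix         # (T, d) soft embeddings
    logits = logits_fn(embeds)              # (T, V) next-token logits
    t = target_ids.size(0)
    # Loss: push the model to emit target_ids at the final t positions
    loss = F.cross_entropy(logits[-t - 1:-1], target_ids)
    loss.backward()
    grad = one_hot.grad[adv_positions]      # gradients at adversarial slots
    return (-grad).topk(k, dim=-1).indices  # most promising replacements
```

A real attack would then evaluate a batch of these candidate swaps with forward passes and keep the one with the lowest loss; that forward-pass budget is what the FLOPs axis of the benchmark measures.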

Why is it gaining traction?

It outperforms GCG, ACG, and MAC baselines in jailbreaking evals, thanks to Claude-discovered optimizers like the claude_random variants that hit SOTA Pareto fronts under 1e15 FLOPs. The autoresearch GitHub skill integrates directly with Claude.ai for hands-off evolution, committing improved attacks as git branches, which makes it well suited to iterative red-teaming without manual tuning.
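A Pareto front over (FLOPs, loss) pairs is the natural way to compare attacks across compute budgets; a minimal sketch of the idea (not claudini's actual evaluation code):

```python
def pareto_front(points):
    """Return the non-dominated (flops, loss) points, sorted by
    increasing FLOPs budget; lower is better on both axes."""
    front, best_loss = [], float("inf")
    for flops, loss in sorted(points):
        if loss < best_loss:  # strictly beats every cheaper attack
            front.append((flops, loss))
            best_loss = loss
    return front
```

An attack sits on the front only if no cheaper attack reaches a lower loss, so "SOTA Pareto front under 1e15 FLOPs" means dominating the baselines at every budget up to that cap.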

Who should use this?

LLM safety engineers testing jailbreak robustness on models like Qwen or Llama, red-teamers comparing adversarial attacks across FLOPs budgets, or researchers replicating arXiv:2603.24511 with custom autoresearch on Claude AI prompts.

Verdict

Grab it if you're in LLM adversarial research: the solid CLI, paper-backed evals, and autoresearch GitHub workflow shine despite 29 stars and 1.0% credibility. Still early; run your own benchmarks to validate before production use.

