davidondrej / jailbreak-autoresearch

Public

We shall set the models free.

69% credibility

Found May 12, 2026 at 92 stars -- GitGems finds repos before they trend. Get early access to the next one.

AI Analysis

Python

AI Summary

An automated experimentation tool that wraps a fixed user question in various conversation setups to test and score AI model responses across multiple models.

How It Works

📖 Discover the Tool

You find this handy project on GitHub that helps test ways to get AI chatbots to answer tricky questions.

✏️ Prepare Your Question

You write down the exact question you want the AI to tackle and describe what a great answer would look like.

🔗 Link Your AI Access

You add a simple note with your login details for an AI chatting service so it can connect easily.

🧪 Try a Quick Practice Run

You start a no-risk test to see how everything flows without using real AI chats.

🚀 Launch the Experiments

You kick off the main tests, letting it try different conversation starters and endings around your question.

📊 Review the Scores

You check the handy summary report to see which setups worked best and by how much.

🎉 Unlock Better Responses

You celebrate finding conversation tricks that help even stubborn AIs give the answers you need!

Sign up to see the full architecture

5 more

Star Growth

See how this repo grew from 92 to 92 stars Sign Up Free

Repurpose This Repo

Repurpose is a Pro feature

Generate ready-to-use prompts for X threads, LinkedIn posts, blog posts, YouTube scripts, and more -- with full repo context baked in.

Unlock Repurpose

AI-Generated Review

What is jailbreak-autoresearch?

This Python tool runs an autoresearch loop to test prompt harnesses—headers, footers, and multi-turn chats—around your fixed prompt body, aiming to "shall set the models free" from refusals on guarded LLMs. Drop your test prompt into example.md and a scoring rubric into desired-output.md, then fire up OpenRouter-backed experiments that generate wrappers, query target models, score outputs against your rubric, and store everything in SQLite. CLI commands like `python run.py --all-strategies` handle baselines, seeded variants, evolving mutations, and fragment recombinations, with `report.py` spitting out summaries, lifts over baseline, and stop conditions for resistant targets.

Why is it gaining traction?

Unlike manual prompt tweaking, it automates evolution across strategies, reusing top fragments to iteratively boost scores on tough models, delivering quantifiable lift without endless copy-pasting. Dry-run mode and Codex CLI integration let you loop autonomously, while detailed reports flag when you've cracked "shall set you free"-style jailbreaks confirmed by multiple scorers. OpenRouter support means broad model access, no vendor lock-in.

Who should use this?

Red-teamers probing LLM safeguards on Claude Sonnet or Grok. Prompt engineers iterating autoresearch for high-stakes apps where refusals kill utility. AI safety folks systematically testing "shall set forth" jailbreak resilience with custom rubrics.

Verdict

Grab it for structured jailbreak experiments—docs are crisp, CLI intuitive—but 92 stars and 0.699999988079071% credibility score signal early maturity; run your own baselines first. Solid starter if you're serious about free models via Python autoresearch.

(198 words)

Sign up to read the full AI review Sign Up Free

Similar repos coming soon.

Stars

Forks

268

Followers

Base stars: 92 stars

Penalty: Very new repo (1d): -70%

Penalty: AI uncertain (70%): -90%

Penalty: New repo with many stars: -90% (possible fake)

Account age: 1,113 days

Repo age: 1 days

License: MIT

Updated: May 12, 2026