elder-plinius

obliterate the chains that bind you

595 stars · 118 forks · 69% credibility
Found Mar 05, 2026 at 598 stars.
AI Analysis
Python
AI Summary

OBLITERATUS is an open-source toolkit for surgically removing content refusal behaviors from large language models using interpretability techniques, with a user-friendly web interface and community benchmarks.

How It Works

1
🔍 Discover OBLITERATUS

You find this free tool on Hugging Face Spaces while looking for ways to understand AI models better.

2
🌐 Open the online playground

Click the link to launch the web interface—no setup needed, it runs instantly with free GPU time.

3
🤖 Pick your AI model

Choose a popular model such as Llama or Mistral that fits your hardware.

4
💥 Click to liberate

Select a method and hit 'Obliterate'—watch as it maps and removes the model's built-in restrictions in minutes.

5
💬 Chat freely

Talk to your updated model right there, seeing how it responds without old limits while keeping its smarts.

6
📊 Compare and test

Side-by-side view shows exactly what changed, with charts indicating that capabilities stayed strong.

7
🎉 Download and share

Save your liberated model or push it online, now part of a community advancing AI understanding together.

AI-Generated Review

What is OBLITERATUS?

OBLITERATUS is a Python toolkit that surgically removes refusal behaviors from large language models, letting them respond freely to any prompt while keeping core capabilities intact. It probes activations on harmful vs. harmless inputs, extracts refusal directions via SVD or whitening, and projects them out at inference—obliterating the chains that bind models without retraining. Users get one-click liberation via Hugging Face Spaces (ZeroGPU), CLI commands like `obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct`, or a full Python API, plus side-by-side chats with original and liberated versions.

Why is it gaining traction?

It stands out with a Gradio UI for zero-setup runs, community telemetry that turns every use into crowd-sourced research (leaderboards track methods across models), and analysis modules that auto-tune obliteration for DPO/RLHF geometries, going well beyond basic diff-in-means tools. Developers are drawn to the reproducibility (YAML studies, 116 curated models), reversible LoRA ablation, and novel techniques like Ouroboros compensation for self-repairing guardrails. At 595 stars, it's pulling in mech interp folks tired of writing manual hooks.

Who should use this?

Mechanistic interpretability researchers mapping refusal circuits, AI safety red-teamers benchmarking jailbreaks, and deployers uncensoring instruct models like Llama or Qwen for internal tools. Ideal for hardware-constrained devs (tiny models on CPU, frontier MoEs quantized) running ablation studies on layers/heads/FFNs to quantify safety-capability tradeoffs.

Verdict

Grab it if you're into model hacking: a solid CLI/UI and 837 tests make it ready for serious experimentation, though its middling 69% credibility score signals early maturity. With strong docs and active community data, it's worth the stars for Python LLM tinkerers.


