
alif-munim / autosae


Adapting Karpathy's autoresearch to train SAEs on language models.

Primary language: Python
AI Summary

AutoSAE is an open-source experiment demonstrating an AI agent autonomously iterating on sparse autoencoder designs to achieve high-fidelity reconstruction of Gemma 3 1B model activations.

How It Works

1. 📖 Discover AutoSAE

You stumble upon AutoSAE, a clever experiment where an AI guides itself to build better tools for peeking inside language models' thoughts.

2. 🛠️ Get set up

You install the one helper dependency needed to start working with the model's internal activations.

3. 💾 Gather signals

You collect and save the language model's hidden activity patterns from sample texts, ready for analysis.
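The caching step can be sketched with a standard PyTorch forward hook. Everything here is illustrative: the tiny `nn.Sequential` model stands in for Gemma 3 1B, and the `activations.pt` file name is an assumption, not the repo's actual layout.

```python
import torch
import torch.nn as nn

# Stand-in for a transformer layer stack; the repo caches Gemma 3 1B
# activations, but the hook pattern is the same (hypothetical sketch).
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))

cached = []

def save_activation(module, inputs, output):
    # Detach so the cache holds plain tensors, not autograd graph nodes.
    cached.append(output.detach())

# Hook the hidden layer whose activations the SAE will be trained on.
handle = model[1].register_forward_hook(save_activation)

batch = torch.randn(8, 16)          # pretend token activations
with torch.no_grad():
    model(batch)
handle.remove()

acts = torch.cat(cached)            # (8, 64) hidden activations
torch.save(acts, "activations.pt")  # cached to disk for SAE training
```

The hook fires on every forward pass, so the same loop scales to many text batches before the final `torch.cat`.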

4. Train detectors

You launch training, which learns sparse feature detectors that capture what the model notices in text, running quickly on a single GPU.
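A minimal sketch of what a BatchTopK-style SAE training step could look like, assuming the common formulation where the top k × batch_size activations are kept across the whole batch. The repo's actual architecture (soft whitening, any auxiliary losses) is not reproduced here; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class BatchTopKSAE(nn.Module):
    """Sparse autoencoder with a batch-level top-k constraint (sketch)."""

    def __init__(self, d_in, d_hidden, k):
        super().__init__()
        self.k = k
        self.enc = nn.Linear(d_in, d_hidden)
        self.dec = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        pre = torch.relu(self.enc(x))
        # BatchTopK: keep the k * batch_size largest activations across
        # the whole batch, zeroing everything else.
        budget = self.k * x.shape[0]
        threshold = pre.flatten().topk(budget).values.min()
        codes = torch.where(pre >= threshold, pre, torch.zeros_like(pre))
        return self.dec(codes), codes

sae = BatchTopKSAE(d_in=64, d_hidden=512, k=32)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

x = torch.randn(128, 64)            # a batch of cached activations
recon, codes = sae(x)
loss = nn.functional.mse_loss(recon, x)
opt.zero_grad()
loss.backward()
opt.step()
```

Sharing the sparsity budget across the batch (rather than per token) lets dense tokens use more features and sparse tokens fewer, while the average L0 stays at k.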

5. 📊 Check progress

You create charts showing how much better the detectors get with each try, from weak to almost perfect.
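The progress chart could be produced with a few lines of matplotlib. The per-iteration values below are made-up placeholders (the source only reports the 0.68 → 0.96 endpoints), and `progress.png` is a hypothetical output path.

```python
import matplotlib
matplotlib.use("Agg")               # headless backend for script use
import matplotlib.pyplot as plt

# Placeholder values for illustration only: the repo reports improving
# from 0.68 to 0.96 recovered loss, but the intermediate points here
# are invented.
recovered_loss = [0.68, 0.80, 0.88, 0.93, 0.96]

plt.figure(figsize=(5, 3))
plt.plot(range(1, len(recovered_loss) + 1), recovered_loss, marker="o")
plt.xlabel("iteration")
plt.ylabel("recovered loss")
plt.title("SAE quality per design iteration")
plt.tight_layout()
plt.savefig("progress.png")
```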

6. 🔍 Explore findings

You peek at what each detector lights up for, seeing patterns like topics or ideas in the text.
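Inspecting what a detector "lights up for" typically means ranking tokens by that feature's activation. A sketch with NumPy on random stand-in data; `codes`, `tokens`, and `top_examples` are hypothetical names, not the repo's API.

```python
import numpy as np

rng = np.random.default_rng(0)
# SAE feature activations per token, shape (n_tokens, n_features);
# random stand-in data instead of real cached codes.
codes = rng.random((1000, 16))
tokens = [f"tok{i}" for i in range(1000)]   # stand-in token strings

def top_examples(feature, n=5):
    """Return the n tokens where `feature` fires hardest."""
    idx = np.argsort(codes[:, feature])[::-1][:n]
    return [(tokens[i], float(codes[i, feature])) for i in idx]

for tok, act in top_examples(feature=3):
    print(f"{act:.3f}  {tok}")
```

On real data, showing each top token with its surrounding context is what makes patterns like topics or concepts readable.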

7. 🎉 Unlock insights

You now have detectors that recover nearly all of the model's internal activity, ready for deeper interpretability work.


AI-Generated Review

What is autosae?

Autosae adapts Karpathy's autoresearch concept to autonomously train sparse autoencoders (SAEs) on language models like Gemma 3 1B, iteratively boosting loss recovery from 0.68 to 0.96 over 48 GPU experiments. Python scripts let you cache activations from specific layers, train boosted BatchTopK SAEs with soft whitening via simple CLI commands like `python train_sae.py --gpu 0 --k 70`, evaluate metrics like variance explained and L0, plot progress, and visualize top features in context. It solves the hassle of hand-tuning SAEs for mechanistic interpretability, delivering 100% alive features in 5-minute single-GPU runs.
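The two evaluation metrics mentioned, variance explained and L0, are cheap to compute from activations and SAE codes. A sketch of the standard definitions (not necessarily the repo's exact implementation):

```python
import torch

def variance_explained(x, recon):
    """Fraction of activation variance captured by the reconstruction."""
    resid = ((x - recon) ** 2).sum()
    total = ((x - x.mean(dim=0)) ** 2).sum()
    return 1.0 - (resid / total).item()

def mean_l0(codes):
    """Average number of active (nonzero) features per token."""
    return (codes != 0).float().sum(dim=1).mean().item()

x = torch.randn(256, 64)
print(variance_explained(x, x))     # 1.0 for a perfect reconstruction

codes = torch.zeros(4, 10)
codes[:, :3] = 1.0
print(mean_l0(codes))               # 3.0 active features per token
```

Recovered loss (the 0.68 → 0.96 figure) is a separate metric: it measures how much of the LM's own loss is preserved when the SAE reconstruction is patched back into the forward pass.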

Why is it gaining traction?

Unlike manual hyperparameter grinds or basic SAE libs, autosae packages battle-tested configs from an AI-driven research loop, hitting top metrics (0.96 recovered loss, 0.987 variance) without you iterating. CLI sweeps across GPUs, one-command eval on held-out texts, and feature viz tools make prototyping fast—train, plot, inspect in minutes. Devs dig the reproducible high-perf baselines on real LM activations.

Who should use this?

Mech interp researchers dissecting Gemma layers for features like "capital cities" or residuals. Interpretability devs needing quick SAEs to hook into evals or dashboards. Teams training Python SAEs on custom activations without full autoresearch setups.

Verdict

Grab autosae if you're prototyping SAEs: the docs walk you from data prep to feature visualization, and the results impress despite the 19 stars and 1.0% credibility score signaling early maturity. Solid for experiments, but build your own production evals first.


