g-luo / generative_latent_prior

Official PyTorch Implementation for Learning a Generative Meta-Model of LLM Activations

AI Summary

This repository implements a method to train generative models of large language model activations for applications like behavior steering and feature probing.

How It Works

1. 🔍 Discover the idea

You come across a method, shared by interpretability researchers, for steering and understanding LLM behavior by modeling the model's internal activations generatively.

2. 🛠️ Set up the environment

You follow the README instructions to create a Python environment on your machine with the repo's dependencies installed.

3. 📥 Download pre-trained weights

You download pre-trained generative-prior checkpoints from HuggingFace that pair with popular Llama models.

4. Try the demo

You open the demo notebook and watch it generate realistic activations and apply smooth behavior steering.

5. 🎛️ Experiment with steering

You adjust steering directions such as persona vectors to make the model more helpful, or probe for hidden features in its activations (a minimal sketch follows this walkthrough).

🎉 Achieve smooth control

Your edits stay on the activation manifold, so the model behaves more naturally and predictably under intervention, opening the door to new uses.
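
Here is a minimal sketch of what this workflow might look like in code. The `load_glp` call and the HuggingFace weight id are quoted from the review below; the `sample` and `project` methods are placeholders for whatever the repo actually exposes, so check the README and demo notebook for the real API:

```python
# Hypothetical workflow sketch; everything except the load_glp call quoted
# from the review is an assumption about the repo's API.
import torch

from glp import load_glp  # assumed import path for the repo's loader

# Step 3: download pre-trained weights from HuggingFace.
glp = load_glp("generative-latent-prior/glp-llama8b-d6")

# Step 4: generate a batch of realistic activations for the modeled layer.
samples = glp.sample(batch_size=8)            # assumed method name
print(samples.shape)                          # e.g. (8, hidden_dim)

# Step 5: steer by adding a persona vector, then project back on-manifold.
persona_vec = torch.randn(samples.shape[-1])  # stand-in steering direction
steered = glp.project(samples + persona_vec)  # assumed projection method
```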

AI-Generated Review

What is generative_latent_prior?

This PyTorch repo delivers the official implementation for training generative diffusion models on LLM activations, modeling their natural manifold. Load pre-trained weights from HuggingFace for Llama 1B/8B layers, generate realistic activations, or project edits back on-manifold via simple API calls like `load_glp("generative-latent-prior/glp-llama8b-d6")`. It tackles the problem of off-manifold activation steering, which can break LLM behavior, enabling cleaner interventions than naive vector edits.
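
As a hedged sketch, such on-manifold steering could be wired into a HuggingFace Llama model with a forward hook. Only `load_glp` and the weight id come from the repo's description; the `project` method, the base-model id, and the layer choice are assumptions for illustration:

```python
# Sketch: on-manifold activation editing inside a Llama forward pass.
# load_glp is quoted from the review; project() and the model/layer pairing
# are assumptions, not the repo's documented interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from glp import load_glp  # assumed import path

model_id = "meta-llama/Llama-3.1-8B"  # guess at the matching base model
model = AutoModelForCausalLM.from_pretrained(model_id)
tok = AutoTokenizer.from_pretrained(model_id)
glp = load_glp("generative-latent-prior/glp-llama8b-d6")

steer = torch.randn(model.config.hidden_size) * 0.1  # stand-in persona vector

def edit_hook(module, inputs, output):
    # Decoder layers may return a tuple or a bare tensor depending on version.
    hidden = output[0] if isinstance(output, tuple) else output
    edited = glp.project(hidden + steer.to(hidden))  # assumed projection call
    return (edited,) + output[1:] if isinstance(output, tuple) else edited

# Attach the edit to one decoder layer (the layer index here is arbitrary).
handle = model.model.layers[16].register_forward_hook(edit_hook)
prompt = tok("The weather today is", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```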

Why is it gaining traction?

Pre-trained models fit on a single RTX 4090 (under 24GB VRAM), and a toy training run finishes in minutes on 1M activations, far easier than a full diffusion setup. Built-in scripts handle scalar probing across 113 datasets and on-manifold post-processing of persona vectors, with seamless HuggingFace integration. Developers grab it for quick experiments without activation-caching hassles.
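
The probing scripts themselves ship with the repo; the sketch below is only a generic PyTorch stand-in for what scalar probing on activations amounts to, with random filler in place of real activations and labels:

```python
# Generic scalar-probing stand-in: fit a linear probe that predicts a
# scalar property from layer activations. All data here is random filler.
import torch

acts = torch.randn(1000, 4096)     # stand-in activations (N, hidden_dim)
labels = torch.rand(1000)          # stand-in scalar targets in [0, 1]

probe = torch.nn.Linear(acts.shape[1], 1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for step in range(200):
    opt.zero_grad()
    pred = probe(acts).squeeze(-1)
    loss = torch.nn.functional.mse_loss(pred, labels)
    loss.backward()
    opt.step()

print(f"final probe MSE: {loss.item():.4f}")
```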

Who should use this?

Interpretability researchers probing LLM layers for features, safety engineers refining activation steering beyond naive additions, and anyone analyzing activations from Llama models. Ideal for steering experiments with persona vectors or for evaluating generated activations against real ones.

Verdict

Promising for activation work, with strong README quickstarts, demo notebooks, and HuggingFace weights, but it is still early-stage (discovered at around 25 stars), so test on small datasets first. Worth forking if you work on LLM internals.
