lucidrains

Implementation of Fast Weight Attention

19 stars · 1 fork · 69% credibility
Found Mar 26, 2026 at 19 stars
Language: Python

AI Summary

This repository provides a PyTorch implementation of Fast Weight Attention, an attention-based fast weight episodic memory similar to memory MLPs from TTT/Titans and fast weight product key memory, including causal and chunked processing variants.

AI-Generated Review

What is fast-weight-attention?

This Python library delivers a fast weight attention mechanism, blending attention with episodic memory updates for PyTorch models. It processes sequences token by token, carrying fast weight memories across chunks to handle long contexts without quadratic cost, which makes it well suited to causal autoregressive tasks like language modeling. Install it with pip install fast-weight-attention, feed it tensors (optionally with past memories), and get outputs matching the input shapes, plus next-memory states for streaming.
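The token-by-token recurrence with a carried memory state can be sketched in plain NumPy. This is an illustrative toy, not the library's actual API: the function names, the Hebbian outer-product write, and the per-token loop are all assumptions made for the sketch.

```python
import numpy as np

def fast_weight_step(W, k, v, q, beta=1.0):
    """One fast-weight update (illustrative, not the repo's rule):
    write the (k, v) pair into the memory matrix W via an outer
    product, then read the memory with query q."""
    W = W + beta * np.outer(v, k)  # Hebbian-style write
    return W, W @ q                # read: this token's output

def run_chunk(tokens, W=None, d=4):
    """Process a chunk token by token, carrying the fast-weight
    memory so the next chunk can resume from the returned state."""
    if W is None:
        W = np.zeros((d, d))
    outs = []
    for k, v, q in tokens:
        W, o = fast_weight_step(W, k, v, q)
        outs.append(o)
    # outputs match the chunk length; W is the next-memory state
    return np.stack(outs), W
```

Because the update is a pure recurrence, running two chunks with the carried state produces exactly the same outputs as running the concatenated sequence in one pass, which is what makes streaming over arbitrary lengths work.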

Why is it gaining traction?

Unlike standard transformers bogged down by full attention matrices, this offers a linear-attention-style fast weight approach that updates memory weights dynamically per token, inspired by recent papers on fast weight product key memory and long-context reconstruction. Developers like the chunked processing for arbitrary sequence lengths, the Muon-style optimizer for stable training, and the gating/forget mechanisms that boost accuracy on memory-intensive toy tasks like repeating sequences. It's a lightweight swap that yields better long-term recall without custom infrastructure.
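The gating/forget idea mentioned above can be illustrated with a delta-rule-style update. This is a hedged sketch under assumptions: the scalar gates and the exact corrective write rule here are stand-ins, not the repo's implementation, where such gates would typically be learned per token and head.

```python
import numpy as np

def gated_delta_step(W, k, v, forget, write):
    """Delta-rule write with a forget gate (illustrative): decay the
    old memory, then correct the slot addressed by key k toward the
    new value v. forget and write are assumed scalar gates in [0, 1]."""
    v_old = W @ k                                    # value currently stored for k
    W = forget * W + write * np.outer(v - v_old, k)  # decay + corrective write
    return W
```

With forget = 1, write = 1, and a unit-norm key, the corrective term (v - W k) makes the write overwrite cleanly: reading the same key back returns the newly stored value, even after a second write to that key.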

Who should use this?

ML engineers prototyping memory-augmented LLMs or sequence models needing cross-chunk recall, like synthetic data generators or extended-context predictors. Researchers tweaking attention variants for papers on fast weight attention or predictive coding will find the toy training script handy for baselines. Avoid if you're locked into production-scale transformers without PyTorch flexibility.

Verdict

Early-stage gem at 19 stars with a 69% credibility score: solid README examples and pip-ready, but it is light on tests and real benchmarks, so test rigorously before committing. Grab it for experiments if fast weight attention hooks you; skip it if you need battle-tested alternatives.


