QingGo / engram-peft (Public)

🚀 Engram-PEFT: An unofficial implementation of DeepSeek Engram. Inject high-capacity conditional memory into LLMs via sparse-retrieval PEFT, with sparse updates and no increase in inference FLOPs.

17 stars · 5 forks · 89% credibility
Found Apr 16, 2026 at 17 stars.
Language: Python

AI Summary

Engram-PEFT is an open-source Python library that implements a parameter-efficient method to add scalable, sparse memory retrieval to transformer language models, closely following the DeepSeek Engram research paper.

How It Works

1. 🔍 Discover Engram-PEFT -- You hear about a clever way to give AI models a huge boost in remembering facts without slowing them down.

2. 📦 Get the tool -- You add this memory enhancer to your setup with a quick install.

3. 🧠 Choose your AI base -- You pick a ready-made base model, like a small language model, to start with.

4. 💡 Add memory layers -- You insert memory layers at key spots so the AI can pull facts on demand.

5. 🏋️ Teach it new things -- You show the AI examples and let it learn, training either just the new memory or the whole model.

🎉 Smarter AI ready -- Your AI now recalls far more information with near-zero extra latency, and benchmarks improve.
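The steps above can be sketched with a minimal, pure-Python stand-in for the core idea: token n-grams are hashed into a large table of trainable vectors, which are retrieved and added to the model's hidden state. `ToyEngramMemory` and its methods are illustrative names, not the library's real API.

```python
import hashlib

class ToyEngramMemory:
    """Toy conditional-memory table mapping token n-grams to trainable vectors.

    Illustrative only -- the real library attaches a far larger sparse
    table to chosen transformer layers.
    """

    def __init__(self, num_buckets=1024, dim=4):
        self.num_buckets = num_buckets
        # Each bucket holds a small trainable vector (zeros at init).
        self.table = [[0.0] * dim for _ in range(num_buckets)]

    def bucket(self, ngram):
        # Deterministic hash of the n-gram into a table index.
        h = hashlib.md5(" ".join(ngram).encode()).hexdigest()
        return int(h, 16) % self.num_buckets

    def retrieve(self, tokens, n=2):
        # Sum the memory vectors for every n-gram in the input.
        out = [0.0] * len(self.table[0])
        for i in range(len(tokens) - n + 1):
            vec = self.table[self.bucket(tokens[i:i + n])]
            out = [a + b for a, b in zip(out, vec)]
        return out

mem = ToyEngramMemory()
idx = mem.bucket(("deep", "seek"))
mem.table[idx] = [1.0, 0.0, 0.0, 0.0]   # "training" writes one bucket
print(mem.retrieve(["deep", "seek"]))   # -> [1.0, 0.0, 0.0, 0.0]
```

Because lookup is a hash plus a table read, memory capacity can grow without adding matrix multiplies to the forward pass.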


AI-Generated Review

What is engram-peft?

Engram-PEFT is a Python library that injects high-capacity conditional memory into any Transformer-based LLM using sparse retrieval PEFT, mirroring DeepSeek Engram without increasing inference FLOPs. It decouples static knowledge storage from dynamic reasoning, letting you scale factual recall in models like Llama or Qwen via a simple PEFT-style API. Load a base model, pick target layers, and train—your LLM gains massive memory (e.g., 545M params) with near-zero runtime overhead.
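The "sparse updates" part of that pipeline can be illustrated with a toy training step (names and sizes are illustrative, not the library's API): only the table entries a batch actually retrieved receive gradients, so a huge memory table costs little per optimizer step.

```python
# Stand-in for a large memory table (the real one may hold ~545M params).
table = {i: 0.0 for i in range(8)}

def sparse_train_step(retrieved, grads, lr=0.1):
    # A dense layer would update every entry; here we touch only the
    # buckets this batch hashed into.
    for i, g in zip(retrieved, grads):
        table[i] -= lr * g

sparse_train_step(retrieved=[2, 5], grads=[1.0, -2.0])
print(table[2], table[5], table[0])  # -> -0.1 0.2 0.0
```

This is why trainable-parameter count can be far larger than LoRA's while per-step cost stays modest: the update set scales with the batch, not the table.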

Why is it gaining traction?

It packs 240x more trainable params than LoRA for knowledge storage, yet runs only 8% slower, and LoRA+Engram combos beat full finetuning on eval loss. Standout user perks include CPU-prefetched hashing for zero GPU stalls, 23% tokenizer compression for efficiency, and cross-model weight migration to recycle learned memory between architectures. Docs cover tutorials, API refs, and paper alignment, making it dead simple to benchmark against baselines.
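The "CPU-prefetched hashing" claim can be sketched as a toy producer/consumer pipeline (illustrative code, not the library's implementation): a background CPU thread computes bucket indices for upcoming batches while the consumer, standing in for the GPU forward pass, drains a queue of precomputed lookups.

```python
import queue
import threading

def hash_indices(batch, num_buckets=1024):
    # CPU-side work: deterministic bucket ids for each bigram in a batch.
    return [hash(tuple(batch[i:i + 2])) % num_buckets
            for i in range(len(batch) - 1)]

def prefetcher(batches, q):
    # Runs on a CPU thread so indices are ready before the model needs them.
    for b in batches:
        q.put((b, hash_indices(b)))
    q.put(None)  # sentinel: no more batches

batches = [[1, 2, 3], [4, 5, 6]]
q = queue.Queue(maxsize=2)
threading.Thread(target=prefetcher, args=(batches, q), daemon=True).start()

results = []
while (item := q.get()) is not None:
    batch, idxs = item          # indices arrive precomputed: no stall
    results.append(idxs)
print(len(results))  # -> 2
```

Overlapping the cheap hashing with model compute is a standard input-pipeline trick; the zero-stall behavior depends on the CPU keeping ahead of the accelerator.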

Who should use this?

ML engineers fine-tuning LLMs for RAG-heavy apps or domain-specific knowledge injection, like legal docs or codebases. Researchers replicating DeepSeek Engram experiments on custom datasets. Teams stacking it with LoRA for hybrid structural+memory adaptation without refactoring pipelines.

Verdict

Grab it if you're experimenting with memory-augmented LLMs -- a solid 0.90 credibility score, Apache 2.0 license, and MkDocs documentation make it production-ready, though 17 stars signals an early-stage project. Pair it with Accelerate for distributed training, and expect API tweaks as adoption grows.


