chen-hao-chao

MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models

AI Summary

Repository for MDM-Prime-v2, a masked discrete diffusion language model with pretrained weights, Docker demos, and training/evaluation code from a research paper.
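The pretrained weights are hosted on the Hugging Face Hub; a minimal sketch of pulling them down with `huggingface_hub` is below. The repo id is a placeholder, not the project's actual model id — check the repository's README for the real one.

```python
# Minimal sketch: download the released checkpoint files from the Hub.
# The repo_id below is a hypothetical placeholder, not the project's actual model id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="chen-hao-chao/mdm-prime-v2-1.1b",  # hypothetical id; see the README
)
print("Checkpoint files downloaded to:", local_dir)
```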

How It Works

1
🔍 Discover a new AI text generator

You find this project through a research paper link or Hugging Face models, curious about smarter ways to complete sentences.

2
🐳 Grab the ready-to-run demo

Download a simple container image that has everything set up, no hassle needed.

3
🚀 Start chatting with your AI

Open the web page and type prompts to see the model fill in text creatively between your start and end ideas (see the minimal demo sketch after this list).

4
⚙️ Tweak for better results

Adjust the number of diffusion steps or the output length to trade quality against speed and get responses that match what you want.

5
📥 Download ready models

Grab pretrained model weights from the Hugging Face hub to use in your own apps or experiments.

6
🎉 Create amazing text magic

Now you have a powerful tool for generating stories, answers, or code completions effortlessly.
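Steps 3 and 4 describe the Gradio infilling demo and its tunable settings; the sketch below shows roughly what such an interface looks like. The `infill` function here is a stand-in that just echoes its inputs so the page runs — the real demo wires this up to the diffusion sampler.

```python
# A minimal sketch of a Gradio infilling demo with tunable diffusion steps.
# The `infill` function is a placeholder; the actual repo connects it to the model.
import gradio as gr

def infill(prefix: str, suffix: str, steps: int, max_tokens: int) -> str:
    # Placeholder: a real implementation would run masked-diffusion sampling
    # conditioned on the prefix and suffix for `steps` denoising iterations.
    middle = f"[{max_tokens} tokens sampled in {steps} steps]"
    return f"{prefix} {middle} {suffix}"

demo = gr.Interface(
    fn=infill,
    inputs=[
        gr.Textbox(label="Prefix"),
        gr.Textbox(label="Suffix"),
        gr.Slider(1, 1024, value=128, step=1, label="Diffusion steps"),
        gr.Slider(1, 512, value=64, step=1, label="Tokens to infill"),
    ],
    outputs=gr.Textbox(label="Completion"),
)

if __name__ == "__main__":
    demo.launch()
```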

AI-Generated Review

What is mdm-prime-v2?

MDM-Prime-v2 is a Python implementation for training and evaluating diffusion language models that use binary encoding and index shuffling to enable compute-optimal scaling. It provides pretrained weights on Hugging Face, Docker environments for quick setup, and scripts for pretraining on datasets like SlimPajama or C4, plus zero-shot evaluation on benchmarks like SciQ. Users get a Gradio demo for text infilling and completion, plus sampling tools that run inference with tunable diffusion steps.
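The sampling tools expose the number of diffusion steps as a knob. As an illustration, a masked discrete diffusion sampler typically starts from an all-mask sequence and unmasks a budgeted number of positions per step. The sketch below assumes a `model` that returns per-position token logits and a `mask_id`; the repo's actual sampler (and its Prime-style sub-token handling) may differ.

```python
# Minimal sketch of masked-diffusion sampling with a tunable number of steps.
# `model` and `mask_id` are assumptions; confidence-based unmasking is illustrative.
import math
import torch

@torch.no_grad()
def sample(model, length: int, steps: int, mask_id: int, device: str = "cpu") -> torch.Tensor:
    # Start from a fully masked sequence and unmask a budgeted number of positions per step.
    x = torch.full((1, length), mask_id, dtype=torch.long, device=device)
    per_step = math.ceil(length / steps)
    for _ in range(steps):
        masked = x == mask_id
        if not masked.any():
            break
        logits = model(x)                            # assumed shape: (1, length, vocab)
        conf, pred = logits.softmax(-1).max(-1)      # per-position confidence and argmax token
        conf = conf.masked_fill(~masked, -1.0)       # never re-decode finished positions
        k = min(per_step, int(masked.sum()))
        idx = conf.topk(k, dim=-1).indices           # most confident masked positions this step
        x.scatter_(1, idx, pred.gather(1, idx))
    return x
```

Fewer steps decode more tokens per iteration (faster, usually lower quality); more steps approach one-token-at-a-time refinement.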

Why is it gaining traction?

In contrast to autoregressive approaches, this project delivers diffusion-based language models whose index-shuffling and binary-encoding tricks improve scaling behavior, matching or beating baselines on perplexity and downstream tasks. Docker images and pretrained 1.1B-parameter models lower the barrier to experimentation, while Weights & Biases integration helps keep runs reproducible. Developers appreciate the lit-gpt and Megatron-LM integrations for handling large-scale pretraining without starting from scratch.
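The "binary encoding and index shuffling" in the title refers to how token indices are represented before diffusion. One way to picture it: apply a fixed permutation to the vocabulary, then represent each shuffled index as a sequence of bits (sub-tokens). The sketch below is a conceptual illustration of that idea, not necessarily the paper's exact construction.

```python
# Conceptual sketch: shuffle vocabulary indices with a fixed permutation,
# then encode each shuffled index as a fixed-width bit sequence (sub-tokens).
import math
import numpy as np

def make_codec(vocab_size: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(vocab_size)           # index shuffling
    inv = np.argsort(perm)                       # inverse permutation for decoding
    n_bits = math.ceil(math.log2(vocab_size))    # bits per token

    def encode(token_id: int) -> list[int]:
        shuffled = int(perm[token_id])
        return [(shuffled >> b) & 1 for b in reversed(range(n_bits))]

    def decode(bits: list[int]) -> int:
        shuffled = int("".join(map(str, bits)), 2)
        return int(inv[shuffled])

    return encode, decode

encode, decode = make_codec(vocab_size=50_000)
bits = encode(1234)          # a 16-bit sub-token sequence for this vocab size
assert decode(bits) == 1234
```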

Who should use this?

ML researchers scaling non-autoregressive language models beyond masked setups, or engineers benchmarking diffusion LMs on web-scale data like SlimPajama. It's ideal for academics replicating the paper's scaling analysis on C4, or teams fine-tuning for infilling tasks like few-shot Q&A on GSM8K.

Verdict

Try it if you're into diffusion models—pretrained weights and Docker make prototyping fast, despite low maturity (19 stars, 1.0% credibility). Docs are paper-focused but solid for reproduction; expect tweaks for production.

