david3684 / flm

Public

Official codebase for the paper "One-step Language Modeling via Continuous Denoising"

48 stars · 0 forks · 100% credibility
Found Feb 19, 2026 at 19 stars.
AI Analysis
Python
AI Summary

Research codebase for training Flow Language Models that generate coherent text sequences in a single parallel step using continuous normalizing flows on discrete token spaces.
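The one-step idea above can be sketched in a few lines of PyTorch. This is a toy illustration under assumptions, not the repo's actual API: `VelocityNet`, the shapes, and the single-Euler-step sampler are all invented here to show how a learned velocity field could map Gaussian noise to token logits in one parallel step.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a flow model: given a noisy continuous
# sequence x_t and a time t, predict a velocity toward the data.
class VelocityNet(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size + 1, hidden),
            nn.GELU(),
            nn.Linear(hidden, vocab_size),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the scalar time over every token position.
        t_feat = t.view(-1, 1, 1).expand(*x_t.shape[:2], 1)
        return self.net(torch.cat([x_t, t_feat], dim=-1))

@torch.no_grad()
def sample_one_step(model: nn.Module, batch: int, seq_len: int, vocab: int):
    # Start from pure Gaussian noise in the continuous (one-hot) space ...
    x0 = torch.randn(batch, seq_len, vocab)
    t = torch.zeros(batch)
    # ... take a single Euler step of the learned ODE from t=0 to t=1 ...
    x1 = x0 + model(x0, t)
    # ... and project back to discrete tokens.
    return x1.argmax(dim=-1)

model = VelocityNet(vocab_size=10)
tokens = sample_one_step(model, batch=2, seq_len=5, vocab=10)
print(tokens.shape)  # torch.Size([2, 5])
```

The whole sequence is produced in one forward pass, which is what distinguishes this from autoregressive decoding or multi-step diffusion sampling.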

How It Works

1. 🔍 Discover fast text magic

You stumble upon this project promising to create full sentences in one quick step, like magic for stories or chats.

2. 💻 Set up your playground

Follow simple steps to prepare your computer so it can handle the creative heavy lifting.

3. 📋 Pick a recipe

Choose from ready-made plans for different text styles, like news or books, to match what you want to create.

4. 🚀 Start the creation engine

Hit go, and watch your computer learn patterns from tons of example texts to become a text wizard.

5. 📊 Check the magic growing

Peek at updates showing how well it's learning, feeling excited as numbers improve.

6. ✍️ Make your first texts

Ask it to whip up new sentences or stories, seeing fresh words appear instantly.

🎉 Your text generator shines

Now you have a speedy tool for endless creative writing, experiments, or fun chats!
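The "learn patterns from tons of example texts" step above boils down to flow-matching training. Here is a minimal sketch, assuming a linear interpolation path between Gaussian noise and one-hot token targets; the tiny model, sizes, and training loop are illustrative assumptions, not the repo's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ = 10, 8

# Toy velocity model: predicts dx/dt from the interpolated state and time.
model = nn.Sequential(nn.Linear(VOCAB + 1, 32), nn.GELU(), nn.Linear(32, VOCAB))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

def flow_matching_step(tokens: torch.Tensor) -> float:
    # Lift discrete tokens into continuous space as one-hot vectors.
    x1 = F.one_hot(tokens, VOCAB).float()        # data endpoint
    x0 = torch.randn_like(x1)                    # noise endpoint
    t = torch.rand(tokens.shape[0], 1, 1)        # random time per sample
    xt = (1 - t) * x0 + t * x1                   # linear interpolation path
    target = x1 - x0                             # constant velocity of that path
    inp = torch.cat([xt, t.expand(*xt.shape[:2], 1)], dim=-1)
    loss = F.mse_loss(model(inp), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

batch = torch.randint(0, VOCAB, (4, SEQ))
losses = [flow_matching_step(batch) for _ in range(5)]
print(len(losses))  # 5
```

The regression target `x1 - x0` is what makes the straight-line path integrable in a single Euler step at sampling time.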

Star Growth

This repo grew from 19 to 48 stars.
AI-Generated Review

What is flm?

FLM lets you train flow-based language models that generate text in one parallel step by continuously denoising one-hot encoded sequences, sidestepping the slow multi-step sampling of discrete diffusion. It solves the correlation bottleneck in few-step language generation, producing coherent output via flow matching from noise to data. Python-based with PyTorch, it ships scripts for training on LM1B or OpenWebText, distillation for speedups, and perplexity evals: plug in your dataset dir and run.
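Since the review mentions perplexity evals, here is how perplexity is conventionally computed from model logits (a generic definition, not the repo's eval script): it is the exponential of the mean token-level negative log-likelihood.

```python
import math
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    # Perplexity = exp(mean negative log-likelihood per token).
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          targets.reshape(-1))
    return math.exp(nll.item())

# Sanity check: a uniform model over V tokens has perplexity exactly V.
V = 50
logits = torch.zeros(3, 7, V)          # all-zero logits = uniform distribution
targets = torch.randint(0, V, (3, 7))
print(round(perplexity(logits, targets)))  # 50
```

Lower is better; an autoregressive baseline and a one-step flow model can be compared on the same held-out tokens this way.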

Why is it gaining traction?

Stands out with one-step sampling that's viable for real inference, plus flow-map distillation yielding even tighter models without quality loss. Developers latch onto the repo's Hydra configs for baselines like AR, DUO, MDLM, and SEDD, and onto torch.compile + flash-attn for H100-scale efficiency. No more wrestling custom schedulers; it's a ready-made toolkit for flow LMs from KAIST/CMU researchers.
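The torch.compile + flash-attn combination mentioned above can be illustrated with PyTorch's built-in `scaled_dot_product_attention`, which dispatches to a flash-attention kernel on supported GPUs and falls back to a math implementation elsewhere. The tiny module below is a sketch under assumptions, not the repo's model architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAttention(nn.Module):
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, s, self.heads, d // self.heads)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        # Fused attention kernel; uses flash-attention on capable GPUs.
        y = F.scaled_dot_product_attention(q, k, v)
        return self.out(y.transpose(1, 2).reshape(b, s, d))

model = TinyAttention(dim=32, heads=4)
# In a training script one would typically also wrap the model:
#   model = torch.compile(model)
# to fuse the surrounding graph for extra throughput on H100-class GPUs.
out = model(torch.randn(2, 6, 32))
print(out.shape)  # torch.Size([2, 6, 32])
```

Hydra then layers experiment configuration on top, so the same training entry point can switch between baselines by swapping config groups.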

Who should use this?

NLP researchers replicating "One-step Language Modeling via Continuous Denoising" or benchmarking continuous flows vs diffusion on wikitext/PTB. Teams prototyping non-autoregressive generators for chat or code, especially if scaling to OpenWebText without AR bottlenecks.

Verdict

Worth forking for flow LM experiments: distillation delivers practical one-step generation today. The low star count (19 at discovery) and thin docs warrant caution, but the scripts and configs bootstrap fast; contribute tests to mature it.


