PKU-YuanGroup / TIDE

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

AI Summary

TIDE is an open-source framework for distilling large diffusion language models across different architectures into compact, efficient versions.

How It Works

1. 🔍 Discover TIDE

You find this cool project on GitHub that makes powerful AI chatbots smaller and faster by learning from big expert models.

2. 💻 Set up your computer

Follow the setup steps to install the Python environment and dependencies so everything runs smoothly.

3. 📥 Grab ready data and experts

Download the pre-made conversation data and the big teacher models that will teach your student.

4. 🛠 Prepare your lessons

Preprocess the data so it's in the right shape for training.

5. 🚀 Train your smart student

Hit start and watch your tiny AI learn from the big experts, getting smarter step by step (a hedged code sketch of this step follows the list).

6. 📊 Test how great it is

Run quick checks on coding and math benchmarks to see your AI shine.

7. 🎉 Enjoy your fast AI

Your small, speedy chatbot now rivals the big ones and runs almost anywhere!
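For a concrete feel of step 5, here is a minimal, hypothetical sketch of a shared-tokenizer distillation step in PyTorch. The checkpoint paths, hyperparameters, and the plain Hinton-style logit-matching objective are illustrative assumptions, not TIDE's actual scripts or its diffusion-specific training recipe.

```python
# Hypothetical sketch of one shared-tokenizer distillation step.
# Checkpoint paths and the plain KL objective are assumptions for
# illustration; they are NOT TIDE's actual API or training recipe.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

TEACHER = "path/to/teacher-8b"    # placeholder checkpoint path
STUDENT = "path/to/student-0.6b"  # placeholder checkpoint path

tokenizer = AutoTokenizer.from_pretrained(STUDENT)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
teacher = AutoModelForCausalLM.from_pretrained(TEACHER).eval()
student = AutoModelForCausalLM.from_pretrained(STUDENT)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

def distill_step(batch_texts, temperature=2.0):
    """One KD step: pull student logits toward teacher logits."""
    inputs = tokenizer(batch_texts, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():  # the teacher stays frozen
        teacher_logits = teacher(**inputs).logits
    student_logits = student(**inputs).logits
    # Soft-label KL divergence, scaled by T^2 (standard logit-matching KD).
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the cross-tokenizer case the teacher and student vocabularies differ, so logits are not directly comparable; that is where the chunk-level alignment discussed in the review below comes in.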


AI-Generated Review

What is TIDE?

TIDE is a Python framework for distilling massive diffusion large language models across different architectures, attention mechanisms, and tokenizers: think squeezing an 8B dense or 16B MoE teacher into a 0.6B student. It provides two ready-made pipelines, one for cross-tokenizer transfer using chunk-level alignment and another for shared-tokenizer setups, complete with scripts for preprocessing data, running distillation, and evaluating on eight benchmarks including HumanEval and GSM8K. Users get pretrained checkpoints and datasets on Hugging Face, plus one-click training on 8 GPUs.
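The chunk-level alignment mentioned above can be pictured as a two-pointer walk over both token streams: keep decoding tokens from whichever side lags until the two decoded strings agree, then close a chunk. The sketch below is a reconstruction of that general idea, not TIDE's implementation, and it assumes token-by-token decoding concatenates back to the original text, which holds only approximately for real tokenizers.

```python
# Illustrative reconstruction of chunk-level alignment between two
# tokenizers -- NOT TIDE's actual code. Tokens from each tokenizer are
# greedily grouped into chunks that decode to the same surface string,
# so teacher and student signals can be compared chunk by chunk.
from transformers import AutoTokenizer

def chunk_align(text, tok_a, tok_b):
    ids_a = tok_a.encode(text, add_special_tokens=False)
    ids_b = tok_b.encode(text, add_special_tokens=False)
    chunks = []
    i = j = start_i = start_j = 0
    buf_a = buf_b = ""
    while i < len(ids_a) and j < len(ids_b):
        # Extend whichever decoded buffer is currently shorter.
        if len(buf_a) <= len(buf_b):
            buf_a += tok_a.decode([ids_a[i]])
            i += 1
        else:
            buf_b += tok_b.decode([ids_b[j]])
            j += 1
        # When the buffers agree, both token runs cover the same span.
        if buf_a and buf_a == buf_b:
            chunks.append((ids_a[start_i:i], ids_b[start_j:j]))
            start_i, start_j = i, j
            buf_a = buf_b = ""
    return chunks  # trailing unmatched tokens are simply dropped here

# Example with a hypothetical pairing of two byte-level BPE vocabularies:
# tok_a = AutoTokenizer.from_pretrained("gpt2")
# tok_b = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# for a_ids, b_ids in chunk_align("Diffusion models are fun.", tok_a, tok_b):
#     print(tok_a.decode(a_ids), "<->", tok_b.decode(b_ids))
```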

Why is it gaining traction?

Unlike prior diffusion distillation work, which was stuck within a single architecture, TIDE handles real-world mismatches via modular tricks like progressive scheduling and complementary teacher signals, delivering a +1.53 average benchmark gain and crushing code tasks (+16 on HumanEval). It slashes peak memory 22x and speeds up inference 5x on H100s, making tiny models viable on consumer hardware. A solid arXiv paper, a project page, and ready-made bash scripts lower the entry barrier for experiments.
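As a hedged illustration of what "progressive scheduling" over complementary teacher signals could look like: the snippet below ramps the blend weight from one teacher's loss to the other over training. The cosine shape and the two-loss blend are assumptions for exposition, not TIDE's published schedule.

```python
# Hedged illustration of progressive scheduling over two complementary
# teacher signals. The cosine ramp and two-loss blend are assumptions
# for exposition, not TIDE's published schedule.
import math

def progressive_weight(step: int, total_steps: int) -> float:
    """Cosine ramp from 0.0 at step 0 to 1.0 at the final step."""
    return 0.5 * (1.0 - math.cos(math.pi * step / total_steps))

def blended_loss(loss_a: float, loss_b: float,
                 step: int, total_steps: int) -> float:
    """Early training leans on teacher A; weight shifts to teacher B."""
    w = progressive_weight(step, total_steps)
    return (1.0 - w) * loss_a + w * loss_b
```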

Who should use this?

ML researchers tuning diffusion LLMs for code generation or math reasoning, where the distilled models outperform AR baselines. Teams at startups or labs deploying edge-friendly LLMs that need its cross-architecture flexibility. Academic groups replicating TIDE's results or extending the framework (not to be confused with the similarly named TiDE time-series forecasting model).

Verdict

Grab it for diffusion distillation prototypes: prebuilt pipelines and Hugging Face assets make setup fast, even if the modest 36-star count signals an early-stage project. Docs and scripts are crisp, but expect tweaks for production; a strong paper backs the claimed gains.


