TsinghuaC3I / ZEDA

Public

Post-Trained MoE Can Skip Half Experts via Self-Distillation

19
2
100% credibility
Found May 20, 2026 at 19 stars.
AI Analysis
Python
AI Summary

ZEDA is a research framework from Tsinghua University that adapts already-trained Mixture-of-Experts (MoE) models into cheaper-to-run versions. It injects 'zero experts' (placeholder components that require no computation) and then trains the model to route fewer tokens to real experts during inference. The process cuts expert computation by over 50% while retaining most of the original model's capabilities. The project includes complete training scripts, evaluation tools for math, code, and instruction-following benchmarks, and adapted releases of popular models such as Qwen3 and GLM.

How It Works

1
💡 Hear about faster AI models

A researcher learns about ZEDA, a technique that can make existing MoE models run about 1.2× faster by eliminating over half of the expert computation, without losing much accuracy.

2
📚 Understand how it works

They read about 'zero experts': placeholder components that produce no output and cost no computation, so the model can skip about half of its expert work during inference.

3
🗄️ Gather their materials

They download their trained AI model and prepare 60,000 example prompts for the adaptation process.

4
🔄 Transform the model

They inject zero experts into their model, expanding it into a dynamic version that can choose which parts to activate.

5
🎓 Teach the new approach

The model learns through two stages: first studying example responses, then practicing on its own outputs.

6
📊 Test the results

They run the adapted model through math problems, coding challenges, and instruction-following tests to measure quality.

🚀 Enjoy faster responses

The model now runs about 1.2× faster end to end, with over half of the expert computation eliminated, at only a small accuracy cost.
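The injection step in the walkthrough above can be pictured with a toy router. This is a minimal sketch under stated assumptions, not ZEDA's implementation: the function name `route_with_zero_experts`, the scalar "hidden state", and the choice to weight outputs by the joint softmax without renormalizing are all hypothetical. It only shows the core idea that a top-k slot won by a zero expert runs no computation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_with_zero_experts(real_logits, zero_logits, experts, x, top_k):
    """Toy single-token MoE layer with injected zero-output experts.

    real_logits: router scores for the real experts
    zero_logits: router scores for the injected zero experts (hypothetical)
    experts:     one callable per real expert
    Returns (layer_output, number_of_real_experts_executed).
    """
    logits = list(real_logits) + list(zero_logits)
    probs = softmax(logits)
    # Top-k selection runs over real and zero experts together.
    ranked = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]

    output, executed = 0.0, 0
    for i in chosen:
        if i < len(experts):
            # Real expert: run it and add its weighted output.
            output += probs[i] * experts[i](x)
            executed += 1
        # Zero expert: contributes nothing, so no computation is run.
    return output, executed

# Four toy "experts" acting on a scalar input (assumption for illustration).
experts = [lambda v: 2 * v, lambda v: v + 1, lambda v: -v, lambda v: v * v]
real_logits = [2.0, 0.5, 1.0, -1.0]

_, dense_count = route_with_zero_experts(real_logits, [], experts, 3.0, top_k=4)
_, sparse_count = route_with_zero_experts(real_logits, [1.5, 1.2], experts, 3.0, top_k=4)
print(dense_count, sparse_count)
```

In this example two of the four top-k slots go to zero experts, so only two real experts run for the token. How often that happens in a real model depends on what the router learns during adaptation.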


AI-Generated Review

What is ZEDA?

ZEDA (Zero-Expert Self-Distillation Adaptation) is a Python framework that makes Mixture-of-Experts models cheaper to run without retraining from scratch. It injects "zero-output experts" into existing post-trained MoE models, allowing half the experts to be skipped during inference. The adapted model uses a two-stage self-distillation process, training first on teacher rollouts then on-policy, while a group-level balancing loss keeps routing stable. Built on Megatron-LM and SGLang, it provides Docker setup, model conversion scripts, and evaluation pipelines for math, code, and instruction-following benchmarks.
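The self-distillation idea described above can be sketched as a per-token divergence between the original model (teacher) and the adapted model (student). The page does not state ZEDA's exact objective, so the forward KL divergence used here is an assumption, and the names `kl_divergence` and `distillation_loss` are hypothetical. The group-level balancing loss is not shown because the page gives no detail on its form.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) over one token's vocabulary distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

def distillation_loss(teacher_seq, student_seq):
    """Mean per-token KL over a sequence of logit vectors."""
    per_token = [kl_divergence(t, s) for t, s in zip(teacher_seq, student_seq)]
    return sum(per_token) / len(per_token)

# Toy logits over a 3-token vocabulary for a 2-token sequence (made-up values).
teacher = [[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]]
student = [[1.5, 0.7, -0.8], [0.0, 1.0, 0.6]]
print(distillation_loss(teacher, student))
```

Under this sketch the two stages would share the same loss and differ only in where the sequences come from: responses generated by the teacher in the first stage, and responses sampled from the student itself in the on-policy stage.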

Why is it gaining traction?

The main appeal is cost reduction without training from scratch. If you already have a MoE model deployed, ZEDA adapts it post-training rather than requiring a full redesign. The claims are concrete: over 50% reduction in expert FLOPs, marginal accuracy loss, and roughly 1.2x end-to-end speedup on Qwen3-30B-A3B and GLM-4.7-Flash. Pre-converted models are available on HuggingFace, so teams can test the tradeoff without training their own.
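A back-of-the-envelope calculation helps reconcile "over 50% fewer expert FLOPs" with "roughly 1.2x end-to-end speedup". The sketch below applies Amdahl's law under two assumptions that the page does not confirm: expert time scales linearly with expert FLOPs, and everything else (attention, routing, communication) is unchanged. The function names are hypothetical and the resulting fraction is illustrative only, since both input figures are approximate.

```python
def end_to_end_speedup(expert_time_fraction, expert_flops_reduction):
    """Amdahl's law: only the expert share of runtime gets cheaper."""
    remaining = (1.0 - expert_time_fraction) \
        + expert_time_fraction * (1.0 - expert_flops_reduction)
    return 1.0 / remaining

def implied_expert_fraction(speedup, expert_flops_reduction):
    """Invert Amdahl's law to get the expert share of baseline runtime."""
    return (1.0 - 1.0 / speedup) / expert_flops_reduction

# Using the page's rounded figures: 50% expert FLOP reduction, 1.2x speedup.
fraction = implied_expert_fraction(1.2, 0.5)
print(round(fraction, 3))
print(round(end_to_end_speedup(fraction, 0.5), 3))
```

Under these assumptions, expert computation would account for about one third of baseline inference time, which is why halving it yields a speedup well below 2x.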

Who should use this?

ZEDA suits ML engineers serving MoE models in production who want inference cost savings without retraining from scratch, and researchers working on dynamic routing or model efficiency. It is also a fit for teams using Qwen3-30B-A3B or GLM-4.7-Flash who are comfortable evaluating whether a marginal accuracy loss is acceptable for their use case. It is not suitable for teams without existing MoE infrastructure or for those needing fully stable, production-tested tooling.

Verdict

The concept is solid and the released models lower the barrier to testing, but the 19 stars and 1.0% credibility score reflect an early-stage academic project. Documentation is adequate, but test coverage and production hardening are unclear. Worth exploring for the specific models and use case, but approach with caution if you need battle-tested reliability.
