0xSero / reap-mlx (Public)

REAP expert pruning for MoE LLMs on Apple Silicon via MLX

TypeScript · 36 stars · 1 fork · 100% credibility · found Mar 13, 2026 at 35 stars

AI Summary

This tool helps shrink large AI models for Apple Silicon by analyzing usage patterns from sample texts and removing underused components.

How It Works

1. 🖥️ Discover a way to slim down your AI model
You hear about a simple tool that trims unused parts from your AI model to make it faster on your Apple Mac.

2. 📥 Get your model and sample stories ready
You grab your AI model file and a few short stories or texts to test how it works.

3. 🔍 Watch how your model thinks
You run a quick check to see which parts of the model get used the most during those stories.

4. 📋 Build a safe trimming plan
The tool creates a list of the least-used pieces to remove without hurting performance.

5. Create your lighter model
With one command, you apply the plan and get a new, smaller version of your model ready to use.

6. 🚀 Enjoy faster AI on your Mac
Your slimmed model loads quicker, uses less space, and runs smoothly on everyday tasks.
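The steps above boil down to: observe router activity on calibration text, rank experts by usage, and drop the tail. A minimal Python sketch of that idea, using toy numbers and a simple frequency-style criterion (this is an illustration of the concept, not the repo's actual code):

```python
import numpy as np

def build_pruning_plan(router_weights, keep_fraction=0.75):
    """Rank experts by total router mass over calibration tokens and
    mark the least-used ones for removal (a frequency-style baseline)."""
    usage = router_weights.sum(axis=0)          # total gate mass per expert
    order = np.argsort(usage)[::-1]             # most-used experts first
    n_keep = max(1, int(round(len(usage) * keep_fraction)))
    keep = sorted(order[:n_keep].tolist())
    prune = sorted(order[n_keep:].tolist())
    return keep, prune

# Toy calibration run: 4 tokens, 4 experts; expert 3 is barely routed to.
gates = np.array([
    [0.6, 0.3, 0.1, 0.0],
    [0.5, 0.4, 0.0, 0.1],
    [0.7, 0.2, 0.1, 0.0],
    [0.4, 0.5, 0.1, 0.0],
])
keep, prune = build_pruning_plan(gates, keep_fraction=0.75)
print(keep, prune)  # → [0, 1, 2] [3]
```

A real plan would be computed per MoE layer and serialized to disk before the apply step, but the keep/prune split is the core of it.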

AI-Generated Review

What is reap-mlx?

reap-mlx ports Cerebras's REAP expert-pruning method to MLX on Apple Silicon, letting you prune Mixture-of-Experts (MoE) LLMs locally from a CLI. Feed it prompts, Hugging Face datasets, or JSONL files to collect per-expert telemetry such as router weights and activation norms, then generate a pruning plan with REAP or baseline criteria (frequency, EAN sum/mean, weighted EAN). Run `collect`, `run`, `apply`, or `full` to output slimmer MLX checkpoints, with parity checks for exact reproducibility.

Why is it gaining traction?

It brings REAP's "reap the experts" one-shot MoE compression to M-series Macs without Cerebras hardware or cloud runs, handling quantized models and low-memory modes like layer-wise reloading. Devs like the dataset packing, token chunking, and chat-template support for realistic calibration data, plus dry runs and observation logs for safe iteration.
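Token chunking for calibration is simple to picture: the tokenized stream is cut into fixed-length windows so every forward pass exercises the router under the same sequence length. A minimal sketch, with the function name and ragged-tail handling assumed rather than taken from the repo:

```python
def chunk_tokens(token_ids, chunk_len, drop_last=True):
    """Pack a flat token stream into fixed-length calibration chunks."""
    chunks = [token_ids[i:i + chunk_len]
              for i in range(0, len(token_ids), chunk_len)]
    if drop_last and chunks and len(chunks[-1]) < chunk_len:
        chunks.pop()                 # discard a ragged final chunk
    return chunks

print(chunk_tokens(list(range(10)), 4))  # → [[0, 1, 2, 3], [4, 5, 6, 7]]
```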

Who should use this?

MLX users on Apple Silicon pruning MoE LLMs like Qwen1.5-MoE, especially researchers validating the "REAP the Experts: Why Pruning Prevails" approach or devs shrinking 4-bit checkpoints for local inference. Ideal for prompt/dataset calibration before evals, without standing up the full Cerebras stack.

Verdict

Grab it if you're on Apple Silicon doing MoE pruning: the solid CLI and docs make it dead simple despite the project's early stage (36 stars). Test coverage is good, but add benchmarks before relying on it in production.
