Anemll / anemll-flash-llama.cpp
Flash-MoE sidecar slot-bank runtime for large GGUF MoE models on Apple Silicon — llama.cpp fork
A fork of llama.cpp optimized for running large Mixture-of-Experts AI models on Apple Silicon by streaming experts from disk.
How It Works
This fork lets everyday Macs run Mixture-of-Experts models far larger than their RAM by loading only the expert weights each token actually needs.
Download the fork and pick a GGUF MoE model sized for your machine.
Run a one-time setup step that packs the model's expert weights into quick-access pieces (a slot bank) stored on your drive.
Launch the tool with your Mac's settings; during inference it streams just the required experts from storage instead of holding the whole model in memory.
Type your prompts and get responses from models whose full weights would never fit in normal memory.
Because only the active experts are resident at any moment, memory use stays bounded while the model remains responsive.
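The core idea above — keeping expert weights on disk and paging in only the ones the router selects — can be sketched in a few lines. This is a hypothetical illustration, not the fork's actual implementation: the file layout, sizes, and the toy random router are all assumptions made for the example; it just shows how a memory-mapped "slot bank" lets you read a single expert's slice without loading the whole bank.

```python
import os
import tempfile
import numpy as np

# Hypothetical slot bank: N expert weight matrices stored back-to-back in
# one file. Sizes are tiny here purely for illustration.
N_EXPERTS, D_IN, D_OUT, TOP_K = 8, 4, 4, 2
EXPERT_BYTES = D_IN * D_OUT * 4  # float32

# Build a fake bank file on disk.
path = os.path.join(tempfile.mkdtemp(), "experts.bin")
np.arange(N_EXPERTS * D_IN * D_OUT, dtype=np.float32).tofile(path)

def load_expert(idx):
    """Memory-map one expert's slice; the OS pages in only those bytes."""
    return np.memmap(path, dtype=np.float32, mode="r",
                     offset=idx * EXPERT_BYTES, shape=(D_IN, D_OUT))

# Toy router: pick the top-k experts by (random) gate scores. A real MoE
# router is a learned layer; this stand-in only drives the example.
rng = np.random.default_rng(0)
x = rng.standard_normal(D_IN).astype(np.float32)
gates = rng.standard_normal(N_EXPERTS)
chosen = np.argsort(gates)[-TOP_K:]

# Only the chosen experts' weights are ever read from disk.
y = sum(x @ load_expert(i) for i in chosen)
print("experts used:", sorted(chosen.tolist()), "output shape:", y.shape)
```

The point of the sketch is the `offset`/`shape` arguments to `np.memmap`: the file can hold far more experts than fit in RAM, yet each token touches only `TOP_K` slices of it.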