Run a 35B MoE model at 10+ tok/s on a $600 Mac mini. Pure C/Metal inference engine streaming experts from SSD on Apple Silicon
An optimized C/Metal inference engine for large Mixture-of-Experts (MoE) language models on Apple Silicon Macs. It streams expert weights from disk on demand, so models that would not fit in RAM still run with high throughput on low-memory hardware.
How It Works
Large MoE models only activate a few experts per token, so the engine keeps the expert weights on SSD and loads just the parts the model actually uses. Getting started looks like this:

1. Download the open model weights from a model hub to your machine.
2. Run the included tools to convert and lay out the model files for fast loading from disk.
3. Build the engine in a single step; it compiles the C/Metal runtime for your Mac's chip.
4. Start the server, which serves chat over HTTP from your machine.
5. Send messages and get replies at roughly 11 tokens per second on inexpensive hardware.

The result is a fast, local AI assistant running production-quality chats on a budget Mac.