Fast Opportunistic Mixture-of-Experts. From-scratch C/HIP MoE inference with multi-tier caching and cache-aware routing. The first example of Qwen3.5-397B running at 5-9 tok/s on a $2,100 desktop.
FOMOE is a high-performance inference engine that runs massive Mixture-of-Experts language models such as Qwen3.5-397B locally on affordable consumer desktops, using multi-tier weight caching and a dual-GPU split to keep the hottest experts in fast memory.
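The routing half of that claim is the interesting part: instead of always taking the gate's raw top-k experts, a cache-aware router can bias its choice toward experts whose weights are already resident in VRAM, so fetches from RAM or NVMe stay rare. Below is a minimal sketch of that idea in C; the names (`route_topk`, the tier penalties) and the bias values are illustrative assumptions, not FOMOE's actual code.

```c
#include <stdio.h>

/* Sketch of cache-aware top-k routing: each expert's weights live in one
 * of three tiers, and the router subtracts a residency penalty from each
 * gate logit before selecting. Penalty values are assumptions. */

enum tier { TIER_VRAM = 0, TIER_RAM = 1, TIER_NVME = 2 };

/* Cost of fetching an expert from each tier: VRAM is free, RAM costs a
 * little, NVMe costs a lot. */
static const float tier_penalty[3] = { 0.0f, 0.5f, 2.0f };

/* Pick the top-k experts by (logit - residency penalty). O(n*k)
 * selection is fine for the small expert counts of one MoE layer. */
static void route_topk(const float *logits, const enum tier *residency,
                       int n_experts, int k, int *picked)
{
    float adj[256]; /* assume n_experts <= 256 for the sketch */
    for (int e = 0; e < n_experts; e++)
        adj[e] = logits[e] - tier_penalty[residency[e]];

    for (int i = 0; i < k; i++) {
        int best = -1;
        for (int e = 0; e < n_experts; e++) {
            int taken = 0;
            for (int j = 0; j < i; j++)
                if (picked[j] == e) taken = 1;
            if (!taken && (best < 0 || adj[e] > adj[best]))
                best = e;
        }
        picked[i] = best;
    }
}

int main(void)
{
    /* Four experts: #1 scores highest but sits on NVMe; #2 is slightly
     * weaker but already in VRAM, so the biased router prefers it. */
    float logits[4]       = { 0.1f, 1.8f, 1.5f, 0.3f };
    enum tier residency[4] = { TIER_RAM, TIER_NVME, TIER_VRAM, TIER_VRAM };
    int picked[2];

    route_topk(logits, residency, 4, 2, picked);
    printf("routed to experts %d and %d\n", picked[0], picked[1]);
    return 0;
}
```

The penalty trades a little routing fidelity for far fewer slow-tier fetches; tuning how aggressively to skip non-resident experts is presumably what the "opportunistic" in the name refers to.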
How It Works
You hear about FOMOE, a way to run a 400-billion-parameter model on an ordinary desktop computer for about $2,100.
Pick up the recommended parts or build your own machine: two graphics cards, fast NVMe storage, and everyday components that fit the budget.
Download the model weights, which include prebuilt caches for the most-used experts so inference is fast from the first run.
Run a single command to unpack the experts and start the engine, which warms the fast memory caches automatically (see the sketch after these steps).
Type messages in the interactive chat and get replies at 5-9 tokens per second, all generated locally on your machine.
Enjoy fast, private conversations with a frontier-scale model, without expensive servers or waiting on the cloud.
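To make the warm-up step concrete, here is a minimal, self-contained sketch of a multi-tier expert cache with LRU promotion. Everything in it (`VRAM_SLOTS`, `cache_get`, the hot-expert list) is a hypothetical illustration of the general technique, not FOMOE's actual data structures.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of a multi-tier expert cache: a fixed number of VRAM slots
 * holds hot experts; everything else stays in RAM or on NVMe. On a
 * miss, the expert is promoted into VRAM, evicting the least-recently-
 * used resident. Names and sizes are assumptions. */

#define VRAM_SLOTS 4

typedef struct {
    int expert_id;      /* which expert occupies this slot, -1 if empty */
    uint64_t last_used; /* logical clock for LRU eviction */
} vram_slot;

static vram_slot slots[VRAM_SLOTS];
static uint64_t clock_now = 0;

/* Return the slot holding expert_id, promoting it into VRAM on a miss. */
static int cache_get(int expert_id)
{
    clock_now++;

    /* Hit: already resident in VRAM. */
    for (int i = 0; i < VRAM_SLOTS; i++)
        if (slots[i].expert_id == expert_id) {
            slots[i].last_used = clock_now;
            return i;
        }

    /* Miss: evict the LRU slot and (conceptually) DMA the expert's
     * weights up from RAM/NVMe into that slot. */
    int victim = 0;
    for (int i = 1; i < VRAM_SLOTS; i++)
        if (slots[i].last_used < slots[victim].last_used)
            victim = i;

    printf("miss: loading expert %d into slot %d (evicting %d)\n",
           expert_id, victim, slots[victim].expert_id);
    slots[victim].expert_id = expert_id;
    slots[victim].last_used = clock_now;
    return victim;
}

int main(void)
{
    for (int i = 0; i < VRAM_SLOTS; i++)
        slots[i].expert_id = -1;

    /* Warm-up: touching the hottest experts once at startup pre-fills
     * VRAM so the first user prompt already hits the fast tier. */
    int hot[] = { 7, 3, 7, 12, 3, 7, 42 };
    for (int i = 0; i < 7; i++)
        cache_get(hot[i]);
    return 0;
}
```

Touching the hottest experts once at startup is all "warming the caches" needs to mean here: after that, steady-state decoding mostly hits the VRAM tier.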