mlx-flash is a Python library for running large language models that exceed available RAM on Apple Silicon Macs, streaming weights from disk through the system's page cache instead of loading them fully into memory.
How It Works
Large language models often exceed the unified memory of an Apple Silicon Mac; mlx-flash works around this by streaming model weights from disk rather than resident RAM.
Install the library with a standard pip install.
Enable its low-memory loading mode, which memory-maps weight files instead of copying them into process memory.
Load a model larger than available RAM: because the weights are memory-mapped, loading is near-instant and resident memory stays low.
Run inference as usual; generation behaves the same as with models that fit entirely in memory.
The result is that an ordinary Mac can serve models that would otherwise require far more RAM, with no dedicated hardware.
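The mechanism underlying these steps is memory-mapping: the OS page cache pulls in only the pages of the weight file that are actually touched. The sketch below illustrates that idea using only the Python standard library; it does not use mlx-flash's actual API, and the file layout is invented purely for illustration.

```python
import mmap
import os
import struct
import tempfile

# Create a fake "weight file": one million float32 values (4 MB on disk).
n = 1_000_000
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<f", 0.5) * n)

# Memory-map the file read-only. No weight bytes are copied into the
# process up front; the OS faults pages in from disk only when they are
# touched, and its page cache decides what stays resident.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    offset = 100_000 * 4  # byte offset of weight index 100_000
    (w0,) = struct.unpack_from("<f", m, offset)
    print(w0)  # 0.5; only the pages containing this slice were read from disk
```

Because the mapping is read-only and backed by the file, the kernel can evict cached weight pages under memory pressure and re-read them later, which is what lets a model larger than RAM run at all.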