jundot / omlx

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

97 stars · 100% credibility · Python · Found Feb 17, 2026 at 45 stars (roughly 2x since)
AI Summary

oMLX is a menu bar application for running large language models locally on Apple Silicon Macs, featuring an admin dashboard for model downloads, multi-model serving, and OpenAI/Anthropic API compatibility.

How It Works

1. 🖥️ Discover oMLX: You hear about a simple app that lets your Mac run smart AI helpers right at home, without needing the internet.

2. 📥 Download the app: Grab the ready-to-use app from the official releases page and drag it to your Applications folder.

3. 🚀 Launch and set up: Open the app from your menu bar, pick a folder for your AI helpers, and download your first one with a few clicks.

4. 💬 Start chatting: Jump into the built-in chat window to ask questions and get instant replies from your local AI.

5. 🔗 Connect your tools: Link your favorite writing apps or coding helpers to the local server so they can use your speedy local brain (see the client sketch after this list).

6. 🎉 Enjoy fast local AI: Watch your Mac handle big conversations smoothly from the menu bar, with everything private and blazing fast.
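
For step 5, anything that speaks the OpenAI API can be pointed at the local server. A minimal sketch using the official `openai` Python client, assuming the default `localhost:8000/v1` endpoint mentioned in the review below; the model id is a placeholder for whatever model you downloaded in the dashboard:

```python
# Minimal sketch of step 5: point an OpenAI-API client at the local
# oMLX server. The endpoint is the default from the README; the model
# id below is a placeholder, not a real oMLX model name.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local oMLX server
    api_key="not-needed",                 # local servers typically ignore the key
)

reply = client.chat.completions.create(
    model="your-downloaded-model",        # placeholder model id
    messages=[{"role": "user", "content": "Summarize this paragraph: ..."}],
)
print(reply.choices[0].message.content)
```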

AI-Generated Review

What is omlx?

oMLX runs LLM inference on Apple Silicon Macs with continuous batching for concurrent requests and SSD caching that persists the KV cache across restarts, so long contexts can be reused without recomputation. Launch it from the macOS menu bar app or the CLI (`omlx serve --model-dir ~/models`) and you get OpenAI- and Anthropic-compatible APIs at localhost:8000/v1, plus an admin dashboard for downloading models from Hugging Face, chatting, and monitoring. Built in Python atop MLX, it serves LLMs, embedding models, and rerankers from one server.
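
Since the server also advertises Anthropic API compatibility, clients built on the official `anthropic` SDK should be able to target it by overriding the base URL. A hedged sketch; the exact base path and the model id are assumptions, not confirmed from the repo:

```python
# Hedged sketch: oMLX claims Anthropic API compatibility, so the official
# `anthropic` client should work with base_url overridden. The base path
# and model id here are assumptions.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8000",  # assumed root for the Anthropic-style API
    api_key="not-needed",              # local server; key is likely ignored
)

msg = client.messages.create(
    model="your-downloaded-model",     # placeholder model id
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello from a local Mac!"}],
)
print(msg.content[0].text)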

Why is it gaining traction?

It stands out for dead-simple menu bar control on the Mac (no terminal babysitting) and paged SSD caching that cuts inference time on repeated prompts, outpacing basic mlx-lm setups. Multi-model LRU swapping and pinning keep your daily-driver models loaded while auto-evicting the rest, and there are Claude Code tweaks for local tool use. Developers like that the drop-in, OpenAI/Anthropic-compatible API makes it easy to swap in as a local backend for existing tools.
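
As a concept illustration only (not oMLX's actual code), LRU swapping with pinning can be sketched in a few lines: pinned models are never evicted, and the least recently used unpinned model is unloaded when the slot limit is hit.

```python
# Concept sketch of LRU model swapping with pinning -- illustrative only,
# not oMLX's implementation. Pinned models survive eviction; otherwise the
# least recently used model is unloaded when the slot limit is reached.
from collections import OrderedDict

class ModelPool:
    def __init__(self, max_loaded: int):
        self.max_loaded = max_loaded
        self.models: "OrderedDict[str, object]" = OrderedDict()
        self.pinned: set = set()

    def get(self, name: str, loader):
        if name in self.models:
            self.models.move_to_end(name)           # mark as most recently used
            return self.models[name]
        while len(self.models) >= self.max_loaded:
            victim = next((m for m in self.models if m not in self.pinned), None)
            if victim is None:
                break                               # everything is pinned; over-commit
            self.models.pop(victim)                 # evict least recently used
        self.models[name] = loader(name)            # load (e.g. via mlx_lm)
        return self.models[name]

    def pin(self, name: str):
        self.pinned.add(name)
```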

Who should use this?

Apple Silicon developers building local-first apps or trying downloaded models without cloud costs. AI tinkerers who want one inference server to cover chat, embeddings, and reranking in RAG pipelines. Mac power users who want to put M-series hardware to work in their coding workflows.
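
For the RAG use case, the embeddings endpoint should follow the standard OpenAI `/v1/embeddings` shape, since the server is OpenAI-compatible. Another sketch, with an assumed placeholder model id:

```python
# Sketch of calling the embeddings endpoint for a RAG pipeline, assuming
# it matches the standard OpenAI /v1/embeddings shape; model id is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.embeddings.create(
    model="your-embedding-model",       # placeholder embedding model id
    input=["first chunk of a document", "second chunk"],
)
vectors = [d.embedding for d in resp.data]
print(len(vectors), len(vectors[0]))    # number of chunks, embedding dimension
```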

Verdict

A promising alpha for Mac-local LLMs (45 stars at discovery, solid README), but it's early-stage software, so test thoroughly before relying on it in production. Worth grabbing if you're on Apple Silicon and want simple, local-first LLM serving.
