SharpAI / SwiftLM

⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, and an iOS app.

Found Apr 01, 2026 at 26 stars
AI Summary

SwiftLM is a native Swift server for running large language models on Apple Silicon Macs with an OpenAI-compatible API and an optional iOS chat app.

How It Works

1. 💡 Discover SwiftLM

You hear about a super-fast way to run powerful AI chatbots right on your Mac without needing the internet.

2. 📥 Grab the ready app

Download the simple app file from the releases page and unzip it on your Mac.

3. 🚀 Pick a brainy model

Open your terminal, run the app with the name of a model, and it downloads everything needed.

4. 💬 Chat with your AI

Ask questions using a web tool or app, and get instant smart replies powered by your Mac's chip.

5. Try on your phone too

- 💻 Stick to Mac: Keep enjoying fast chats on your computer.
- 📱 Go mobile: Run the same AI on your iPhone or iPad.

🎉 Your private AI is live

You now have a speedy, private thinking assistant that works offline on your Apple devices.
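The chat step above can be sketched with a plain HTTP call. The endpoint path follows the OpenAI convention the project advertises; the host, port, and model name below are assumptions for illustration, not SwiftLM's documented defaults.

```python
import json
import urllib.request

# Assumed base URL: SwiftLM's real default port may differ.
BASE_URL = "http://localhost:8080"

def build_chat_request(model, messages, stream=False):
    """Build an OpenAI-style /v1/chat/completions payload."""
    return {"model": model, "messages": messages, "stream": stream}

def chat(prompt, model="my-local-model"):
    """Send one user message to the local server and return the reply text."""
    payload = build_chat_request(model, [{"role": "user", "content": prompt}])
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers put the reply at choices[0].message.content.
    return body["choices"][0]["message"]["content"]

# Usage, with a server running: chat("What is MLX?")
```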

AI-Generated Review

What is SwiftLM?

SwiftLM runs large language models natively on Apple Silicon using MLX, delivering an OpenAI-compatible API server for chat completions and streaming. It loads HuggingFace models directly, supports SSD streaming for 100B+ MoE setups, and includes TurboQuant KV cache compression to cut memory use by 3.5x. Developers get a single-binary CLI server plus an iOS app for on-device inference, all without Python overhead.
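TurboQuant's exact scheme isn't detailed here, but the quoted ~3.5x figure is consistent with generic group-wise 4-bit quantization of an fp16 cache: 4-bit codes plus one fp16 scale per 32 values is about 4.5 bits/value, and 16/4.5 ≈ 3.5. A minimal sketch of that generic idea, not SwiftLM's actual implementation:

```python
import numpy as np

def quantize_kv_4bit(kv, group=32):
    """Sketch of group-wise 4-bit KV cache quantization.

    Each group of `group` values is stored as 4-bit codes plus one fp16
    scale: 4 + 16/group = 4.5 bits/value at group=32, vs 16 bits for
    plain fp16. (A real kernel would pack two codes per byte; int8
    storage here keeps the sketch simple.)
    """
    flat = kv.astype(np.float32).reshape(-1, group)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # symmetric codes in [-7, 7]
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero groups
    codes = np.clip(np.round(flat / scale), -7, 7).astype(np.int8)
    return codes, scale.astype(np.float16)

def dequantize_kv(codes, scale, shape):
    """Reconstruct an approximate fp32 cache from codes and per-group scales."""
    return (codes.astype(np.float32) * scale.astype(np.float32)).reshape(shape)
```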

Why is it gaining traction?

This early-stage project stands out by hitting bare-metal speeds on Apple hardware (no GIL, no runtime copies) while mimicking OpenAI endpoints like /v1/chat/completions for easy client swaps. Features like expert streaming prevent memory crashes on massive MoE models, and KV cache compression enables smoother inference. It's a win for Apple-focused devs chasing low-latency local AI.
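Because the endpoints mirror OpenAI's, a client swap can be as small as changing the base URL; streaming replies then arrive as server-sent events in the OpenAI chunk schema. A sketch of consuming such a stream, assuming the field names match the OpenAI format the project claims to mimic:

```python
import json

def iter_stream_text(lines):
    """Yield text deltas from an OpenAI-style streaming (SSE) response body."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separator lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI-style end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]
```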

Who should use this?

Apple Silicon Mac users building local LLM APIs, iOS devs embedding models in iPhone apps, or teams running 100B+ MoE inference under tight memory constraints. Ideal for backend engineers who need OpenAI drop-ins without cloud costs, or mobile devs prototyping on-device chat UIs.

Verdict

Promising for Apple MLX workflows, but at 26 stars with sparse tests, treat it as alpha. Grab the pre-built binaries to test; contribute if you work in Swift AI serving.
