Mininglamp-AI / cider (Public)

W8A8/W4A8 inference on Apple Silicon — unlocking unused INT8 TensorOps in M5 for 1.2–1.9× faster LLM prefill, built as MLX custom primitives.

27 stars · 0 forks · 100% credibility
Found May 04, 2026 at 26 stars.
Primary language: Python

AI Summary

Cider speeds up image-understanding AI models on Apple Silicon Macs by using low-precision integer math for much faster responses.

How It Works

1
🔍 Discover Cider

You hear about Cider, a tool that makes AI models chat about pictures much faster on your Mac.

2
📥 Get it set up

Download and install it easily so your Mac can use the speed features.

3
📁 Pick your AI model

Choose a picture-understanding AI model you already have on your computer.

4
⚡ Unlock the speed

Run one simple command to upgrade your model for lightning-fast thinking.

5
🖼️ Chat with pictures

Ask questions about images and get answers quicker than before.

6
🌐 Share it online

Start a web service to let apps or friends use your speedy AI.

🚀 Blazing fast AI

Enjoy responses that arrive in a flash, saving time on every question.
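
The serving step above exposes an OpenAI-style chat endpoint, so any client that can POST JSON can talk to it. Below is a minimal sketch of building such a request in plain Python, assuming the standard OpenAI chat-completions message schema that the server is said to mimic; the model name and field layout are illustrative, not Cider's documented API:

```python
import base64
import json

def build_image_chat_request(question: str, image_bytes: bytes,
                             model: str = "qwen3-vl-2b") -> str:
    """Build an OpenAI-style chat-completions payload with an inline image.

    The model name and message schema here are assumptions based on the
    OpenAI chat-completions format, not Cider's documented interface.
    """
    # Images travel inline as a base64 data URL inside the message content.
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }
    return json.dumps(payload)

# The resulting JSON string would be POSTed to the server's chat-completions route.
request_body = build_image_chat_request("What is in this image?", b"fake-png-bytes")
```

Keeping the wire format OpenAI-compatible means existing chat clients and SDKs can point at the local server without code changes.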

AI-Generated Review

What is cider?

Cider supercharges MLX inference on Apple Silicon M5+ chips by enabling W8A8 and W4A8 quantization modes missing from stock MLX, delivering 1.2–1.9× faster LLM prefill via custom Metal primitives. Built in Python atop MLX, it offers one-line model conversion: `convert_model(model)` swaps Linear layers for accelerated ones that auto-switch between batched GEMM for prompts and matrix-vector (MV) products for decode. It also patches mlx_vlm for multi-image Qwen3-VL, plus a ready-made FastAPI server that mimics OpenAI chat completions with image support.
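
The prefill/decode dispatch described above can be sketched in plain Python. This is a toy illustration of the dispatch logic only: both paths here are ordinary numpy matmuls standing in for Cider's Metal INT8 kernels, and the class and method names are made up, not Cider's actual API:

```python
import numpy as np

class AcceleratedLinear:
    """Toy sketch of prefill/decode dispatch (illustrative names, not Cider's API)."""

    def __init__(self, weight: np.ndarray):
        self.weight = weight  # shape (out_features, in_features)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, seq_len, in_features)
        if x.shape[1] > 1:
            # Prefill: many prompt tokens at once -> batched GEMM path
            return self._gemm(x)
        # Decode: one token per step -> matrix-vector path
        return self._mv(x)

    def _gemm(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weight.T

    def _mv(self, x: np.ndarray) -> np.ndarray:
        # Squeeze the length-1 sequence dim, do a mat-vec per batch row,
        # then restore the sequence dim so both paths return the same rank.
        out = x[:, 0, :] @ self.weight.T
        return out[:, None, :]

layer = AcceleratedLinear(np.ones((4, 3), dtype=np.float32))
prefill_out = layer(np.ones((1, 8, 3), dtype=np.float32))  # GEMM path
decode_out = layer(np.ones((1, 1, 3), dtype=np.float32))   # MV path
```

Because the dispatch keys off the input shape, callers never toggle a mode flag; the same layer object serves both the prompt pass and the token-by-token generation loop.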

Why is it gaining traction?

Unlike MLX's weight-only quantization, Cider fuses activation quantization with the M5's otherwise unused INT8 TensorOps for real end-to-end speedups on VLMs like Qwen3-VL-2B (up to 32% faster prefill at near-identical perplexity). Devs love the drop-in API (no mode toggling needed) and the conditional build that gracefully falls back on M4. The repo's benchmarks and VLM service make it a quick win for local inference without rewriting generation loops.
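
To see what fusing activation quantization means, here is a minimal numpy sketch of a W8A8 matmul: both weights and activations are symmetrically quantized to INT8, multiplied with integer accumulation, then dequantized. The per-tensor symmetric scheme below is illustrative only; Cider's actual quantization scheme and kernels may differ:

```python
import numpy as np

def quantize_int8(t: np.ndarray):
    """Symmetric per-tensor INT8 quantization (illustrative, not Cider's scheme)."""
    scale = max(float(np.abs(t).max()) / 127.0, 1e-8)
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_matmul(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """INT8 x INT8 matmul with INT32 accumulation, dequantized at the end."""
    qx, sx = quantize_int8(x)  # quantized activations
    qw, sw = quantize_int8(w)  # quantized weights
    # Integer accumulation is the part that maps onto INT8 TensorOps.
    acc = qx.astype(np.int32) @ qw.T.astype(np.int32)
    return acc.astype(np.float32) * (sx * sw)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16)).astype(np.float32)   # activations
w = rng.standard_normal((8, 16)).astype(np.float32)   # weight matrix
approx = w8a8_matmul(x, w)
exact = x @ w.T  # full-precision reference
```

Weight-only schemes dequantize weights back to floats before the matmul, so the multiply stays in floating point; quantizing the activations too is what lets the whole inner product run on integer hardware.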

Who should use this?

MLX users on M5 Macs running local LLMs/VLMs (Qwen3, Llama 3) who hit prefill bottlenecks in agentic workflows or image-chat apps. Ideal for Apple devs building prototypes, demos, or tools that need fast vision-language serving. Skip it if you're on M4 or prefer a full framework like vLLM.

Verdict

Grab it if you have M5 hardware: a solid alpha for targeted acceleration, with clear docs and perplexity benchmarks. At 27 stars it's early (watch the KV-cache quantization roadmap), but its Apple Silicon focus makes it worth the pip install for prefill-heavy workloads.
