christopherkarani

Direct Apple Neural Engine inference in Swift. 4.7x faster decode than Core ML via reverse-engineered private APIs. Compile once, dispatch forever.

26 stars · 0 forks · 100% credibility
Found Mar 13, 2026 at 26 stars.
Language: Swift

AI Summary

Espresso accelerates transformer inference and training on Apple Neural Engine hardware, reporting 4.7x faster decode than the standard Core ML path.

How It Works

1
🔍 Discover Espresso

Espresso is a Swift library that runs transformer inference directly on the Apple Neural Engine, without going through Core ML.

2
💻 Get it ready

Clone the repository and build it on an Apple Silicon Mac.

3
🧠 Add your model

Load your model's weights, for example a language-model checkpoint, from a local file.

4
⏱️ Test the speed

Run the included benchmark scripts and compare ms/token against the Core ML baseline; decode runs about 4.7x faster.

5
✍️ Start creating

Feed it a prompt and generate text through the accelerated decode path.

🚀 Supercharged AI

Your model now decodes several times faster on-device, ready for local projects.
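The speed-test step above boils down to measuring milliseconds per token. Here is a minimal harness in plain Swift with a stand-in decode function; `decodeOneToken` is a placeholder, not Espresso's actual API, which this page does not document:

```swift
import Foundation

// Hypothetical stand-in for a single decode step; replace with the
// library's real token-generation call when wiring up a benchmark.
func decodeOneToken() {
    // Simulate some work so the harness has something to measure.
    _ = (0..<10_000).reduce(0, &+)
}

// Average milliseconds per token over `n` decode steps -- the same
// ms/token metric the benchmark scripts report.
func msPerToken(steps n: Int) -> Double {
    let start = Date()
    for _ in 0..<n { decodeOneToken() }
    let elapsedSeconds = Date().timeIntervalSince(start)
    return elapsedSeconds * 1000.0 / Double(n)
}

let ms = msPerToken(steps: 100)
print(String(format: "%.3f ms/token", ms))
```

Comparing this number between an Espresso run and a Core ML run is how a claim like "4.7x faster decode" would be verified locally.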

AI-Generated Review

What is Espresso?

Espresso delivers direct Apple Neural Engine inference in Swift, skipping Core ML for transformer models on Apple Silicon. It compiles MIL programs once for backprop and exact token generation, hitting 4.7x faster decode via private APIs and IOSurface buffers. You can run the included benchmark scripts yourself to verify the ms/token gains over Core ML baselines.
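The "compile once, dispatch forever" idea from the tagline can be sketched as a cached compilation plan in plain Swift. All names here are illustrative, not Espresso's actual API: the expensive compile happens once per model, and every later dispatch reuses the cached result.

```swift
import Foundation

// Generic "compile once, dispatch forever" pattern: pay the expensive
// compilation cost a single time, then reuse the cached plan on every
// subsequent call. `Plan` stands in for a compiled MIL program.
final class CompiledPlanCache<Key: Hashable, Plan> {
    private var plans: [Key: Plan] = [:]
    private let compile: (Key) -> Plan

    init(compile: @escaping (Key) -> Plan) {
        self.compile = compile
    }

    // Returns the cached plan, compiling only on the first request.
    func plan(for key: Key) -> Plan {
        if let cached = plans[key] { return cached }
        let fresh = compile(key)   // expensive: happens once per key
        plans[key] = fresh
        return fresh
    }
}

// Usage: "compilation" is simulated by a counting closure.
var compileCount = 0
let cache = CompiledPlanCache<String, Int>(compile: { key in
    compileCount += 1
    return key.count  // stand-in for a compiled program handle
})
_ = cache.plan(for: "decoder")
_ = cache.plan(for: "decoder")  // served from cache; no recompile
print(compileCount)  // prints "1"
```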

Why is it gaining traction?

It bypasses Core ML overhead for autoregressive decode, producing verified tokens without recompiles thanks to ANE-native KV cache management. Traction comes from reproducible 1.08 ms/token exact generation on local artifacts, outpacing the standard cpuAndNeuralEngine path. Raw ANE performance without Core ML in the loop is what draws developers.
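KV cache management follows the standard autoregressive pattern: each decode step appends one token's key/value projections and reuses all earlier ones rather than recomputing the prefix. The generic sketch below, in plain Swift, shows only that caching logic; per this page, Espresso's distinguishing move is keeping such buffers ANE-resident via IOSurface, which the sketch does not attempt.

```swift
import Foundation

// Minimal KV cache for autoregressive decode: one key/value pair is
// appended per generated token and reused by every later attention step.
struct KVCache {
    private(set) var keys: [[Float]] = []
    private(set) var values: [[Float]] = []

    mutating func append(key: [Float], value: [Float]) {
        keys.append(key)
        values.append(value)
    }

    // Number of tokens currently cached.
    var length: Int { keys.count }
}

var cache = KVCache()
// Decode loop sketch: per step, only the new token's K/V is computed;
// the cached prefix is reused as-is.
for step in 0..<4 {
    let k = [Float](repeating: Float(step), count: 8)
    let v = [Float](repeating: Float(step) * 2, count: 8)
    cache.append(key: k, value: v)
}
print(cache.length)  // prints "4"
```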

Who should use this?

ML researchers on M-series Macs fine-tuning RWKV-style or Llama models outside the App Store. It suits internal enterprise tools, sideloaded generation apps, or on-device inference prototypes where private APIs unlock peak ANE throughput. Skip it if your app must pass App Review, since private API use is grounds for rejection.

Verdict

Worth testing for ANE-maxed inference on macOS 15+; the speedups hold up. At 26 stars it's raw, early-stage research code; pair it with its tests and benchmark scripts before any production use.


