johnmai-dev/ANE-LM: LLM inference on Apple Neural Engine (ANE)

C++ · 64 stars · 69% credibility · found Mar 04, 2026 at 62 stars
AI Summary

A command-line tool for running Qwen3.5 language models efficiently on Apple Silicon Macs using the Neural Engine rather than the CPU or GPU.

How It Works

1. 🔍 Discover Fast AI on Your Mac

You hear about a simple way to chat with smart AI directly on your Apple Mac without needing the internet or cloud services.

2. 📥 Download the Program and AI Model

Grab the free program files and a small AI model file like Qwen3.5 that fits on your Mac.

3. ⚙️ Set It Up on Your Mac

Follow the easy preparation steps to get everything ready, including optionally converting your model so it loads faster.

4. 🚀 Launch and Connect Your Model

Start the program, point it to your AI model, and watch it come alive using your Mac's built-in power.

5. Choose Your Way to Chat

Quick Generation: type a single question and get an instant smart reply.

🗣️ Interactive Chat: have a back-and-forth conversation like talking to a friend.

(A minimal code sketch of this two-mode split appears after these steps.)

6. 🎉 Enjoy Private, Speedy AI

Experience lightning-fast responses from your personal AI, all running smoothly and securely on your own Mac.
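As promised above, here is a minimal C++ sketch of the two modes in step 5: a one-shot generate path versus an interactive chat loop. The subcommand names echo the CLI described in the review below, but run_model() and the argument handling are hypothetical stand-ins, not ANE-LM's actual interface.

```cpp
#include <iostream>
#include <string>

// Stand-in for the real inference call; ANE-LM would run the model on the
// Neural Engine here. This stub just echoes the prompt.
std::string run_model(const std::string& prompt) {
    return "(model reply to: " + prompt + ")";
}

int main(int argc, char* argv[]) {
    std::string mode = (argc > 1) ? argv[1] : "chat";

    if (mode == "generate") {
        // Quick Generation: one prompt in, one reply out.
        std::string prompt = (argc > 2) ? argv[2] : "Hello";
        std::cout << run_model(prompt) << "\n";
    } else {
        // Interactive Chat: read-eval-print loop until EOF (Ctrl-D).
        std::string line;
        while (std::cout << "> ", std::getline(std::cin, line)) {
            std::cout << run_model(line) << "\n";
        }
    }
    return 0;
}
```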


AI-Generated Review

What is ANE-LM?

ANE-LM runs LLM inference directly on the Apple Neural Engine using C++ and private framework APIs, targeting Qwen3.5 dense text models on Apple Silicon. It delivers fast local inference without leaning on the GPU and sidesteps the speed ceiling of CPU-only inference. The CLI offers generate for single prompts, chat for interactive sessions, and convert for BF16-to-FP16 model prep.
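For a sense of what the convert step does per weight (the ANE generally favors FP16), here is a minimal sketch of BF16-to-FP16 narrowing. It assumes Clang's _Float16 type, which is available on Apple Silicon; this illustrates the numeric format change, not the repo's actual converter.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// BF16 is just the top 16 bits of an IEEE float32, so widening is a shift.
float bf16_to_f32(uint16_t b) {
    uint32_t bits = static_cast<uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Narrowing to FP16 loses range: BF16 spans roughly 1e38 while FP16 tops out
// at 65504, so very large values saturate to infinity (rare in trained weights).
_Float16 bf16_to_fp16(uint16_t b) {
    return static_cast<_Float16>(bf16_to_f32(b));
}

int main() {
    uint16_t one_bf16 = 0x3F80;  // bit pattern of 1.0f truncated to bf16
    std::printf("%f\n", static_cast<double>(bf16_to_fp16(one_bf16)));  // 1.000000
}
```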

Why is it gaining traction?

It posts strong inference benchmarks on ANE hardware, beating CPU and Metal on inference time for compact models, with persistent compile caching to speed up repeat loads. Among local-LLM projects it stands out as a lightweight alternative to running a full inference server on-device, and developers are hooked on the raw tokens-per-second gains, no cloud needed.
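A generic sketch of what persistent compile caching can look like: key a compiled artifact on the source model's size and modification time, and skip the expensive compile step on repeat loads. The cache layout and the compile_model() stub are assumptions for illustration, not ANE-LM's implementation.

```cpp
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

// Stand-in for the expensive step (e.g., compiling the model for the ANE).
void compile_model(const fs::path& src, const fs::path& dst) {
    fs::copy_file(src, dst, fs::copy_options::overwrite_existing);
}

fs::path cached_compile(const fs::path& model) {
    // Key the cache entry on size + mtime so editing the model invalidates it.
    auto stamp = std::to_string(fs::file_size(model)) + "-" +
                 std::to_string(fs::last_write_time(model)
                                    .time_since_epoch().count());
    fs::path cache_dir = fs::temp_directory_path() / "ane-lm-cache";
    fs::create_directories(cache_dir);
    fs::path compiled = cache_dir / (model.filename().string() + "." + stamp);

    if (!fs::exists(compiled)) {   // first load pays the compile cost
        compile_model(model, compiled);
    }
    return compiled;               // repeat loads hit the cache
}

int main(int argc, char* argv[]) {
    if (argc > 1) std::cout << cached_compile(argv[1]) << "\n";
}
```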

Who should use this?

Apple Silicon developers embedding local LLM inference in macOS/iOS apps, researchers comparing ANE against CPU inference, or tinkerers chasing on-device performance. It suits offline chatbots or Copilot-style assistant tools where battery life matters.

Verdict

Worth testing for ANE-accelerated inference on M-series chips: the CLI's simplicity shines despite just 64 stars signaling early days. The 69% credibility score flags private-API risk (a future macOS release could break it), but the clean README and benchmarks make it a pragmatic pick for anyone eyeing local speedups.
