john-rocky

Run LLMs on Apple devices with CoreML, optimized for Apple Neural Engine + GPU

AI Summary

A Swift library and iOS app for running optimized large language models like Gemma 4 on Apple devices using CoreML and the Neural Engine.

How It Works

1. 📱 Discover on-device AI chat

You hear about an app that holds smart conversations directly on your iPhone, no internet needed.

2. 🔽 Pick and get your AI brain

Choose a conversation model like Gemma and download it once to your phone.

3. 💬 Start chatting

Type a question or add a photo, and watch your private AI think and reply in real time.

4. 🔄 Keep the talk going

Ask follow-ups, share images for descriptions, and see responses stream live.

🎉 Your pocket AI companion

Enjoy fast, battery-friendly chats that stay private on your device forever.
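In developer terms, the flow above boils down to: fetch a model once, then stream tokens from an on-device session. The sketch below illustrates that shape in Swift; ChatSession and streamReply are stand-in names invented for this example (with a stubbed token stream), not the repo's actual API.

```swift
import Foundation

// Hypothetical stand-in for the library's chat interface, stubbed so the
// sketch is self-contained; the repo's real Swift API may differ.
struct ChatSession {
    let modelURL: URL

    // Streams reply tokens as they are generated on-device.
    // Stubbed here: the prompt is ignored and canned tokens are yielded.
    func streamReply(to prompt: String) -> AsyncStream<String> {
        AsyncStream { continuation in
            for token in ["Replies ", "stream ", "token ", "by ", "token."] {
                continuation.yield(token)
            }
            continuation.finish()
        }
    }
}

// Steps 1-2: the model is downloaded once, then loaded from local storage.
let modelURL = URL(fileURLWithPath: "/path/to/downloaded/model.mlmodelc")

// Steps 3-4: ask a question and print the streamed response in real time.
let session = ChatSession(modelURL: modelURL)
for await token in session.streamReply(to: "What should I cook tonight?") {
    print(token, terminator: "")
}
```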

AI-Generated Review

What is CoreML-LLM?

CoreML-LLM lets you run LLMs like Gemma 4 on iPhones and Macs using CoreML, optimized for the Apple Neural Engine (ANE) to keep things battery-friendly and leave the GPU free for other tasks. It includes a Python CLI for converting Hugging Face models to CoreML format, pre-converted models for quick starts, and a Swift package for easy integration into iOS apps. Developers get on-device inference at 31 tokens/second decode on recent iPhones, with support for text and multimodal inputs.
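As a rough illustration of that ANE-first design, the snippet below loads a converted model with CoreML's standard MLModelConfiguration, restricting compute to the CPU and Neural Engine so the GPU stays free. The loader function and model URL are assumptions for this sketch, not necessarily how this package wraps it.

```swift
import CoreML

// Minimal sketch: load a converted LLM with ANE-first compute units.
// `compiledModelURL` points at the downloaded, compiled .mlmodelc bundle.
func loadANEModel(at compiledModelURL: URL) throws -> MLModel {
    let config = MLModelConfiguration()
    // Standard CoreML option: execute on CPU + Neural Engine only,
    // leaving the GPU free for rendering and other workloads.
    config.computeUnits = .cpuAndNeuralEngine
    return try MLModel(contentsOf: compiledModelURL, configuration: config)
}
```

The same field also accepts .all or .cpuAndGPU; pinning it to .cpuAndNeuralEngine is what keeps inference off the GPU.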

Why is it gaining traction?

Unlike GPU-heavy alternatives like MLX Swift, it prioritizes the ANE for always-on, power-efficient local inference that neither drains the battery nor competes for GPU resources. The iOS chat app demo downloads models automatically, streams responses, and handles images seamlessly, making on-device models feel native. Plus, the conversion CLI simplifies deploying custom models, appealing to developers tired of server dependencies for on-device LLMs.

Who should use this?

iOS developers building offline chat apps or AI companions that need to run LLMs locally on device, especially with multimodal features like image description. It's ideal for indie app makers targeting Apple hardware who want fast prefill (154 tokens/second) without cloud latency, or for teams optimizing CoreML LLMs for production. Skip it if you're on non-Apple platforms that need Android or CPU-only alternatives.

Verdict

Promising for Apple-focused devs chasing efficient on-device AI, but with only 47 stars and a 1.0% credibility score it's early stage: docs are solid, but expect some tinkering for custom models. Try the sample app first, and pair it with the conversion CLI for your own models if the ANE performance hooks you.


