videlalvaro / emilio

Transformer inference via unconventional compute: a single algebraic primitive, a tag-system scheduler, and Apple's Neural Engine through convolution.

Found Apr 19, 2026 at 11 stars.
AI Analysis
Language: Python
AI Summary

Emilio is a local inference engine for running small language models such as Qwen2.5 on your own machine. Its novelty is a single EML arithmetic primitive: every multiplication in the transformer forward pass is expressed as exp(ln(a) + ln(b)).
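The primitive itself is easy to sketch. Below is a minimal Python illustration of the idea; the sign and zero handling are guesses on our part, since ln is only defined for positive reals, and emilio's actual EML encoding may differ:

```python
import math

def eml_mul(a: float, b: float) -> float:
    """Multiply two floats using only exp, ln, addition, and a sign flip.

    ln is undefined for non-positive inputs, so zeros are short-circuited
    and signs are factored out; this handling is illustrative, not
    necessarily emilio's.
    """
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = 1.0 if (a > 0) == (b > 0) else -1.0
    return sign * math.exp(math.log(abs(a)) + math.log(abs(b)))
```

The point is that multiplication disappears from the arithmetic: only exp, ln, addition, and a sign flip remain.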

How It Works

1
🔍 Discover emilio

You hear about emilio, a simple way to chat with a smart AI helper right on your Mac without needing the internet.

2
📥 Get the AI brain

Download a small file called a model that contains everything the AI knows.

3
🔨 Set it up once

Run one easy command to prepare emilio on your computer.

4
Unlock your Mac's power

Turn on your Mac's built-in speed boost to make chats lightning fast.

5
💬 Start chatting

Type a question like 'What is 2+2?' and watch the AI reply.

🎉 Enjoy private smart chats

Get helpful answers anytime, all running safely on your own Mac.

AI-Generated Review

What is emilio?

Emilio runs a full transformer forward pass for LLMs like Qwen2.5-0.5B using a single algebraic primitive: every multiplication becomes exp(ln(a) + ln(b)), and that one identity handles all the arithmetic from matmuls to activations. A Rust CLI loads GGUF models for chat or generate modes, reaching about 5.5 tok/s on CPU and 30 tok/s on Apple Silicon GPUs via Metal shaders; models can be compiled to .eml for a faster runtime. Python scripts verify correctness against reference implementations, making the repo a worked example of an unconventional transformer inference architecture.
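To make "every multiplication becomes exp(ln(a) + ln(b))" concrete, here is a toy matmul in which each scalar product is routed through that identity. This is an illustrative sketch, not emilio's implementation (which lives in Rust and Metal kernels):

```python
import math

def eml_mul(a, b):
    # exp(ln|a| + ln|b|) with the sign factored out and zeros
    # short-circuited, since ln is undefined for non-positive inputs.
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = 1.0 if (a > 0) == (b > 0) else -1.0
    return sign * math.exp(math.log(abs(a)) + math.log(abs(b)))

def eml_matmul(A, B):
    # Naive matrix multiply where every product goes through eml_mul;
    # accumulation stays ordinary addition, matching the EML scheme.
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(eml_mul(A[i][k], B[k][j]) for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]
```

Since only the products are rewritten, results agree with a normal matmul up to floating-point rounding.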

Why is it gaining traction?

It demonstrates that a single primitive can power a production-grade transformer forward pass, setting it apart from llama.cpp or Hugging Face pipelines by prioritizing algebraic purity over raw speed. Developers like the pure-EML Metal kernels, the auto-optimization loop, and benchmarks tracing the scaling from Python proof of concept to GPU. As a readable, tutorial-like codebase, it appeals to anyone experimenting with unconventional inference arithmetic.

Who should use this?

AI researchers prototyping unconventional transformer implementations on Apple hardware. Apple Silicon developers exploring access to the Neural Engine via convolution as an alternative inference path. ML engineers curious about edge inference without the standard matmul stack.
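The "Neural Engine through convolution" angle rests on a standard equivalence: a 1x1 convolution is a matrix multiply over the channel axis, so matmuls can be handed to hardware that primarily exposes convolution ops. A pure-Python sketch of the equivalence (shapes and names are illustrative, not emilio's API):

```python
def conv1x1(x, w):
    # x: input feature map as nested lists, shape (C_in, H, W).
    # w: 1x1 kernel weights, shape (C_out, C_in).
    # Each output pixel is a matrix-vector product over the channels.
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][i] * x[i][r][c] for i in range(c_in))
              for c in range(wd)]
             for r in range(h)]
            for o in range(len(w))]
```

With H = W = 1 the convolution reduces exactly to the matrix-vector product w @ x, which is how a matmul can be smuggled onto convolution-oriented accelerators.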

Verdict

Fascinating proof of concept for minimalism in transformer inference. Try it for the theory, but at 11 stars it is still raw and lacks production polish, so use it alongside llama.cpp rather than instead of it. Solid docs and verification scripts make it worth forking for experiments.
