ohdearquant / lattice

pure rust inference engine

12 stars · 89% credibility
AI Analysis · Rust

AI Summary

Lattice is a pure Rust inference engine for transformer models designed to run on Apple Silicon using Metal shaders and SIMD optimization, with claimed performance advantages over Ollama and MLX for certain model architectures.

AI-Generated Review

What is lattice?

Lattice is a pure Rust inference engine for running transformer models on Apple Silicon. It loads models from HuggingFace, handles tokenization, runs the forward pass, and computes vector operations—all without ONNX, Python, or CUDA. The project targets two workloads: embedding generation (BGE, E5, MiniLM families) and decoder-only inference (Qwen3). On Apple Silicon, it uses Metal shaders for GPU acceleration; on other platforms, it falls back to WGPU or CPU with hand-written SIMD kernels (AVX2 on x86, NEON on ARM).
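
The backend split described above is the architectural core: the same crate targets Metal on Apple Silicon and falls back to WGPU or CPU SIMD elsewhere. Below is a minimal sketch of the general cfg-gated dispatch pattern a pure Rust engine can use for that kind of platform selection; the enum and function names are illustrative assumptions, not taken from Lattice's source.

```rust
// Illustrative sketch of compile-time backend selection (names are assumptions,
// not Lattice's API). A WGPU variant would slot into the same pattern.
#[derive(Debug)]
enum Backend {
    Metal,
    Cpu,
}

fn pick_backend() -> Backend {
    // `cfg!` expands to a compile-time boolean, so this check costs nothing
    // at runtime; on non-Apple targets the Metal branch is dead code.
    if cfg!(all(target_os = "macos", target_arch = "aarch64")) {
        Backend::Metal
    } else {
        Backend::Cpu
    }
}

fn main() {
    println!("selected backend: {:?}", pick_backend());
}
```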

Why is it gaining traction?

The hook is simple: Rust developers can now add embedding generation or LLM inference to their projects without dragging in a Python runtime or a 300 MB ONNX dependency. The project's benchmarks show SIMD vector operations hitting 90 ns for cosine similarity on 384-dimensional vectors, roughly 23x faster than scalar code. For Qwen3 specifically, Lattice claims to be the only engine correctly running the hybrid GatedDeltaNet architecture with QuaRot 4-bit quantization and LoRA hot-swap, features that neither Ollama nor Apple's MLX support on Apple Silicon.
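
For context on what that benchmark measures, the sketch below is the plain scalar version of cosine similarity over two 384-dimensional f32 vectors. It only pins down the math; Lattice's reported figures come from hand-written AVX2/NEON kernels replacing this loop, and actual timings depend on hardware and compiler flags.

```rust
// Scalar reference for the benchmarked operation: cosine similarity of two
// 384-dimensional f32 vectors. The SIMD kernels accelerate exactly this loop.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (&x, &y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    dot / (norm_a.sqrt() * norm_b.sqrt())
}

fn main() {
    // Toy 384-dimensional vectors; real embeddings come from the model.
    let a = vec![0.5f32; 384];
    let b: Vec<f32> = (0..384).map(|i| (i as f32).sin()).collect();
    println!("cosine similarity: {:.4}", cosine_similarity(&a, &b));
}
```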

Who should use this?

Rust developers building AI-powered applications who want to keep their stack pure Rust. Teams deploying to Apple laptops or edge devices where NVIDIA GPUs are not available. Projects that need embedding caching, batch processing, or the ability to swap LoRA adapters at runtime without reloading the base model. If you need NVIDIA GPU inference today, look elsewhere—this project is explicitly not targeting CUDA.
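
The LoRA hot-swap point is worth unpacking: an adapter is a low-rank update W' = W + (alpha / r) * B·A applied on top of frozen base weights, so switching adapters only replaces two small matrices instead of reloading the model from disk. The toy sketch below illustrates that arithmetic with made-up dimensions; the struct and function names are hypothetical, not Lattice's API.

```rust
// Toy illustration of a LoRA update (hypothetical names, not Lattice's API).
struct LoraAdapter {
    a: Vec<Vec<f32>>, // r x in_dim
    b: Vec<Vec<f32>>, // out_dim x r
    scale: f32,       // alpha / r
}

/// Effective weight: W + scale * B·A, leaving the base matrix untouched.
fn apply_lora(base: &[Vec<f32>], adapter: &LoraAdapter) -> Vec<Vec<f32>> {
    let (out_dim, in_dim, r) = (base.len(), base[0].len(), adapter.a.len());
    let mut w = base.to_vec();
    for i in 0..out_dim {
        for j in 0..in_dim {
            let mut delta = 0.0f32;
            for k in 0..r {
                delta += adapter.b[i][k] * adapter.a[k][j];
            }
            w[i][j] += adapter.scale * delta;
        }
    }
    w
}

fn main() {
    let base = vec![vec![0.1f32; 4]; 4]; // frozen 4x4 base weight
    let adapter = LoraAdapter {
        a: vec![vec![0.2f32; 4]; 2], // rank-2 update, 2 x 4
        b: vec![vec![0.3f32; 2]; 4], // 4 x 2
        scale: 0.5,
    };
    // Hot-swapping means calling this again with a different adapter;
    // `base` never has to be reloaded.
    let w = apply_lora(&base, &adapter);
    println!("w[0][0] = {:.3}", w[0][0]);
}
```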

Verdict

Lattice is a well-structured, thoughtfully documented project with comprehensive benchmarks and a modular workspace design. The credibility score of 89% and star count of 12 reflect its early stage and narrow audience: it's a niche tool for Rust shops on Apple Silicon, not a general-purpose ML engine. If that is your stack, the feature set (pure Rust, Metal backend, LoRA injection, optimal transport utilities) justifies evaluation. If you need broad model support or NVIDIA acceleration today, wait for the project to mature or use an established alternative.
