t8 / hypura

Public

Run models too big for your Mac's memory

432 stars · 100% credibility · Found Mar 25, 2026
AI Analysis
Language: Rust
AI Summary

Hypura is a tool that lets users run oversized AI language models on memory-limited Apple Silicon Macs by smartly distributing model data across GPU, RAM, and SSD storage.

How It Works

1. 🔍 Discover Hypura
You hear about a clever way to chat with huge AI models on your Mac, even ones bigger than your computer's memory.

2. 📥 Get Hypura
Download and install the free tool that lets your Mac handle oversized models.

3. Check your Mac
Run a one-time scan that measures your Mac's strengths, like GPU speed and SSD bandwidth.

4. 🧠 Pick an AI model
Choose a big model file (for example, a GGUF you've downloaded) to chat with.

5. Start chatting
🗣️ Direct chat: type messages in the terminal and watch responses appear live.
🌐 Web service: turn it into a private local API that apps and browsers can call.
(A minimal command-line sketch of these steps follows this list.)

🎉 AI unlocked
Enjoy smooth conversations with massive models that used to crash your Mac, now running steadily thanks to smart memory tiering.
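
The steps above map onto a handful of CLI commands quoted in the AI-generated review further down. A minimal sketch of the workflow, with `model.gguf` standing in as a placeholder for whatever GGUF file you actually downloaded:

```bash
# One-time hardware scan: measures the Mac's GPU, memory, and storage (step 3).
hypura profile

# Direct chat in the terminal with a model larger than unified memory (step 5).
# "model.gguf" is a placeholder path for the file picked in step 4.
hypura run model.gguf --interactive

# Or run it as a private local web service for other apps and browsers (step 5).
# The review below says this exposes an Ollama-compatible API on localhost:8080.
hypura serve
```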


AI-Generated Review

What is hypura?

Hypura runs LLMs too big for your Mac's unified memory by tiering tensors across GPU, RAM, and NVMe based on access patterns and hardware bandwidth. Built in Rust atop llama.cpp with Metal acceleration, it auto-profiles your Apple Silicon setup and picks modes like expert-streaming for MoE models or dense FFN-streaming for giants like 40GB Llama 70B. Use `hypura run model.gguf --interactive` for chat or `hypura serve` for an Ollama-compatible API at localhost:8080.
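
To make the `hypura serve` mode concrete, here is what a client call could look like, assuming the server really does mirror Ollama's `/api/generate` endpoint; the endpoint path, JSON fields, and model name below follow Ollama's API and are assumptions, since this page only states "Ollama-compatible API at localhost:8080":

```bash
# Start the server (Ollama-compatible API on localhost:8080 per the review).
hypura serve

# From another shell, query it the way an Ollama client would.
# The /api/generate path and JSON fields come from Ollama's API; whether
# hypura accepts a raw GGUF filename as the model name is an assumption.
curl http://localhost:8080/api/generate -d '{
  "model": "model.gguf",
  "prompt": "Explain how tensors get tiered across GPU, RAM, and NVMe.",
  "stream": false
}'
```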

Why is it gaining traction?

It turns OOM crashes into runnable inference: 2.2 tok/s on 31GB Mixtral and 0.3 tok/s on Llama 70B where llama.cpp fails outright on a 32GB M1 Max, while matching full speed on smaller models with zero overhead. Benchmarks, hardware profiling (`hypura profile`), and `hypura bench` comparisons hook devs tired of swap thrash. Drop-in Ollama endpoints let you run GitHub Actions locally or swap it in for tools like OpenClaw.
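
The profiling and benchmarking commands mentioned above can be sketched as plain invocations; any arguments `hypura bench` accepts (a model path, a baseline to compare against) are not documented on this page, so the bare calls below are an assumption:

```bash
# Inspect what the auto-profiler detects about GPU, RAM, and NVMe bandwidth.
hypura profile

# Run the built-in benchmark comparisons referenced in the review.
hypura bench
```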

Who should use this?

Apple Silicon owners squeezing 30B+ GGUF models locally without upgrading RAM, especially MoE enthusiasts hitting memory walls on Mixtral or Qwen; devs prototyping LLM apps via the Ollama-compatible API; or researchers with oversized finetunes, like those modeling hypural bone fish migration. Skip it if you have 64GB+ or prefer the cloud.

Verdict

Grab it if big local LLMs crash your Mac—benchmarks and CLI shine—but 432 stars and a 1.0% credibility score signal early days; docs are solid, but test thoroughly before production.

