xigh / herbert-rs

Local LLM inference engine written from scratch in Rust: hand-written AVX-512 assembly kernels plus Metal & Vulkan compute shaders. Supports Qwen3, Mistral3, ... with Q4/INT8/BF16 quantization.

Rust · 13 stars · 1 fork · 100% credibility · Found Mar 20, 2026

AI Summary

Herbert-rs is a high-performance engine for running large language models locally on CPU or GPU, with a chat CLI, a web server API, and a desktop app, all supporting text and vision inputs.

How It Works

1. 🔍 Discover Herbert

You hear about Herbert, a fast way to run smart AI chatbots right on your own computer without needing the internet.

2. 📥 Grab AI Brains

Download ready-to-use AI models from a trusted library using a simple helper tool—it saves them safely on your machine.
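
Under the hood this likely means pulling files from HuggingFace. As a rough sketch only, here is how you could fetch model files yourself with the hf-hub crate; the repo and file names below are placeholders, not Herbert's actual defaults:

    // Sketch: pull model files from the HuggingFace Hub with the hf-hub crate.
    // Assumes hf-hub = "0.3" in Cargo.toml; repo and file names are illustrative.
    use hf_hub::api::sync::Api;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let api = Api::new()?; // caches under ~/.cache/huggingface by default
        let repo = api.model("Qwen/Qwen3-0.6B".to_string());

        // Download (or reuse cached copies of) the files an engine typically needs.
        for file in ["config.json", "tokenizer.json", "model.safetensors"] {
            let path = repo.get(file)?; // returns the local cache path
            println!("{file} -> {}", path.display());
        }
        Ok(())
    }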

3. Pick Your Style
💬 Command Line Chat

Jump into instant conversations by typing messages and getting smart replies.

🖥️ Desktop App

Open a beautiful chat window with image uploads and conversation history.

🔌 Web Server

Set up a private server so your apps or browsers can talk to the AI (a client sketch follows just below).
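
Since the server speaks an Anthropic-compatible API (per the review below), any HTTP client can talk to it. A minimal reqwest sketch, assuming a local port, route, and model id that herbert-server may well name differently:

    // Sketch: call an Anthropic-style /v1/messages endpoint on a local server.
    // Assumes reqwest (blocking + json features) and serde_json; the port,
    // model id, and route are assumptions, not herbert-server's documented defaults.
    use serde_json::json;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let body = json!({
            "model": "qwen3",  // hypothetical local model id
            "max_tokens": 256,
            "messages": [
                { "role": "user", "content": "Explain KV caches in one sentence." }
            ]
        });

        let resp: serde_json::Value = reqwest::blocking::Client::new()
            .post("http://localhost:8080/v1/messages") // assumed local address
            .json(&body)
            .send()?
            .json()?;

        // Anthropic-style responses carry the reply in content[0].text.
        println!("{}", resp["content"][0]["text"].as_str().unwrap_or("<no text>"));
        Ok(())
    }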

4. 🧠 Start Talking

Type a question, add a photo if you want, and watch the AI think and respond lightning-fast on your hardware.
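
Adding a photo, assuming the server follows Anthropic's message format for images, means sending a base64 content block alongside your text. A sketch (the file path and model id are placeholders):

    // Sketch: attach an image as an Anthropic-style base64 content block.
    // Assumes base64 = "0.22" and serde_json; path and model id are placeholders.
    use base64::{engine::general_purpose::STANDARD, Engine};
    use serde_json::json;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let bytes = std::fs::read("photo.png")?; // placeholder path
        let body = json!({
            "model": "qwen3-vl",  // hypothetical vision model id
            "max_tokens": 256,
            "messages": [{
                "role": "user",
                "content": [
                    { "type": "image",
                      "source": { "type": "base64",
                                  "media_type": "image/png",
                                  "data": STANDARD.encode(&bytes) } },
                    { "type": "text", "text": "What is in this picture?" }
                ]
            }]
        });
        println!("{body}"); // POST this body to the server as in the earlier sketch
        Ok(())
    }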

5. ⚙️ Tweak for Speed

Pick faster settings like using your graphics card or squeezing the model smaller for even quicker answers.
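
"Squeezing the model smaller" is quantization: storing weights in fewer bits and rescaling on the fly. A minimal sketch of symmetric INT8 quantization, the idea behind the INT8 mode (Herbert's exact block layout isn't documented here):

    // Sketch: symmetric per-tensor INT8 quantization, the basic idea behind
    // INT8 modes. Real engines quantize per block or channel; this is minimal.
    fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
        let max_abs = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
        let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
        let q = weights.iter().map(|&w| (w / scale).round() as i8).collect();
        (q, scale)
    }

    fn dequantize_int8(q: &[i8], scale: f32) -> Vec<f32> {
        q.iter().map(|&v| v as f32 * scale).collect()
    }

    fn main() {
        let w = [0.8f32, -1.2, 0.05, 2.4];
        let (q, scale) = quantize_int8(&w);
        // 4x smaller storage (i8 vs f32) at a small rounding cost.
        println!("quantized: {q:?}, scale: {scale}, roundtrip: {:?}",
                 dequantize_int8(&q, scale));
    }

Q4 pushes the same idea to 4 bits with per-block scales, trading a little accuracy for roughly another 2x shrink.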

Your Private AI Buddy

Enjoy unlimited, private chats with powerful AI that runs entirely on your computer—fast, secure, and always ready.

AI-Generated Review

What is herbert-rs?

Herbert-rs is a from-scratch Rust engine for local LLM inference, running Qwen3 and Mistral3 models on CPU (AVX-512) or GPU (Metal, Vulkan). It powers interactive chat via herbert-cli, an Anthropic-compatible HTTP server (herbert-server), and a Tauri desktop app, with Q4/INT8/BF16 quantization, vision-language support, and mixture-of-experts (MoE) handling. Download HuggingFace models and chat instantly, entirely on your own machine.

Why is it gaining traction?

It skips GGML/llama.cpp dependencies entirely: pure Rust with hand-optimized kernels delivers decode speeds on par with or better than llama.cpp on long contexts, helped by an INT8 KV cache that matters most for interactive use. Benchmarks on Ryzen show 2x gains at 10k+ tokens, plus easy GPU switching and streaming APIs. Developers also like that there is no Python anywhere in the stack.
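
The repo's kernels are hand-written assembly, which isn't reproduced here; for flavor, this is roughly what an AVX-512 inner loop buys, sketched with Rust's std::arch intrinsics (16 f32 lanes per fused multiply-add):

    // Sketch: AVX-512 dot product via std::arch intrinsics, 16 f32 lanes per FMA.
    // Flavor only; herbert-rs's actual kernels are hand-written assembly.
    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx512f")]
    unsafe fn dot_avx512(a: &[f32], b: &[f32]) -> f32 {
        use std::arch::x86_64::*;
        assert_eq!(a.len(), b.len());
        let mut acc = _mm512_setzero_ps();
        let chunks = a.len() / 16;
        for i in 0..chunks {
            let va = _mm512_loadu_ps(a.as_ptr().add(i * 16));
            let vb = _mm512_loadu_ps(b.as_ptr().add(i * 16));
            acc = _mm512_fmadd_ps(va, vb, acc); // acc += va * vb across 16 lanes
        }
        let mut sum = _mm512_reduce_add_ps(acc);
        for i in chunks * 16..a.len() {
            sum += a[i] * b[i]; // scalar tail for lengths not divisible by 16
        }
        sum
    }

    #[cfg(target_arch = "x86_64")]
    fn main() {
        let a = vec![1.0f32; 100];
        let b = vec![0.5f32; 100];
        if is_x86_feature_detected!("avx512f") {
            // Safe to call: we just verified the CPU supports avx512f.
            println!("dot = {}", unsafe { dot_avx512(&a, &b) });
        } else {
            println!("CPU lacks AVX-512");
        }
    }

    #[cfg(not(target_arch = "x86_64"))]
    fn main() {}

Stack this across every row of a weight matrix and you have the matrix-vector multiply that dominates decode time.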

Who should use this?

Rust devs deploying local LLMs in apps or servers, AI researchers benchmarking inference (dense, MoE, vision-language), or teams wanting a local, Copilot-style alternative for code generation without cloud latency. Ideal for macOS/Linux setups testing Qwen3-VL image inputs or Mistral chats.

Verdict

Early gem (13 stars, 100% credibility score) with killer benchmarks and a polished CLI/server/desktop lineup. Try it for fast local inference, but expect tweaks as it matures beyond the core models.
