xigh / herbert-rs
Local LLM inference engine written from scratch in Rust, with hand-written AVX-512 assembly kernels and Metal & Vulkan compute shaders. Supports Qwen3, Mistral3, ... Q4/INT8/BF16 quantization.
Herbert-rs is a high-performance engine for running large language models locally on CPU or GPU, with a chat CLI, a web server API, and a desktop app supporting text and vision inputs.
How It Works
Herbert is a fast way to run AI chatbots directly on your own computer, with no internet connection needed once a model is downloaded.
Download ready-to-use AI models from a trusted library using a simple helper tool, which saves them on your machine.
Jump into instant conversations by typing messages and getting smart replies.
Open a beautiful chat window with image uploads and conversation history.
Set up a private server so your apps or browsers can talk to the AI.
Type a question, add a photo if you want, and watch the AI think and respond lightning-fast on your hardware.
Pick faster settings, such as running on your graphics card (GPU) or using a smaller quantized model (Q4 or INT8), for even quicker answers.
Enjoy unlimited, private chats with powerful AI that runs entirely on your computer—fast, secure, and always ready.
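For readers curious what "squeezing the model smaller" can mean in practice, here is a minimal sketch of symmetric per-tensor INT8 quantization in Rust. It is an illustration of the general technique only, not Herbert's actual code: the function names `quantize_int8` and `dequantize_int8` are hypothetical, and the real engine may use a different scheme (for example per-block scales).

```rust
/// Hypothetical helper (not part of herbert-rs): quantize f32 weights to i8
/// using one shared scale for the whole slice. Returns (values, scale).
fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
    // The largest magnitude decides how much of the i8 range each unit covers.
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero when every weight is 0.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Hypothetical helper: recover approximate f32 weights from the i8 values.
fn dequantize_int8(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&v| v as f32 * scale).collect()
}
```

Each weight now takes 1 byte instead of 4, at the cost of a rounding error of at most about half the scale per weight. Less data to move through memory is a large part of why smaller models tend to answer faster.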