zolotukhin / zinc

Zig INferenCe Engine — LLM inference for AMD RDNA3/RDNA4 GPUs via Vulkan

100% credibility
Found Mar 29, 2026 at 37 stars
Language: Zig
AI Summary

ZINC is a high-performance inference engine that runs large language models on consumer AMD GPUs through Vulkan, with no dependency on AMD's ROCm compute stack.

How It Works

1. 🔍 Discover ZINC

You hear about a simple way to run powerful AI chatbots right on your home computer using your AMD graphics card, without needing expensive data center hardware.

2. 📥 Get the essentials

Install the free prerequisites, namely the Zig toolchain and up-to-date Vulkan graphics drivers, that make your GPU ready for AI work.

3. 🔨 Build it once

Run one easy command to compile the program; it automatically configures itself for your specific graphics card.

4. 🚀 Test your first chat

Point it at a free AI model file and type a question, then watch as it thinks and replies like a smart assistant.

5. ✨ See the magic happen

Your screen fills with a thoughtful, correct answer streaming in real time, proof that your home PC can handle big AI models.

6. 🌐 Start a chat server

Launch it as a web service with another quick command, opening a browser chat right on your computer.

🎉 Your personal AI is ready

Enjoy chatting, asking questions, or connecting your favorite apps to your fast, private AI running locally. No cloud needed.
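The numbered steps above can be condensed into a short command sequence. The binary name, flags, and model file below are assumptions for illustration, not taken from the zinc README, and actually running them needs the repo plus an RDNA3/RDNA4 GPU, so this script only prints the sequence:

```shell
steps='zig build -Doptimize=ReleaseFast                # step 3: one-shot optimized build
./zig-out/bin/zinc --model qwen.gguf --prompt "Hi"     # step 4: first chat from the CLI
./zig-out/bin/zinc serve --port 8080                   # step 6: browser chat at localhost:8080'
printf '%s\n' "$steps"
```

`zig build -Doptimize=ReleaseFast` is the standard Zig build invocation; everything after it is a hypothetical sketch of the CLI and server modes the page describes.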

AI-Generated Review

What is zinc?

Zinc is a Zig-based inference engine that runs LLMs on AMD RDNA3/RDNA4 GPUs such as the RX 9070 via Vulkan, sidestepping ROCm and CUDA entirely. It loads GGUF models (Qwen MoE, Llama, Mistral) and serves them through an OpenAI-compatible API at /v1/chat/completions, with streaming SSE responses, a built-in chat UI at localhost:8080, and a CLI mode for quick tests. It lets AMD consumer cards with 16-32 GB of VRAM handle real inference workloads, reaching 576 GB/s of memory bandwidth without datacenter hardware.
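The OpenAI-compatible endpoint described above can be exercised with nothing but the Python standard library. A minimal sketch, assuming a zinc server is already listening on localhost:8080 and taking the /v1/chat/completions path from the description; the model name is a placeholder:

```python
import json
import urllib.request

def chat_body(prompt: str, stream: bool = False) -> dict:
    """Build a request body in the OpenAI chat-completions wire format."""
    return {
        "model": "local",  # placeholder; zinc serves whatever GGUF it loaded
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def chat(prompt: str, base: str = "http://localhost:8080") -> str:
    """POST a prompt to the zinc server and return the reply text."""
    req = urllib.request.Request(
        base + "/v1/chat/completions",
        data=json.dumps(chat_body(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the wire format matches OpenAI's, official SDKs should also work unchanged if pointed at http://localhost:8080/v1 as their base URL.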

Why is it gaining traction?

Within the Zig ecosystem, zinc stands out for LLM inference: it delivers coherent output on 35B MoE models where llama.cpp's Vulkan backend lags on RDNA4. Continuous batching shares the GPU across concurrent requests without slowdowns, TurboQuant compresses the KV cache 5x, and it is a drop-in backend for OpenAI SDKs, requiring no client changes. Developers also like the self-optimizing loop that iteratively raises tokens/s via AI agents, tracked in the repo's issues and milestones.
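The streaming responses mentioned above presumably follow the OpenAI SSE convention of newline-delimited "data:" events terminated by "data: [DONE]", since the API is described as drop-in compatible. Under that assumption, assembling the reply on the client side looks roughly like:

```python
import json

def collect_stream(lines) -> str:
    """Assemble the assistant reply from chat-completions SSE lines.

    Assumes OpenAI-style events: each payload carries a partial message in
    choices[0].delta, and "data: [DONE]" terminates the stream.
    """
    parts = []
    for raw in lines:
        if not raw.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        payload = raw[len("data: "):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)
```

Iterating over the lines of a streaming HTTP response and feeding them through this function yields the full answer as it arrives.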

Who should use this?

AMD GPU owners (RX 7000/9000 series) building local LLM servers for chat apps or APIs, especially those serving four or more concurrent users on a $550 card; Zig enthusiasts interested in low-level GPU programming; and teams trading ROCm headaches for Vulkan portability.

Verdict

Try it if you have RDNA4 hardware and want local LLM inference without vendor lock-in: the CLI builds fast and the API works out of the box. At 37 stars the project is pre-1.0 (docs have rough edges, tests cover the basics), but the trajectory and active issue tracker make it worth starring for Zig AI-inference experiments.
