
antirez / ds4

Public

DeepSeek 4 Flash local inference engine for Metal

859 stars · 46 forks · 100% credibility
Found May 08, 2026 at 849 stars.
AI Analysis
Language: C
AI Summary

A native program for running the DeepSeek V4 Flash model locally on Apple Silicon Macs with optimized Metal acceleration, offering a chat interface and a server for agent tools.

How It Works

1
💡 Discover ds4

You hear about ds4, a way to run the DeepSeek V4 Flash model right on your Mac: fast, local, and private.

2
📥 Grab the AI model

Run the provided download script to fetch the GGUF model files sized for your Mac's memory.

3
🔨 Prepare the program

Build the chat tool with a single command using the project's ready-made build files.

4
🗣️ Start chatting

Type your first question in the interactive chat and watch the model think and respond at lightning speed.

5
Choose your way

💬 Keep chatting

Continue asking questions in the back-and-forth conversation mode.

🌐 Share with apps

Start a local server so coding agents and other tools can use the model too.

🚀 AI magic on your Mac

Enjoy a frontier-level model that thinks deeply, handles huge contexts, and runs smoothly on your machine without sending data anywhere. A sample first session is sketched below.
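
To make the steps concrete, here is a minimal sketch of a first session. The `./ds4` binary name comes from the review further down; the download script name and the `make` build step are assumptions, so check the repo's README for the real commands.

```sh
# 1. Fetch the quantized model weights.
#    (Script name is an assumption; use the repo's own downloader.)
./download-model.sh

# 2. Build from the provided build files.
#    (Assuming a plain Makefile, since the project is written in C.)
make

# 3. Ask your first question in the interactive chat.
./ds4
```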

AI-Generated Review

What is ds4?

ds4 runs DeepSeek V4 Flash locally on Apple Silicon Macs using Metal acceleration, delivering a CLI chat interface and an OpenAI/Anthropic-compatible HTTP server for seamless integration. A script downloads the specialized GGUF models (q2 for 128GB RAM machines, q4 for 256GB and up), and a 1M-token context with a disk-persisted KV cache handles long sessions without reloading prompts. Built in C, it focuses solely on this model, skipping generic-runner overhead in favor of direct, dedicated inference.
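
A plausible invocation under the sizing rule just described; the download script name and argument form are assumptions, while `--kv-disk-dir` is the flag quoted in the verdict below:

```sh
# Pick the quantization that fits your machine (per the review):
#   q2 -> 128GB of RAM, q4 -> 256GB or more.
./download-model.sh q2    # script name and argument form are assumptions

# Keep the KV cache on disk so a long session survives restarts
# without re-ingesting the prompt (flag quoted in the verdict below).
./ds4 --kv-disk-dir ~/.ds4-kv
```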

Why is it gaining traction?

Unlike broad tools like llama.cpp, ds4 bets narrow on DeepSeek V4 Flash, hitting 26-36 t/s generation on M3 Max/Ultra while enabling thinking modes whose reasoning length scales with problem complexity, producing far shorter chains than heavier "pro"-style models. Disk KV persistence and agent-ready endpoints (tool calls, streaming) make it a drop-in backend for coding workflows, avoiding the crashes the model can hit in generic setups. Benchmarks show it fits V4 Flash's demanding footprint onto consumer hardware, drawing developers looking for fast, local DeepSeek inference.
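
The "agent-ready endpoints" claim implies an OpenAI-style chat completions route. Here is a hedged sketch of a streaming, tool-enabled request: only the wire format follows the OpenAI convention the review cites; the port, route, and model name are assumptions.

```sh
# Port 8080, the /v1 route, and the model name are assumptions.
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Summarize the diff in my last commit"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its output",
        "parameters": {
          "type": "object",
          "properties": {"cmd": {"type": "string"}},
          "required": ["cmd"]
        }
      }
    }]
  }'
```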

Who should use this?

High-end Mac owners (128GB+ RAM) building local coding agents with opencode, Pi, or Claude Code, where OpenAI-style chat completions need 100k+ tokens of context without cloud costs. Writers and researchers running long prompts against the DeepSeek V4 Flash GGUF build, or anyone wiring the model into their tools via ds4-server for tool calls and speculative decoding. Skip it if you're on non-Metal hardware or need multi-model support.
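
Because the server speaks the OpenAI wire format, many agent tools can be pointed at it by overriding the API base URL. A minimal sketch, assuming your agent honors the common OpenAI client environment variables (tools differ, so check each one's docs; the port is an assumption):

```sh
# Point an OpenAI-compatible coding agent at the local ds4 server.
# These variables are a widespread client convention, not ds4-specific.
export OPENAI_BASE_URL="http://localhost:8080/v1"
export OPENAI_API_KEY="local-unused"   # a local server likely ignores auth
```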

Verdict

Grab it if you're on qualifying Apple hardware and want polished DeepSeek V4 Flash inference today: the `./ds4` CLI and the server with `--kv-disk-dir` just work. At 859 stars it's alpha-quality software with solid tests, so expect tweaks, and watch the repo for V4 model updates.
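
For reference, a sketch of the server entry point: `ds4-server` is the name the review mentions and `--kv-disk-dir` is quoted above, but pairing them this way is an assumption about flag placement.

```sh
# Serve the OpenAI/Anthropic-compatible API for agents and editors.
# ds4-server is named in the review; the flag pairing is an assumption.
./ds4-server --kv-disk-dir ~/.ds4-kv
```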
