Entrpi / ds4-on-spark

Public

antirez/ds4 (DwarfStar 4) on NVIDIA DGX Spark: install, benchmarks, and roofline analysis. Steady-state decode at ~95% of bandwidth ceiling; MTP and concurrency analyzed.

69% credibility
Found May 19, 2026 at 14 stars.
Shell
AI Summary

This repository provides a complete setup guide for running the DeepSeek-V4 AI model on NVIDIA DGX Spark hardware. It automates the process of downloading the inference engine, building optimized binaries for the specific GPU, downloading the 81-gigabyte quantized model, and starting a server that can answer questions like an AI assistant. The project includes detailed performance benchmarks showing the AI can generate about 24-28 tokens per second during steady use, reaching approximately 95% of the hardware's theoretical speed limit. It also documents a known issue with speculative decoding on CUDA that causes a small performance regression, along with the root cause and planned fix.
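To see how speculative decoding can end up slower rather than faster, a rough cost model helps. The sketch below is a generic back-of-the-envelope model, not the repository's own analysis or its documented root cause. It assumes one draft token per step, an acceptance probability `accept_rate`, and a relative extra cost `overhead` per step for drafting and verification. The function name and all numbers are illustrative assumptions.

```python
def speculative_speedup(accept_rate: float, overhead: float) -> float:
    """Estimated speedup over plain decoding with one draft token per step.

    Each step always yields one verified token, plus the draft token when it
    is accepted, so the expected yield is 1 + accept_rate tokens. The step is
    assumed to cost (1 + overhead) times a plain decode step.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if overhead < 0.0:
        raise ValueError("overhead must be non-negative")
    return (1.0 + accept_rate) / (1.0 + overhead)


# Illustrative values only, not measurements from the repository.
for accept_rate, overhead in [(0.8, 0.5), (0.8, 0.9), (0.5, 0.6)]:
    speedup = speculative_speedup(accept_rate, overhead)
    verdict = "faster" if speedup > 1.0 else "slower"
    print(f"accept={accept_rate:.1f} overhead={overhead:.1f} "
          f"-> {speedup:.2f}x ({verdict})")
```

Under this model speculation only pays off when the acceptance rate exceeds the per-step overhead, so even a well-predicting draft can produce a net regression if verification is expensive.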

How It Works

1. 🔍 You discover ds4-on-spark

You learn about a project that lets you run a powerful AI model called DeepSeek-V4 on your NVIDIA DGX Spark computer, with detailed performance measurements.

2. ⚡ One command sets up everything

You run a single installer command that automatically checks your hardware, downloads the AI engine, and gets everything ready to use.

3. 🔧 Your system verifies it's ready

The installer confirms your Spark computer has the right GPU and memory to run the AI model smoothly.

4. 📦 The AI brain downloads and builds

The 81-gigabyte AI model downloads piece by piece, and the inference engine compiles specifically for your hardware.

5. ✅ A quick test confirms it works

The installer asks the AI a simple question like 'What is the capital of France?' and verifies it answers correctly.

6. Choose how to use your AI
💬 Start the chat server

Launch your AI assistant that answers questions through a web interface, ready whenever you need it.

📊 Run performance tests

Measure how fast your Spark can process AI requests and compare against the theoretical limits.

🎉 Your AI assistant is ready

Your DeepSeek-V4 AI is now running on your Spark, able to answer questions, write code, and help with complex reasoning tasks at impressive speeds.
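The hardware check and smoke test in the steps above can be sketched as two small predicates. This is a minimal illustration, not the installer's actual logic, which is not shown here: the function names, the `headroom_gb` margin, and the way the inputs are obtained are all assumptions. Only the GB10 GPU name, the 81 GB model size, and the "capital of France" question come from the listing.

```python
def hardware_ok(gpu_name: str, total_mem_gb: float,
                model_gb: float = 81.0, headroom_gb: float = 8.0) -> bool:
    """Return True if the machine looks able to hold the model.

    gpu_name and total_mem_gb are expected to come from the system
    (how they are queried is left out of this sketch). headroom_gb is a
    hypothetical margin for the KV cache and the rest of the system.
    """
    has_expected_gpu = "GB10" in gpu_name.upper()
    has_enough_memory = total_mem_gb >= model_gb + headroom_gb
    return has_expected_gpu and has_enough_memory


def smoke_test_passes(answer: str, expected: str = "Paris") -> bool:
    """Case-insensitive check that the model's answer mentions the expected word."""
    return expected.lower() in answer.lower()


# Illustrative inputs, not captured from a real run.
print(hardware_ok("NVIDIA GB10", 128.0))
print(hardware_ok("NVIDIA GeForce RTX 4090", 24.0))
print(smoke_test_passes("The capital of France is Paris."))
```

A substring match keeps the smoke test tolerant of phrasing differences in the model's reply, at the cost of accepting any answer that merely contains the expected word.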

AI-Generated Review

What is ds4-on-spark?

This is a deployment toolkit for running DeepSeek-V4-Flash on NVIDIA DGX Spark hardware. It wraps antirez/ds4 with shell scripts that handle installation, benchmarking, and performance analysis on GB10 GPUs. A single curl command clones the engine, builds the CUDA binaries, downloads the quantized model weights, and runs a smoke test. The project includes detailed throughput measurements across different context lengths and a roofline analysis showing how close the inference engine runs to the hardware bandwidth ceiling.

Why is it gaining traction?

The hook is the roofline analysis. Most inference benchmarks report "tokens per second" without explaining why the numbers are what they are. This project measures actual memory bandwidth on the Spark, breaks down bytes-per-token by quantization bucket, and shows that steady-state decode hits roughly 95% of the bandwidth roofline. That kind of analysis is rare and useful for anyone trying to understand whether further optimization is even possible. The one-command install also lowers the barrier for reproducing the numbers.
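The roofline reasoning described above reduces to simple arithmetic: if decode is memory-bound, the ceiling on tokens per second is the achievable memory bandwidth divided by the bytes that must be read per token. The sketch below shows that calculation under stated assumptions. The bucket names, parameter counts, bit widths, bandwidth, and measured throughput are made-up placeholders chosen for round numbers, not figures from the repository, and the model ignores KV-cache traffic.

```python
def bytes_per_token(buckets: dict[str, tuple[float, float]]) -> float:
    """Sum the weight bytes read per decoded token.

    buckets maps a bucket name to (billions of weights read per token,
    bits per weight) for that quantization bucket.
    """
    return sum(params_b * 1e9 * bits / 8.0 for params_b, bits in buckets.values())


def roofline_tokens_per_s(bandwidth_bytes_per_s: float, bytes_per_tok: float) -> float:
    """Bandwidth-bound ceiling on decode throughput."""
    return bandwidth_bytes_per_s / bytes_per_tok


def roofline_fraction(measured_tokens_per_s: float, ceiling_tokens_per_s: float) -> float:
    """Fraction of the bandwidth ceiling actually achieved."""
    return measured_tokens_per_s / ceiling_tokens_per_s


# Hypothetical inputs for illustration only.
buckets = {
    "experts_4bit": (12.0, 4.0),  # 12B weights at 4 bits -> 6 GB per token
    "dense_8bit": (2.0, 8.0),     # 2B weights at 8 bits  -> 2 GB per token
}
per_token = bytes_per_token(buckets)
ceiling = roofline_tokens_per_s(200e9, per_token)  # assumed 200 GB/s measured bandwidth
fraction = roofline_fraction(23.75, ceiling)       # assumed measured decode rate

print(f"bytes/token: {per_token / 1e9:.1f} GB")
print(f"ceiling:     {ceiling:.1f} tok/s")
print(f"achieved:    {fraction:.0%} of roofline")
```

Using measured rather than spec-sheet bandwidth as the numerator matters: the fraction is only meaningful against what the memory system can actually deliver.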

Who should use this?

ML engineers evaluating DeepSeek-V4-Flash on Blackwell hardware will find the benchmarks directly applicable. Researchers studying MoE inference efficiency will value the roofline decomposition. Anyone running ds4 on a DGX Spark and wanting reproducible performance numbers should start here. If you need concurrent request handling, look elsewhere: the server serializes clients by design.
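What serialization means for waiting clients can be estimated with a short calculation. The sketch below assumes requests are served strictly one after another, that every request generates the same number of tokens, and that prefill time is negligible. The function name and the example numbers are illustrative; the 25 tokens per second figure is simply a value inside the 24-28 range quoted in the summary.

```python
def serialized_completion_times(n_clients: int, tokens_each: int,
                                tokens_per_s: float) -> list[float]:
    """Seconds until each client's response finishes under one-at-a-time serving.

    Client i (0-based) waits for all earlier requests, then for its own.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    per_request_s = tokens_each / tokens_per_s
    return [per_request_s * (i + 1) for i in range(n_clients)]


# Four clients, 500 tokens each, at an assumed 25 tokens per second.
for i, t in enumerate(serialized_completion_times(4, 500, 25.0), start=1):
    print(f"client {i}: done after {t:.0f} s")
```

Aggregate throughput stays at the single-stream rate, so the last client's latency grows linearly with the number of clients queued ahead of it.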

Verdict

The technical content is solid and the analysis is genuinely useful, but the repository has 14 stars and a credibility score of roughly 70% (displayed as 69%), signaling early-stage work. Use it as a reference for benchmarking methodology and hardware-specific performance expectations, but treat the scripts as a starting point rather than production-ready infrastructure.
