jdaln/dgx-spark-inference-stack

Serve the home! An inference stack for your Nvidia DGX Spark, aka the Grace Blackwell AI supercomputer on your desk. Mostly vLLM-based for now.

22 stars · Found Feb 06, 2026 at 13 stars
AI Analysis
AI Summary

A ready-to-use home server kit for Nvidia DGX Spark that runs many large AI language models on demand with smart power saving and easy app connections.

How It Works

1. 🏠 Discover Home AI Server
   You find a friendly guide to turn your powerful Nvidia DGX Spark computer into a smart home helper for chatting with AI anytime.

2. 📁 Prepare Your Space
   Create a few folders on your computer and download the special word lists the AI needs to understand language.

3. 🔑 Sign Up for Helpers
   Make a free account at a helpful service and prepare your computer tools so everything connects smoothly.

4. 🧠 Build Your AI Engines
   Prepare custom smart engines tailored for your computer, taking about 20 minutes each, to ensure top speed and smarts.

5. ▶️ Launch with One Click
   Start your home AI server – it loads brains only when needed and saves power by resting when idle.

6. 💬 Chat with Your First AI
   Send a simple hello message and get a friendly reply, proving your setup works perfectly.

🎉 AI Ready at Home

Now enjoy powerful conversations, coding help, or image analysis anytime from apps like your code editor or terminal, all from your own machine.
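Step 6's "hello message" can be sketched as a plain OpenAI-style chat request. This is a minimal client-side sketch, assuming the stack's unified API at localhost:8009 (mentioned in the review on this page) exposes the standard /v1/chat/completions route; the model name here is a placeholder, not one of the repo's actual model IDs.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8009/v1"  # unified endpoint per the review; adjust if yours differs

def build_hello_request(model="your-model-name"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,  # placeholder: substitute a model served by your stack
        "messages": [{"role": "user", "content": "Hello!"}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_hello_request()
print(req.full_url)  # http://localhost:8009/v1/chat/completions
# Once the server is up, actually send it with:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Any tool that speaks the OpenAI API (a code editor plugin, a terminal agent) can be pointed at the same base URL instead of api.openai.com.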

AI-Generated Review

What is dgx-spark-inference-stack?

This Docker Compose stack turns your Nvidia DGX Spark—aka the Grace Blackwell AI supercomputer on your desk—into a ready-to-serve home inference server. It spins up vLLM endpoints for 29+ optimized models (FP4/FP8 quantized Qwen, Llama, GLM, GPT-OSS), with on-demand loading via a unified OpenAI-compatible API at localhost:8009. Models auto-shutdown after idle time, saving power on your single-GPU beast, and it supports vision inputs, tool calling, and reasoning chains out of the box.
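Because models load on demand and shut down after idle time, the first request after a quiet period may stall while the container wakes. A minimal client-side sketch of how to cope: retry with exponential backoff until the endpoint answers. The `send` callable is a hypothetical stand-in for whatever HTTP call your client makes; none of this is part of the repo itself.

```python
import time

def request_with_retry(send, attempts=5, base_delay=1.0):
    """Call `send()` until it succeeds, backing off while the model wakes up.

    `send` is any zero-argument callable that performs the real HTTP request
    and raises OSError (connection refused, timeout) while the backend is
    still loading.
    """
    for i in range(attempts):
        try:
            return send()
        except OSError:
            if i == attempts - 1:
                raise  # still down after all attempts: give up
            time.sleep(base_delay * 2 ** i)  # 1s, 2s, 4s, ... between tries
```

In practice you would pass a closure that POSTs to the chat endpoint; the same pattern also smooths over the stack swapping one model out for another.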

Why is it gaining traction?

Unlike generic PyTorch Serve setups or TRT-LLM serve hacks, this nails Blackwell hardware quirks with custom vLLM images for maximum throughput (65+ tps on Nemotron). The waker service handles GPU scheduling intelligently (no more manual container juggling), and integrations plug straight into VS Code (Cline) or terminal agents (OpenCode). Multi-language docs make it accessible beyond English-speaking homelabs.

Who should use this?

DGX Spark owners tired of side-project dust collectors. AI devs building local agents with vision/tool support, or homelab tinkerers serving static files via API for custom UIs. Skip if you're not on Blackwell; it's hyper-optimized for that GB10 GPU.

Verdict

Grab it if you own the hardware: the quickstart gets you serving in 30 minutes, and the docs are solid despite hobby roots. At 17 stars and 1.0% credibility, expect tweaks for edge cases like experimental model crashes, but community PRs are flowing. A solid 8/10 for niche power users.

