r0b0tlab

GB10 NVFP4 native MTP reproducibility pack for Qwen3.6-35B-A3B

85% credibility
Found May 24, 2026 at 10 stars.
Python
AI Summary

This repository is a step-by-step guide for running a large AI assistant (Qwen3.6-35B-A3B) on NVIDIA GB10 hardware. It includes a compatibility fix for the inference software, launch scripts, and testing tools, and it relies on NVFP4 quantization so the model runs efficiently while maintaining response quality. The model files themselves are not included; users download them separately. The documentation is intended to let anyone with the right hardware reproduce the results.

How It Works

1
🔍 You hear about this project

You discover a community member shared a working setup for running a powerful AI assistant on new high-end hardware.

2
📥 You download the AI model

Following the instructions, you download the AI assistant files separately from the official model library.

3
🔧 You apply the compatibility fix

The project includes a small fix that makes everything work properly with your hardware and software setup.

4
🚀 You launch your AI assistant

With one simple command, your AI assistant starts up on your computer, ready to answer questions.

5
You verify everything works

You run a quick test to make sure the AI assistant is thinking correctly and responding properly.

6
You check the performance

Single question test

Ask one question and see how quickly it responds

🌐 Multiple questions test

Send several questions at once to see how it handles busy periods

🎉 Your AI assistant is ready

Everything is working! Your AI assistant runs fast on your hardware, and you can start using it for real tasks.
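The verification step above can be sketched in a few lines. The review below describes the failure mode as output made of repeated "!" tokens, so a minimal check is whether a single character dominates the reply. This is an illustrative sketch, not the pack's own smoke test; the function name `looks_degenerate` and the 0.8 threshold are assumptions.

```python
from collections import Counter


def looks_degenerate(text: str, threshold: float = 0.8) -> bool:
    """Return True when one character dominates the non-whitespace output.

    Hypothetical helper: the name and the 0.8 threshold are illustrative,
    not taken from the repository.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        # An empty reply is also treated as a failed smoke test.
        return True
    _, top_count = Counter(chars).most_common(1)[0]
    return top_count / len(chars) >= threshold


if __name__ == "__main__":
    print(looks_degenerate("!!!!!!!!!!!!!!!!"))
    print(looks_degenerate("The capital of France is Paris."))
```

A check like this only catches the crude failure; it says nothing about whether a fluent answer is actually correct.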

AI-Generated Review

What is qwen36-35b-a3b-nvfp4-gb10-native-mtp?

This is a reproducibility pack for running Qwen3.6-35B-A3B with NVFP4 quantization on NVIDIA GB10 hardware using SGLang. It addresses a specific loader bug: the stock SGLang image produced degenerate output (repeated "!" tokens) because it mishandled the hybrid linear-attention (GDN) projections, which the checkpoint stores unquantized alongside quantized MoE layers. The patch loads the GDN projections as unquantized and keeps the remaining layers on compressed-tensors NVFP4. The pack also provides a launch script, correctness smoke tests, and concurrency ramp benchmarks.
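The idea behind the patch can be illustrated with a per-module decision: modules that match a "leave unquantized" pattern are loaded as plain weights, and everything else uses the NVFP4 path. The sketch below is not SGLang code and does not reproduce the actual patch; the function `pick_scheme` and the pattern `*.linear_attn.*` are hypothetical, and real module names depend on the checkpoint.

```python
from fnmatch import fnmatchcase

# Hypothetical name patterns for the GDN projections; the real module
# names depend on the checkpoint and are not given in this review.
UNQUANTIZED_PATTERNS = ("*.linear_attn.*",)


def pick_scheme(module_name: str, patterns=UNQUANTIZED_PATTERNS) -> str:
    """Choose a loading scheme for one module by name.

    Returns "unquantized" for modules matching any pattern, else "nvfp4".
    """
    if any(fnmatchcase(module_name, p) for p in patterns):
        return "unquantized"
    return "nvfp4"


if __name__ == "__main__":
    for name in (
        "model.layers.0.linear_attn.in_proj",
        "model.layers.0.mlp.experts.3.down_proj",
    ):
        print(name, "->", pick_scheme(name))
```

Keeping the decision in one small function makes the mixed-precision rule easy to audit: a mismatch between how a weight is stored and how it is loaded is the kind of error the review describes.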

Why is it gaining traction?

The hook is native MTP speculative decoding on Blackwell GB10 without an external draft model. The pack reports 57 tokens/second for a single long decode and scales to 174 tokens/second at concurrency 4, with a 0.93 mean MTP accept rate. It demonstrates that FP4 inference produces correct output on this hardware stack, which matters for anyone deploying Qwen3.6 at this scale. The included public safety scan script also signals operational awareness.
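The two throughput figures can be turned into a scaling estimate. The sketch below assumes the 174 tokens/second figure is the aggregate across all four concurrent requests, which the review implies but does not state; the function name `scaling_summary` is illustrative.

```python
def scaling_summary(single_tps: float, aggregate_tps: float, concurrency: int) -> dict:
    """Summarize how aggregate throughput scales with concurrency.

    Assumes `aggregate_tps` is the total across all concurrent requests.
    """
    speedup = aggregate_tps / single_tps
    return {
        "speedup": speedup,                            # relative to one request
        "per_stream_tps": aggregate_tps / concurrency,  # what each client sees
        "efficiency": speedup / concurrency,            # 1.0 would be linear scaling
    }


if __name__ == "__main__":
    summary = scaling_summary(single_tps=57, aggregate_tps=174, concurrency=4)
    for key, value in summary.items():
        print(f"{key}: {value:.2f}")
    # speedup: 3.05
    # per_stream_tps: 43.50
    # efficiency: 0.76
```

Under that assumption, four concurrent requests deliver about 3.05 times the single-request throughput, or roughly 76% of linear scaling, with each request seeing about 43.5 tokens/second.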

Who should use this?

ML engineers running Qwen3.6-35B on GB10 who need working NVFP4 inference with speculative decoding. Infrastructure teams evaluating Blackwell for LLM serving will find the benchmark scripts useful for capacity planning. Researchers reproducing the MTP results will want the patch and launch configuration.
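For capacity planning, a concurrency ramp can be sketched as a loop that issues the same request at increasing levels of parallelism and records aggregate tokens per second. This is a minimal illustration, not the pack's benchmark script: `ramp` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in that would be replaced by a real call to the running server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def ramp(generate, levels=(1, 2, 4), prompt="Hello"):
    """Run `generate(prompt)` at each concurrency level.

    `generate` is a caller-supplied function that returns the number of
    tokens produced for one request. Returns {level: tokens_per_second}.
    """
    results = {}
    for n in levels:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            token_counts = list(pool.map(generate, [prompt] * n))
        elapsed = time.perf_counter() - start
        results[n] = sum(token_counts) / elapsed
    return results


def fake_generate(prompt):
    """Stand-in for a real request: sleeps briefly and reports 10 tokens."""
    time.sleep(0.05)
    return 10


if __name__ == "__main__":
    for level, tps in ramp(fake_generate).items():
        print(f"concurrency {level}: {tps:.1f} tokens/second")
```

Passing the request function in as a parameter keeps the timing logic independent of any particular client library.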

Verdict

This is a niche but credible reproducibility pack for a specific hardware and model combination. The 85% credibility score reflects low community visibility (10 stars) and limited documentation, but the technical content is solid and the patch addresses a real loader bug. If you run Qwen3.6-35B on GB10, this saves you debugging time. Otherwise, it is too specialized to justify attention.
