phuongncn

4-5x faster Qwen3.5 on ASUS GX10 / DGX Spark — Hybrid INT4+FP8 + MTP via one shell script

19 stars · 89% credibility · Found Apr 20, 2026
Language: Shell

AI Summary

Scripts that set up and run optimized builds of the Qwen3.5 models for 4-5x faster inference on ASUS GX10 or DGX Spark machines.

How It Works

1
🔍 Discover the speed trick

You hear about a free tool that makes Qwen3.5 chats on your ASUS GX10 or DGX Spark run 4-5 times faster, turning slow responses into lightning-quick ones.

2
📥 Get the helper tool

Clone the repo (or download it) and make its main script executable with a quick permission change, as in the sketch below.
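
In practice that's just a clone and a quick chmod. A minimal sketch: the repo path is inferred from the owner and repo name shown on this page, and run.sh is a placeholder for whatever the repo's entry script is actually called.

    git clone https://github.com/phuongncn/asus-gx10-qwen35-speed-hack.git
    cd asus-gx10-qwen35-speed-hack
    chmod +x run.sh   # placeholder name; use the repo's actual entry script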

3
⚙️ Run the easy menu

Launch the script's interactive menu and pick 'install' to let it build the Docker image and download your chosen model automatically, roughly as sketched below.
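
Launching is a single command, and the menu prompts for everything else. A sketch with the script name and menu labels assumed, not taken from the repo:

    ./run.sh            # placeholder for the repo's entry script
    # Then pick from a menu along these lines:
    #   1) install    - build the Docker image and download model weights
    #   2) serve      - start the OpenAI-compatible vLLM server
    #   3) benchmark  - run the built-in speed test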

4
Pick your style
Speed mode

Go for the hybrid INT4+FP8 setup that squeezes maximum speed out of a single request at a time.

Quality mode

Select the native FP8 setup for full-quality answers and for handling many chats at once.
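
For a feel of what the two modes boil down to at the serving layer, here is a hedged sketch of the kind of vLLM commands involved. Model paths and values are assumptions; the script assembles the real commands itself, and the MTP speculative-decoding setup is omitted here.

    # Speed mode: serve a pre-merged hybrid INT4+FP8 checkpoint; the
    # quantization scheme is baked into the weights. Path is made up.
    vllm serve /models/qwen3.5-35b-hybrid-int4-fp8 --port 8000

    # Quality mode: serve the native FP8 checkpoint and allow more
    # concurrent sequences for batched chat traffic. Values illustrative.
    vllm serve /models/qwen3.5-35b-fp8 --port 8000 --max-num-seqs 64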

5
🚀 Start your turbo AI

Pick your prepared model, hit start, and the OpenAI-compatible server loads in minutes, ready for requests (see the example below).
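
Because the server speaks the standard OpenAI chat API, any OpenAI client can talk to it. A minimal smoke test, assuming port 8000 and using a placeholder model name (it must match whatever name the server registers):

    curl -s http://localhost:8000/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{
            "model": "qwen3.5",
            "messages": [{"role": "user", "content": "Say hello in one sentence."}]
          }'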

6
📊 Check the magic

Run the built-in benchmark and watch numbers like 100+ tokens per second zoom by.
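
If you want a quick sanity check outside the repo's own benchmark, you can time a single request yourself. A rough sketch, assuming the server on port 8000 plus jq and bc installed:

    START=$(date +%s.%N)
    TOKENS=$(curl -s http://localhost:8000/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{"model": "qwen3.5", "messages": [{"role": "user", "content": "Write a short story."}], "max_tokens": 256}' \
      | jq '.usage.completion_tokens')
    END=$(date +%s.%N)
    echo "scale=1; $TOKENS / ($END - $START)" | bc   # completion tokens per second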

🎉 Chat at warp speed

Now your AI helper answers questions, writes code, or crunches math in a flash, making your powerful computer feel truly alive.

AI-Generated Review

What is asus-gx10-qwen35-speed-hack?

This shell script turns your ASUS GX10 or DGX Spark into a Qwen3.5 speed demon, delivering 4-5x faster inference via hybrid INT4+FP8 checkpoints and MTP speculative decoding. It solves the pain of slow stock runs—30 t/s on 35B jumps to 112+ t/s single-request or 185 t/s concurrent—through one interactive menu that builds Docker images, downloads models, merges weights, and spins up an OpenAI-compatible vLLM server. Pick hybrid mode for raw speed or native FP8 for quality and throughput; benchmark results are built-in.
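
For a sense of the moving parts, the serving step the script automates likely resembles a standard vLLM container launch. Everything below (image tag, volume mount, model path) is an assumption for illustration, not the repo's actual configuration:

    docker run --gpus all --rm -p 8000:8000 \
      -v "$HOME/models:/models" \
      vllm/vllm-openai:latest \
      --model /models/qwen3.5-35b-hybrid-int4-fp8 \
      --port 8000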

Why is it gaining traction?

Unlike Ollama or manual llama.cpp builds that cap out around 70 t/s, this hack squeezes Blackwell's 128GB of unified memory for real-world wins: 51 t/s on a 122B model, plus custom AutoRound support. Developers love the zero-expertise flow: run the shell script, select options, and get API endpoints ready, with no framework wrestling. Proven benchmarks across tasks like code generation and math hook hardware owners chasing local performance parity with clusters.

Who should use this?

ASUS GX10 or DGX Spark buyers running Qwen3.5 for local RAG, code assistants, or internal chatbots. AI devs benchmarking MoE models on single high-VRAM GPUs. Teams dodging cloud bills for dev/test inference at 100+ t/s.

Verdict

Strong pick for matching hardware: installs fast and the benchmarks deliver. 19 stars and an 89% credibility score signal early maturity; the README is solid, but validate stability before production use.

