r0b0tlab

MiniMax M2.7 NVFP4 dual-GB10 Blackwell benchmark: vLLM FlashInfer-CUTLASS, public data, HTML canvas report, and Docker runtime.

10
0
85% credibility
Found May 24, 2026 at 11 stars.
AI Analysis
HTML
AI Summary

This is a benchmark and reproducibility project for running the MiniMax M2.7 model in NVIDIA's NVFP4 4-bit format on dual NVIDIA GB10 Blackwell GPUs. The project provides a Docker container, tuned launch scripts, and benchmark results showing throughput of about 25 tokens per second. It also includes a safety checker script that guards against accidentally publishing secrets, and the code is released under the MIT license. The project is aimed at researchers and developers who want to run this specific model on this specific NVIDIA hardware.
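The summary does not describe how the safety checker works. As a rough illustration of the idea only, a pre-publish scan for secrets could look like the sketch below. The pattern set and the `find_secrets` helper are hypothetical and are not taken from the repository.

```python
import re

# Hypothetical patterns for illustration; the repository's checker may
# look for different things.
SECRET_PATTERNS = {
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "generic_assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}"
    ),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits
```

A publish step could call `find_secrets` on each file in the bundle and refuse to continue when the returned list is not empty.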

How It Works

1
🔍 You hear about a faster way to run AI

You discover a project that shows how to run the MiniMax AI model at high speed on powerful NVIDIA GPUs.

2
🖥️ You check if your hardware is ready

The project tells you that you need dual NVIDIA GB10 GPUs (for example, a pair of DGX Spark systems) with enough memory to hold the model.

3
📦 You get the model and container

You download the official MiniMax model from NVIDIA (after accepting their license) and pull the ready-to-run container.

4
🚀 You launch your AI assistant

With one script, your AI assistant starts up using NVFP4 4-bit precision, which lets the model fit in the memory of your two GPUs.

5
📊 You see your performance results

The benchmark shows your setup running at about 25 tokens per second, roughly 2.7% faster than the public baseline.

Your AI is running smoothly

Your MiniMax assistant is now running on your Blackwell GPUs, ready to help with reasoning and tool-calling tasks.

AI-Generated Review

What is minimax-m27-nvfp4-gb10-benchmark?

This is a reproducible benchmark bundle and Docker runtime for running MiniMax M2.7, a large Mixture-of-Experts language model, with NVIDIA's NVFP4 4-bit quantization on dual Blackwell GB10 GPUs. It uses vLLM with a FlashInfer-CUTLASS backend and tensor parallelism across two GPUs to squeeze out maximum throughput. The repo includes launch scripts, an HTML canvas benchmark report, and a pre-built container image on GitHub Container Registry. The headline result: about 25 tokens per second at tg128, roughly 2.7% faster than the public baseline.
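The headline figures can be related with simple arithmetic. The sketch below is not from the repository: the helper names are hypothetical, and it assumes that tg128 refers to a run that generates 128 tokens. Because "about 25" is a rounded figure, the implied baseline of roughly 24.3 tokens per second is only an estimate.

```python
def tokens_per_second(generated_tokens, elapsed_seconds):
    """Throughput of a generation run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return generated_tokens / elapsed_seconds


def percent_faster(measured, baseline):
    """Relative gain of `measured` over `baseline`, in percent."""
    return (measured / baseline - 1.0) * 100.0


def implied_baseline(measured, percent_gain):
    """Baseline throughput implied by a measured value and a stated gain."""
    return measured / (1.0 + percent_gain / 100.0)


# 128 generated tokens in 5.12 s corresponds to 25 tokens per second.
rate = tokens_per_second(128, 5.12)

# A 2.7% gain at 25 tokens per second implies a baseline near 24.3.
baseline = implied_baseline(25.0, 2.7)
```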

Why is it gaining traction?

The NVIDIA Blackwell architecture is new territory, and this repo tackles a real pain point: CUDA graph replay deadlocks the cross-node SHM broadcast path during TP2 inference. The solution is compile-only mode with torch.compile, keeping Inductor optimizations without graph replay. That is a genuine operational insight that other teams running similar setups will appreciate. The Docker image handles the complex vLLM build with FP4 support baked in, which saves hours of debugging for teams with Blackwell hardware.
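The review does not show the launch flags, so the following is only a sketch of what a compile-only launch might look like. It assumes a recent vLLM release in which `--compilation-config` accepts a JSON object with a `cudagraph_mode` key; the key name, the model path, and the `build_launch_command` helper are assumptions, and the repository's own launch script may use different options.

```python
import json


def build_launch_command(model_path, tensor_parallel_size=2):
    """Build an argv list for a vLLM server without CUDA graph replay.

    Assumption: the installed vLLM version understands the
    "cudagraph_mode" key. torch.compile stays enabled; only graph
    capture and replay are switched off.
    """
    compilation_config = {"cudagraph_mode": "NONE"}
    return [
        "vllm",
        "serve",
        model_path,
        "--tensor-parallel-size",
        str(tensor_parallel_size),
        "--compilation-config",
        json.dumps(compilation_config),
    ]
```

Passing the returned list to `subprocess.run` would start the server. A two-node setup also needs a distributed backend to be configured, which this sketch leaves out.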

Who should use this?

ML engineers running MiniMax M2.7 on dual GB10 or DGX Spark systems who need reproducible throughput numbers. Researchers benchmarking MoE models on cutting-edge GPU hardware. DevOps teams deploying quantized LLMs via containerized runtimes. If you are not already working with Blackwell SM120/SM121 hardware and the NVFP4 checkpoint, this repo will not be useful to you.

Verdict

The credibility score of 85% reflects a niche, single-author repo with only 10 stars. The technical execution is solid, the documentation is clear, and the benchmark methodology is transparent. However, this is a highly specialized tool targeting a very specific hardware configuration. If you have dual GB10 Blackwell systems and need to run MiniMax M2.7 with NVFP4, this is the reference implementation you want. Otherwise, skip it.
