Qwen3-0.6B megakernel: 527 tok/s decode on RTX 3090 (3.8x faster than PyTorch)
An educational project that builds a highly optimized custom megakernel to run the Qwen3-0.6B AI model much faster on NVIDIA GPUs such as the RTX 3090.
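To put the headline numbers in perspective, here is a small back-of-the-envelope sketch. It uses only the two figures quoted in the title (527 tok/s and 3.8x); the implied PyTorch baseline of roughly 139 tok/s and the per-token times are derived arithmetic, not measurements reported by the project.

```python
# Figures taken from the headline; everything below is derived arithmetic.
MEGAKERNEL_TOK_PER_S = 527.0   # reported decode throughput on an RTX 3090
SPEEDUP_VS_PYTORCH = 3.8       # reported speedup over PyTorch


def implied_baseline(tok_per_s: float, speedup: float) -> float:
    """Throughput the slower baseline must have had for the speedup to hold."""
    return tok_per_s / speedup


def ms_per_token(tok_per_s: float) -> float:
    """Average time spent producing one token, in milliseconds."""
    return 1000.0 / tok_per_s


baseline = implied_baseline(MEGAKERNEL_TOK_PER_S, SPEEDUP_VS_PYTORCH)
print(f"implied PyTorch baseline: {baseline:.1f} tok/s")
print(f"megakernel: {ms_per_token(MEGAKERNEL_TOK_PER_S):.2f} ms per token")
print(f"baseline:   {ms_per_token(baseline):.2f} ms per token")
```

In other words, the claimed speedup corresponds to cutting the time per token from about 7 ms to about 2 ms.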
How It Works
You hear about a fun project that makes small AI chatbots run super fast on your gaming computer.
Download the files and install the tools it needs, including the software that lets programs use your NVIDIA graphics card.
Run the chat program and start talking to the AI assistant right away.
Watch the AI reply almost instantly: the project reports 527 tokens per second on an RTX 3090, about 3.8x faster than PyTorch.
Try the built-in tests to see exactly how quick it is compared to others.
Run a quick check to confirm that its answers match what a standard implementation of the same model gives.
Enjoy your own blazing-fast AI helper for chatting, learning, or fun, all running smoothly on your setup.
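The speed test and the answer check in the steps above can be sketched roughly as follows. This is not the project's own benchmark: `measure_tok_per_s`, `first_mismatch`, and the stand-in `fake_generate` are hypothetical names made up for illustration, and a real backend would take the place of `fake_generate`.

```python
import time
from typing import Callable, List

# Hypothetical signature: (prompt token ids, tokens to generate) -> new token ids.
GenerateFn = Callable[[List[int], int], List[int]]


def measure_tok_per_s(generate: GenerateFn, prompt: List[int],
                      n_tokens: int, warmup: int = 1) -> float:
    """Time one generation call and return generated tokens per second."""
    for _ in range(warmup):
        generate(prompt, n_tokens)  # warm-up runs are not timed
    start = time.perf_counter()
    out = generate(prompt, n_tokens)
    # On a real GPU backend, wait for the device to finish before stopping
    # the clock; otherwise only the launch time is measured.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(out) / elapsed


def first_mismatch(a: List[int], b: List[int]) -> int:
    """Index of the first differing token, or -1 if the sequences are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return -1


def fake_generate(prompt: List[int], n_tokens: int) -> List[int]:
    """Stand-in for a real model: deterministically emits dummy token ids."""
    return [(sum(prompt) + i) % 1000 for i in range(n_tokens)]


if __name__ == "__main__":
    prompt = [1, 2, 3]
    rate = measure_tok_per_s(fake_generate, prompt, n_tokens=64)
    print(f"{rate:.0f} tok/s (dummy backend)")
    # Compare two backends token by token; here both are the same stand-in.
    print(first_mismatch(fake_generate(prompt, 64), fake_generate(prompt, 64)))
```

Comparing token ids one by one only makes sense when both backends decode greedily; with sampling enabled, the outputs can differ even when both are correct.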