psmarter / CUDA-Practice
CUDA programming practice project: hands-on CUDA kernels and performance optimization, covering GEMM, FlashAttention, Tensor Cores, CUTLASS, quantization, KV cache, NCCL, and profiling.
A hands-on tutorial series with code examples, detailed explanations, and performance benchmarks teaching GPU acceleration techniques from beginner basics to advanced optimizations.
How It Works
You find a collection of lessons that teach how to run large computations quickly on a GPU.
You follow the introductory steps to understand how ordinary numerical work is accelerated on GPU hardware.
You run your first example and see the same work finish far faster than it did before.
You move on to optimization techniques for the large matrix operations behind AI models, such as GEMM and FlashAttention, and make your kernels faster still.
You benchmark the results side by side and see how much each optimization improves performance.
You come away able to write fast GPU programs for large problems with confidence.