loveSunning / FastCuda
FastCuda is a handwritten CUDA operator library featuring progressive GEMM and Reduce kernels, cuBLAS benchmarking, and C/C++/Python interfaces for learning, profiling, and performance optimization.
FastCuda is a library of hand-optimized routines for fast matrix multiplication (GEMM) and data reduction on NVIDIA GPUs, with examples, benchmarks, and interfaces for C, C++, and Python.
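GEMM kernels are commonly optimized by tiling. Each thread block loads a small square tile of A and B into shared memory and accumulates partial products before moving on to the next tile along the K dimension. The sketch below is a plain-Python CPU reference of that loop structure, assuming square tiles of a hypothetical width `TILE`. It illustrates the general technique and is not FastCuda's code or API. `matmul_naive` and `matmul_tiled` are hypothetical names chosen for this example.

```python
# Hypothetical CPU reference for a tiled GEMM: C = A @ B.
# TILE stands in for the shared-memory tile width a CUDA kernel would use.
TILE = 2


def matmul_naive(a, b):
    """Straightforward triple loop: one output element at a time."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matmul_tiled(a, b, tile=TILE):
    """Same result, computed tile by tile, mirroring how a thread block
    walks across A and B one tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # output tile row (like blockIdx.y)
        for j0 in range(0, n, tile):      # output tile column (like blockIdx.x)
            for p0 in range(0, k, tile):  # step along K, one tile per iteration
                # In a CUDA kernel, the A and B tiles would be staged in shared memory here.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_tiled(a, b))
    print(matmul_naive(a, b))
```

Comparing the tiled version against the naive one on small inputs is a cheap way to validate a tiling scheme, including ragged edges where the matrix size is not a multiple of the tile width, before porting it to a kernel.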
How It Works
You hear about FastCuda, a toolkit that runs heavy math, such as multiplying large matrices, quickly on your NVIDIA graphics card.
Download the project files to your Windows or Linux computer to start using it.
Check that your graphics card and software setup meet the listed requirements.
Follow the build steps to compile the kernels for your card.
Launch a sample calculation, such as multiplying two large matrices, and see how quickly it runs.
Optionally, use the Python interface to run fast sums or matrix multiplications from your own scripts.
Run the built-in benchmarks to compare the kernels' speed against cuBLAS.
Your heavy number-crunching tasks now run on your graphics card, saving time.
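A GPU sum is usually implemented as a tree reduction. Each thread block loads its values into shared memory, then halves the number of active threads at each step, adding pairs until a single partial sum per block remains. Those partial sums are then reduced the same way. The Python sketch below simulates that pattern on the CPU, assuming a hypothetical block size `BLOCK = 8` (a power of two). It illustrates the technique only and is not FastCuda's Reduce kernel or its Python interface. `block_reduce` and `reduce_sum` are hypothetical names.

```python
BLOCK = 8  # hypothetical threads per block; must be a power of two


def block_reduce(chunk, block=BLOCK):
    """Simulate one thread block's shared-memory tree reduction."""
    # Pad with zeros so every "thread" has a slot, as a kernel would for a tail block.
    smem = list(chunk) + [0] * (block - len(chunk))
    stride = block // 2
    while stride > 0:
        # Threads with tid < stride add the element `stride` slots away.
        # Reads (tid + stride) never overlap writes (tid), matching parallel semantics.
        for tid in range(stride):
            smem[tid] += smem[tid + stride]
        stride //= 2  # in CUDA, __syncthreads() would separate these steps
    return smem[0]


def reduce_sum(values, block=BLOCK):
    """Multi-pass sum: per-block partial sums, repeated until one value remains."""
    partials = [block_reduce(values[i:i + block], block)
                for i in range(0, len(values), block)]
    while len(partials) > 1:
        partials = [block_reduce(partials[i:i + block], block)
                    for i in range(0, len(partials), block)]
    return partials[0] if partials else 0


if __name__ == "__main__":
    data = list(range(1, 101))
    print(reduce_sum(data), sum(data))
```

Because a tree reduction reorders the additions, floating-point results can differ slightly from a sequential CPU sum. Keep this in mind when checking GPU output against a reference.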