inclusionAI / cuLA
CUDA kernels for linear attention variants, written in the CuTe DSL and CUTLASS C++.
cuLA is a Python library offering high-performance CUDA kernels for linear attention variants, designed as a drop-in replacement for flash-linear-attention to accelerate long-context language model workloads on NVIDIA Hopper and Blackwell GPUs.
How It Works
cuLA speeds up language models on long-context workloads by providing faster linear attention kernels for NVIDIA Hopper and Blackwell GPUs.
Install the library alongside your existing stack (PyTorch and flash-linear-attention).
Because cuLA is designed as a drop-in replacement for flash-linear-attention, switching to its kernels is typically a one-line change in your code.
Run your training or inference workload as before; long sequences are processed faster.
Benchmark before and after the switch to confirm the speedup with real numbers.
The result: long-context models train and serve faster, with headroom for even longer sequences.
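To make "linear attention" concrete, here is a minimal NumPy reference for the computation such kernels fuse and accelerate. This is an illustrative sketch only: it does not call cuLA's API (which is not documented in this page), and it omits softmax, scaling, and the decay/gating terms that specific linear attention variants add.

```python
import numpy as np

def linear_attention(q, k, v):
    """Causal linear attention via a running state:
    O_t = q_t @ S_t, where S_t = sum_{s<=t} k_s v_s^T.
    Shapes: q, k are [T, d]; v is [T, d_v].
    Cost is O(T * d * d_v) -- no T^2 score matrix."""
    T, d = q.shape
    d_v = v.shape[1]
    state = np.zeros((d, d_v))
    out = np.empty((T, d_v))
    for t in range(T):
        state += np.outer(k[t], v[t])   # rank-1 update of the [d, d_v] state
        out[t] = q[t] @ state           # read out with the current query
    return out

def quadratic_form(q, k, v):
    """Equivalent causally-masked quadratic form, for cross-checking."""
    scores = np.tril(q @ k.T)  # lower-triangular mask, no softmax
    return scores @ v
```

The recurrent form is why linear attention scales to long contexts: the state has fixed size regardless of sequence length, whereas softmax attention materializes a T-by-T score matrix. Production kernels like cuLA's process the sequence in chunks to keep this recurrence GPU-friendly.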