cloudflareresearch / unweight-kernels
Lossless compression of BF16 MLP weights for LLM inference on NVIDIA Hopper GPUs
This project provides kernels for losslessly compressing parts of large language models (the BF16 MLP weights) to speed up inference on high-end NVIDIA GPUs.
How It Works
The accompanying guide and report explain the approach: by losslessly compressing the MLP weight matrices, which dominate a model's memory footprint, inference gets faster without any change to the model's outputs.

1. Read the guide and report to understand which parts of the model are compressed and why that yields the speedup.
2. Ensure you have an NVIDIA Hopper GPU (e.g. an H100) with a recent CUDA toolkit installed.
3. Build the compression kernels from source on your machine.
4. Integrate the kernels into your LLM inference stack.
5. Choose a compression strategy suited to your model's size and workload.
6. The compressed weights occupy less memory and the model responds faster, with no loss in accuracy (the compression is lossless).
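To see why lossless compression of BF16 weights is possible at all, consider how the bits are distributed. The sketch below is an illustration, not the repo's kernel: it truncates float32 Gaussian weights to BF16, splits each value into its high byte (sign bit plus the top seven exponent bits) and low byte (last exponent bit plus the seven mantissa bits), and compresses each byte plane with zlib. Trained weights cluster in a narrow dynamic range, so the exponent plane has low entropy and compresses well, while the mantissa plane is close to incompressible.

```python
import zlib
import numpy as np

# Hypothetical illustration (not this repo's method): BF16 is the top 16 bits
# of an IEEE-754 float32, so we can emulate it by truncation.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=1 << 20).astype(np.float32)

# Truncate float32 -> BF16 (round toward zero, for simplicity).
bf16 = (w.view(np.uint32) >> 16).astype(np.uint16)

hi = (bf16 >> 8).astype(np.uint8)    # sign + top 7 exponent bits
lo = (bf16 & 0xFF).astype(np.uint8)  # last exponent bit + 7 mantissa bits

def ratio(plane: np.ndarray) -> float:
    """Compressed size as a fraction of the raw size of one byte plane."""
    raw = plane.tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)

print(f"exponent-byte plane compresses to {ratio(hi):.2f} of original size")
print(f"mantissa-byte plane compresses to {ratio(lo):.2f} of original size")
```

A real GPU kernel would use a format designed for parallel decode rather than zlib, but the underlying opportunity is the same: the exponent bytes of trained weights carry far less than 8 bits of entropy each, so the weights can be stored smaller and decompressed on the fly, bit-exactly.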