arozanov / turboquant-mlx

TurboQuant: KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% of FP16 decode speed.
TurboQuant compresses the key-value (KV) cache that transformer language models build during generation, cutting memory use on Apple-silicon Macs while keeping decoding close to FP16 speed.
How It Works
1. Install TurboQuant alongside your MLX model runner.
2. Load a language model and prepare it for generation as usual.
3. Enable KV cache compression. The fused Metal kernels quantize cache entries on the fly, shrinking the working memory with minimal impact on output quality.
4. Prompt the model exactly as before. Decoding runs at roughly FP16 speed while the cache uses a fraction of the memory, so longer conversations fit comfortably in RAM.
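At its core, KV cache compression of this kind is groupwise low-bit quantization: each small group of cache values shares one scale and zero-point, and the values themselves are stored as 4-bit integers. The sketch below illustrates the idea in pure Python; the group size, bit width, and byte accounting are illustrative assumptions, not TurboQuant's actual kernel layout.

```python
def quantize_group(values, bits=4):
    """Affine-quantize one group of floats to unsigned ints
    sharing a single scale and zero-point (illustrative only)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo

def dequantize_group(q, scale, zero):
    """Reconstruct approximate floats from the quantized group."""
    return [v * scale + zero for v in q]

# Toy stand-in for one 32-element slice of a KV cache tensor.
row = [0.1 * i - 1.6 for i in range(32)]
q, scale, zero = quantize_group(row)
recon = dequantize_group(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(row, recon))

# Storage accounting for this hypothetical layout:
# FP16 costs 2 bytes/value; int4 packs 2 values/byte, plus one
# FP16 scale and one FP16 zero-point per 32-value group.
fp16_bytes = 2 * len(row)            # 64 bytes
int4_bytes = len(row) // 2 + 2 + 2   # 20 bytes
ratio = fp16_bytes / int4_bytes      # 3.2x for this layout

assert max_err <= scale  # error bounded by one quantization step
print(f"compression {ratio:.1f}x, max abs error {max_err:.3f}")
```

Real implementations reach higher ratios (such as the 4.6x quoted above) by tuning group size, packing metadata more tightly, or mixing precisions per layer; the arithmetic here only shows why low-bit grouping trades a small, bounded reconstruction error for several-fold memory savings.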