helgklaizar / turboquant_mlx
Extreme KV Cache Compression (1-3 bit) for LLMs natively on Apple Silicon (MLX). Features TurboQuant, asymmetric PolarQuant caching, and OpenAI server compatibility.
TurboQuant-MLX compresses the memory cache for language models on Apple Silicon to enable longer contexts and larger models with minimal accuracy loss.
How It Works
1. Install TurboQuant-MLX on your Apple Silicon Mac.
2. Load an MLX-compatible language model and enable KV cache quantization, choosing a bit width from 1 to 3.
3. Run long prompts or extended conversations; the compressed cache keeps memory use low while generation stays fast and responses stay accurate.
4. Optionally launch the OpenAI-compatible server to connect chat front ends and other apps.

The result: gigabytes of KV cache memory saved per session, letting longer contexts and larger models fit on the same hardware with minimal accuracy loss.
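The repository's actual TurboQuant and PolarQuant kernels are not shown here, but the general idea behind low-bit KV cache compression can be sketched with a simple asymmetric quantizer: each row of the cache is mapped onto a small integer grid between its own min and max, and only the integers plus a per-row scale and offset are stored. The function names and shapes below are illustrative assumptions, not the repo's API.

```python
import numpy as np

def quantize_asym(x: np.ndarray, bits: int = 3):
    """Asymmetric per-row quantization: map each row's [min, max] onto
    the integer grid [0, 2**bits - 1]. Illustrative sketch only; not
    the repo's actual TurboQuant/PolarQuant implementation."""
    levels = 2 ** bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard against flat rows
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q: np.ndarray, scale: np.ndarray, lo: np.ndarray):
    """Reconstruct approximate float values from the stored integers."""
    return q.astype(np.float32) * scale + lo

# Simulated KV cache slice: (heads, seq_len, head_dim)
rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 128, 64)).astype(np.float32)

q, scale, lo = quantize_asym(kv, bits=3)
recon = dequantize(q, scale, lo)

# 3-bit codes take ~3/16 the space of fp16 values,
# plus a small per-row scale/offset overhead.
err = np.abs(kv - recon).mean()
print(f"mean abs error at 3 bits: {err:.3f}")
```

At 1-2 bits the reconstruction error of a naive scheme like this grows quickly, which is why dedicated methods such as the repo's TurboQuant and PolarQuant exist: they reshape the value distribution before rounding to retain accuracy at extreme compression ratios.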