caiovicentino / eoq-quantization
EOQ: Entropy-Optimal Quantization for LLMs. 11-41% smaller than GGUF Q4_K_M with near-FP16 perplexity.
Implements EOQ, an entropy-optimal quantization method using absmax quantization and rANS entropy coding to compress large language model weights with minimal quality loss.
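The two stages named above, absmax quantization of the weights followed by entropy coding of the resulting integer codes, can be sketched as below. This is a minimal illustration of the general technique, not the repo's implementation; the function names, the 4-bit width, and the group size of 64 are assumptions, and the empirical Shannon entropy stands in for the rANS coder as an estimate of the achievable bits per weight.

```python
import numpy as np

def absmax_quantize(w, bits=4, group_size=64):
    """Per-group absmax quantization: scale each group so its largest
    magnitude maps to the top of the signed integer range."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed codes
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0             # guard all-zero groups
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate weights from codes and per-group scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

def code_entropy_bits(q):
    """Empirical Shannon entropy of the codes, in bits per weight: a
    lower bound that an entropy coder such as rANS can approach."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
q, scale = absmax_quantize(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())
print("entropy (bits/weight):", code_entropy_bits(q))
```

Because trained weights are far from uniformly distributed, the entropy of the 4-bit codes is typically well below 4 bits, which is where the size savings over a plain 4-bit format come from.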
How It Works
EOQ shrinks large language models so they fit on everyday hardware without losing capability.
The project page shows benchmark charts of models roughly 3x smaller with near-identical quality, alongside real chat examples.
Pre-quantized models are linked for download and load in seconds.
You can chat with a compressed model at full speed and a fraction of the memory, or compress your own favorite model and share the result.