caiovicentino / polarengine-vllm
PolarEngine: vLLM plugin for PolarQuant quantized LLM inference — 75% of FP16 speed at 2.3x less VRAM
PolarEngine provides a quantization plugin for vLLM that enables efficient inference of large language models, using Walsh-Hadamard rotation and optimal centroids for near-lossless compression.
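The rotation-plus-centroid idea can be illustrated with a short sketch. This is a conceptual illustration only, not PolarQuant's actual algorithm or API: the block size, function names, and the uniform demo codebook (a real implementation would fit optimal centroids to the weight distribution) are all assumptions.

```python
# Conceptual sketch of rotation-plus-centroid quantization. Illustration of
# the general idea only; names, shapes, and the codebook are assumptions,
# not PolarQuant's actual algorithm or API.
import numpy as np
from scipy.linalg import hadamard

def rotate_and_quantize(weights: np.ndarray, centroids: np.ndarray):
    """Rotate a square weight block with a Walsh-Hadamard matrix, then
    snap each rotated value to its nearest codebook centroid."""
    n = weights.shape[0]                      # block size must be a power of two
    H = hadamard(n) / np.sqrt(n)              # orthonormal Hadamard rotation
    rotated = H @ weights                     # rotation spreads outliers across rows
    # Nearest-centroid assignment: store only small integer indices.
    idx = np.abs(rotated[..., None] - centroids).argmin(axis=-1)
    return idx.astype(np.uint8), H

def dequantize(idx: np.ndarray, centroids: np.ndarray, H: np.ndarray):
    """Look indices up in the codebook, then undo the rotation
    (H is orthogonal, so its inverse is its transpose)."""
    return H.T @ centroids[idx]

# Tiny demo: a 4-bit codebook (16 centroids) on a random 64x64 block.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)
codebook = np.linspace(-3.0, 3.0, 16, dtype=np.float32)
idx, H = rotate_and_quantize(W, codebook)
W_hat = dequantize(idx, codebook, H)
print("mean absolute reconstruction error:", float(np.abs(W - W_hat).mean()))
```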
How It Works
PolarEngine lets you run large language models on modest hardware without FP16-sized GPUs:

1. Install the plugin alongside vLLM with a single install command; this registers the PolarQuant quantization method.
2. Download a pre-quantized PolarQuant checkpoint that is ready for chat, or
3. Quantize your own model into the compact PolarQuant format.
4. Load the quantized model in vLLM; it starts quickly and uses roughly 2.3x less VRAM than FP16.
5. Generate text or serve the model over an API just like any other vLLM model (see the usage sketch after this list).
6. Get close to FP16 throughput from large models on a single consumer GPU.
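A minimal usage sketch is below. The model ID and the "polarquant" quantization key are assumptions for illustration; check the repo for the actual plugin name and supported checkpoints.

```python
# Hypothetical usage sketch: the model ID and quantization key below are
# assumptions for illustration, not taken from the PolarEngine docs.
from vllm import LLM, SamplingParams

# Load a pre-quantized checkpoint; "polarquant" is an assumed identifier
# that the plugin would register with vLLM's quantization registry.
llm = LLM(
    model="someuser/Llama-3-8B-Instruct-PolarQuant",  # hypothetical checkpoint
    quantization="polarquant",                        # assumed method name
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(
    ["Summarize what Walsh-Hadamard rotation does for quantization."],
    params,
)
print(outputs[0].outputs[0].text)
```

Serving works the same way as with any other vLLM model: once the plugin is installed, the quantized checkpoint can be exposed through vLLM's OpenAI-compatible server.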