RobTand / prismaquant
Mixed-precision quantization for LLMs. Every layer refracts into a different format based on its sensitivity. Native compressed-tensors export, validated on Qwen3.6-35B-A3B MoE with MTP speculative decoding.
PrismaQuant shrinks large language models by choosing a precision level for each layer based on its sensitivity, producing smaller checkpoints that run efficiently on standard inference tooling.
How It Works
PrismaQuant shrinks huge models so they fit on everyday hardware without losing much quality. The workflow:
Download the large language model you want to compress and point the tool at its folder.
The tool profiles the model to find which layers are most sensitive and need more precision (a minimal sketch of this idea follows these steps).
Pick how small you want the result; less space leaves room for longer chats or for running more models at once.
It then assigns a precision format per layer, keeping quality high while cutting size by up to 70%.
Load your new lightweight model for faster responses, bigger contexts, or multiple AIs sharing your hardware (see the loading example below).
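
To make the sensitivity step concrete, here is a minimal sketch of how a per-layer analysis could drive format assignment. The error metric, thresholds, and format names below are illustrative assumptions, not PrismaQuant's actual algorithm:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor round-to-nearest quantize, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def sensitivity(layer: nn.Linear, x: torch.Tensor, bits: int = 4) -> float:
    """Relative output error this layer suffers under low-bit weights."""
    y_ref = layer(x)
    y_q = F.linear(x, fake_quantize(layer.weight.data, bits), layer.bias)
    return ((y_q - y_ref).norm() / y_ref.norm()).item()

def assign_formats(model: nn.Module, calib: torch.Tensor) -> dict[str, str]:
    """Map each Linear layer to a format based on its 4-bit error.

    For simplicity every layer sees the same calibration batch; a real
    analysis pass would hook each layer's true input activations.
    """
    plan = {}
    for name, mod in model.named_modules():
        if isinstance(mod, nn.Linear):
            err = sensitivity(mod, calib)
            # Illustrative thresholds, not PrismaQuant's actual values.
            plan[name] = "int4" if err < 0.01 else ("int8" if err < 0.05 else "bf16")
    return plan

# Toy model whose layers all accept the 64-dim calibration batch.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
print(assign_formats(model, torch.randn(8, 64)))
```

Layers whose outputs barely move under aggressive quantization get the smallest format; the rest fall back to wider ones, which is the trade-off behind the "up to 70%" size reduction.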
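
Since the export targets the compressed-tensors format, the resulting checkpoint should load directly in vLLM, which detects the quantization config from the model files. The export directory name here is a placeholder:

```python
from vllm import LLM, SamplingParams

# Assumes PrismaQuant wrote a standard compressed-tensors checkpoint
# to this (hypothetical) directory.
llm = LLM(model="./prismaquant-out")
params = SamplingParams(temperature=0.7, max_tokens=64)
out = llm.generate(["Summarize mixed-precision quantization in one sentence."], params)
print(out[0].outputs[0].text)
```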