jjang-ai / jangq

JANG — GGUF for MLX. YOU MUST USE JANG_Q RUNTIME. Adaptive Mixed-Precision Quantization + Runtime for Apple Silicon

16 stars · 2 forks · 100% credibility
Found Mar 23, 2026 at 16 stars.
AI Analysis (Python)

AI Summary

JANGQ provides tools to compress large AI models for fast, high-quality performance on Apple Silicon Macs using the MLX framework.

How It Works

1. 🔍 Discover JANG: you learn about a way to run massive, smart AI models right on your Mac without fancy hardware.

2. 💻 Grab the tools: install the free helper software in seconds so you can start playing with big AIs.

3. 🧠 Pick a brainy model: choose from ready-made, super-smart models that think deeply and chat naturally.

4. Shrink and speed it up: with one easy command, transform the huge model into a fast, memory-friendly version that flies on your Mac (scripted in the sketch after this list).

5. Start chatting:
   - 📱 Use the app: open MLX Studio and talk to your AI like a friend, with lightning replies.
   - 💻 In your code: drop it into your own programs for custom smart helpers.

🎉 AI superpowers unlocked: you now have a blazing-fast, genius-level AI companion running smoothly on your everyday Mac.
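
Steps 2 and 4 condense to a couple of commands. A minimal sketch, assuming the package is published as `jang` with an `mlx` extra (the review below mentions jang[mlx] loaders) and that `jang convert` works exactly as quoted there; none of this is verified against the repo:

```python
import subprocess
import sys

# Step 2: grab the tools. The jang[mlx] extra is an assumption based on
# the review's mention of "jang[mlx] loaders".
subprocess.run([sys.executable, "-m", "pip", "install", "jang[mlx]"], check=True)

# Step 4: shrink and speed up a model with one command. The invocation
# and the JANG_2L preset are quoted from the review below.
subprocess.run(["jang", "convert", "Qwen/Qwen3.5-397B", "-p", "JANG_2L"], check=True)
```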

AI-Generated Review

What is jangq?

Jangq delivers adaptive mixed-precision quantization for LLMs on Apple Silicon, turning any Hugging Face model into the JANG_Q format, pitched as "the GGUF for MLX." A single Python CLI command such as `jang convert Qwen/Qwen3.5-397B -p JANG_2L` produces sub-4-bit quants that load instantly via safetensors and run at full Metal speed through the jang[mlx] loaders. It works around MLX's MoE failure modes (NaNs, crashes, random output) by keeping models quantized in GPU memory.
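
The page doesn't document the loader API, so the import path and function names in this sketch are hypothetical placeholders modeled on the common load/generate pattern of MLX tooling; only the JANG_Q format, the jang[mlx] extra, and the keep-quantized-in-GPU-memory behavior come from the review:

```python
# Hypothetical sketch: `jang.mlx`, `load`, and `generate` are placeholder
# names, not confirmed jangq APIs.
from jang.mlx import load, generate  # assumed import path

# Load a converted JANG_Q checkpoint; per the review, weights stay
# quantized in GPU memory rather than being dequantized on load.
model, tokenizer = load("Qwen3.5-397B-JANG_2L")

print(generate(model, tokenizer, prompt="Summarize mixed-precision quantization."))
```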

Why is it gaining traction?

It beats stock MLX on the quality-size-speed tradeoff: 86% MMLU at 112 GB for a 397B model (which MLX can't run at all), a 5x prefill speedup, and Nemotron-Cascade fitting in 10 GB on 16 GB Macs. The runtime automatically detects bfloat16 weights and reasoning tags, and pre-quantized models already on Hugging Face mean zero conversion hassle.

Who should use this?

Apple Silicon developers quantizing massive MoE models (Qwen3.5-397B, Nemotron-Super-120B, MiniMax) for local evaluation and inference, and MLX app builders who need loaders for mixed-precision models.

Verdict

A strong pick for MoE on Macs: transformative performance where MLX falls over, even if 16 stars signals early days. The Python tooling and Hugging Face integration make it dead simple to adopt; benchmark your own models first.
