jjang-ai / jangq

JANG — GGUF for MLX. YOU MUST USE JANG_Q RUNTIME. Adaptive Mixed-Precision Quantization + Runtime for Apple Silicon

16 stars · 2 forks · 100% credibility
Found Mar 23, 2026 at 16 stars.
AI Analysis (Python)

AI Summary

JANGQ provides tools to compress large AI models for fast, high-quality performance on Apple Silicon Macs using the MLX framework.

How It Works

1. 🔍 Discover JANG: you learn about a way to run massive, smart AI models right on your Mac without fancy hardware.

2. 💻 Grab the tools: install the free helper software in seconds so you can start playing with big AIs.

3. 🧠 Pick a brainy model: choose from ready-made, super-smart models that think deeply and chat naturally.

4. Shrink and speed it up: with one easy command, transform the huge model into a fast, memory-friendly version that flies on your Mac (scripted in the sketch after this list).

5. Start chatting:
   - 📱 Use the app: open MLX Studio and talk to your AI like a friend, with lightning replies.
   - 💻 In your code: drop it into your own programs for custom smart helpers.

🎉 AI superpowers unlocked: you now have a blazing-fast, genius-level AI companion running smoothly on your everyday Mac.
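
Steps 2 and 4 condense to a couple of commands. A minimal sketch, assuming the package is published as `jang` with an `mlx` extra (the review below mentions jang[mlx] loaders) and that `jang convert` works exactly as quoted there; none of this is verified against the repo:

```python
import subprocess
import sys

# Step 2: grab the tools. The jang[mlx] extra is an assumption based on
# the review's mention of "jang[mlx] loaders".
subprocess.run([sys.executable, "-m", "pip", "install", "jang[mlx]"], check=True)

# Step 4: shrink and speed up a model with one command. The invocation
# and the JANG_2L preset are quoted from the review below.
subprocess.run(["jang", "convert", "Qwen/Qwen3.5-397B", "-p", "JANG_2L"], check=True)
```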

AI-Generated Review

What is jangq?

Jangq delivers adaptive mixed-precision quantization for LLMs on Apple Silicon, turning any Hugging Face model into the JANG_Q format, pitched as "the GGUF for MLX." A single Python CLI command such as `jang convert Qwen/Qwen3.5-397B -p JANG_2L` produces sub-4-bit quants that load instantly via safetensors and run at full Metal speed through the jang[mlx] loaders. It works around MLX's MoE failure modes (NaNs, crashes, random output) by keeping models quantized in GPU memory.
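
The page doesn't document the loader API, so the import path and function names in this sketch are hypothetical placeholders modeled on the common load/generate pattern of MLX tooling; only the JANG_Q format, the jang[mlx] extra, and the keep-quantized-in-GPU-memory behavior come from the review:

```python
# Hypothetical sketch: `jang.mlx`, `load`, and `generate` are placeholder
# names, not confirmed jangq APIs.
from jang.mlx import load, generate  # assumed import path

# Load a converted JANG_Q checkpoint; per the review, weights stay
# quantized in GPU memory rather than being dequantized on load.
model, tokenizer = load("Qwen3.5-397B-JANG_2L")

print(generate(model, tokenizer, prompt="Summarize mixed-precision quantization."))
```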

Why is it gaining traction?

It beats stock MLX on the quality-size-speed tradeoff: 86% MMLU at 112 GB for a 397B model (which MLX can't run at all), a 5x prefill speedup, and Nemotron-Cascade fitting in 10 GB on 16 GB Macs. The runtime automatically detects bfloat16 weights and reasoning tags, and pre-quantized models already on Hugging Face mean zero conversion hassle.

Who should use this?

Apple Silicon developers quantizing massive MoE models (Qwen3.5-397B, Nemotron-Super-120B, MiniMax) for local evaluation and inference, and MLX app builders who need loaders for mixed-precision models.

Verdict

A strong pick for MoE on Macs: transformative performance where MLX falls over, even if 16 stars signals early days. The Python tooling and Hugging Face integration make it dead simple to adopt; benchmark your own models first.
