alicankiraz1/Qwen3.5-TurboQuant-MLX-LM

TurboMLX v0.1 Research Preview public source tree for Qwen3.5-focused MLX TurboQuant experiments.

49 stars · 4 forks · 89% credibility · Python
AI Summary

A research-preview tool that quantizes the key-value (KV) cache of AI language models so they run more efficiently on Apple Silicon Macs, focused on specific models like Qwen3 and Qwen3.5.

How It Works

1. 🔍 Hear about a handy tool

You stumble upon this project while looking for ways to make AI chats run smoother and use less memory on your Mac.

2. 📥 Grab the ready-to-use bundle

Clone the repository and install the Python package on your Mac; it patches your existing MLX-LM setup rather than replacing it.

3. 🤖 Pick your AI buddy

Choose a supported language model like Qwen3 or Qwen3.5 that you already have or can easily download, and connect it to the tool.
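
For the code-inclined, here is a minimal sketch assuming the standard mlx-lm loading path (which this project patches, per the review below); mlx_lm.load is a real mlx-lm API, but the model id is only an illustrative placeholder:

```python
# Load an MLX-converted Qwen checkpoint via the stock mlx-lm loader.
# The model id below is an illustrative placeholder, not one this repo pins.
from mlx_lm import load

model, tokenizer = load("mlx-community/Qwen3-4B-4bit")
```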

4. 💬 Start a conversation

Type in a question or message, and let the tool generate a helpful reply just like normal.
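
Generation itself can be sketched with the stock mlx-lm API; TurboMLX's own generate_with_backend (named in the review below) presumably wraps the same flow with a quantized KV cache:

```python
# Generate a reply with the stock mlx-lm API; a baseline sketch only,
# without any of TurboMLX's KV-cache quantization applied.
from mlx_lm import generate

reply = generate(model, tokenizer, prompt="Why is the sky blue?", max_tokens=64)
print(reply)
```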

5. ⚡ Feel the memory savings

Watch your Mac use far less memory during long chats as the key cache shrinks; bigger throughput gains are expected after v1.0.
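
To see where the savings come from, here is a back-of-envelope KV-cache calculation. The shapes and bit-widths are illustrative, not the repo's measured numbers:

```python
# Rough KV-cache footprint for one long chat; all shapes are illustrative.
layers, kv_heads, head_dim, seq_len = 36, 8, 128, 32_768

def kv_gib(key_bits: float, value_bits: float) -> float:
    # Keys and values are each [layers, kv_heads, seq_len, head_dim].
    per_tensor = layers * kv_heads * seq_len * head_dim
    return per_tensor * (key_bits + value_bits) / 8 / 2**30

print(f"fp16 keys and values:     {kv_gib(16, 16):.1f} GiB")
print(f"4-bit keys, dense values: {kv_gib(4, 16):.1f} GiB")
```

With values kept dense in fp16 (the default noted in the review), quantizing only the keys to 4-bit already trims the total cache by over a third in this toy setup.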

🎉 Chat smarter, longer

Now you can enjoy long AI conversations that stay quick and light on your machine.

AI-Generated Review

What is Qwen3.5-TurboQuant-MLX-LM?

This Python package delivers TurboMLX v0.1, a research-preview public source tree for Qwen3.5-focused TurboQuant experiments on MLX. It lets you quantize the KV cache of full-attention Qwen3/Qwen3.5 models during inference, slashing memory use via the paper-faithful mse and prod modes. Users get a CLI (turbomlx) for generation, benchmarking, and evals such as needle-in-a-haystack and perplexity, plus API calls like generate_with_backend and prompt-cache save/load.
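
A hedged sketch of that API surface follows. Only the names generate_with_backend, the mse/prod modes, and prompt-cache save/load come from the text above; the import path, keyword arguments, and return value are assumptions, not documented signatures:

```python
# Assumed usage; only the function and mode names come from the review.
from turbomlx import generate_with_backend  # hypothetical import path

reply = generate_with_backend(
    model, tokenizer,
    prompt="Summarize TurboQuant in one sentence.",
    backend="mse",    # assumed selector for the paper-faithful mse / prod modes
    max_tokens=64,
)
# Prompt caches reportedly stay compatible with mlx-lm's save/load helpers.
```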

Why is it gaining traction?

It patches MLX-LM seamlessly to add TurboQuant backends, showing 30-60% key-path memory cuts versus oracle previews in benchmarks, with values kept dense by default. Native MLX scorers are landing for 4-bit mse on Qwen3.5, alongside mixed-precision profiles and honest scorer-route reporting. Devs dig the verifiable quality gates, JSONL evals, and continuity with existing MLX prompt caches.

Who should use this?

MLX-LM users running long-context Qwen3.5 inference on Apple Silicon, especially those hitting KV memory walls. Researchers prototyping KV quantization experiments or benchmarking against mlx_quant. Apple devs optimizing 9B+ Qwen models for edge deployment.

Verdict

Grab this v0.1 research-preview tree if TurboQuant on Qwen3.5 + MLX fits your stack: it's correct-first, with 72/73 passing tests and solid docs. 49 stars and an 89% credibility score signal early promise; expect throughput gains post-v1.0.


