Rahul-14507 / MELLM (Public)

Lightweight Modular AI Routing Engine for Local LLMs — run specialised experts efficiently on consumer GPUs using smart Mixture-of-Experts routing.

16 stars · 1 fork · 100% credibility
Found by GitGems on Mar 22, 2026 at 16 stars
AI Summary

Language: Python

MELLM is a local AI system that routes each question to a specialized small model for topics like code, math, medicine, law, and general knowledge, delivering better results on everyday computer hardware.

How It Works

1. 🔍 Discover MELLM

You hear about this clever AI helper that picks the perfect expert for your questions, like coding or math, without needing fancy hardware.

2. 📥 Bring it home

Download the files to your computer and start the friendly setup guide that checks what your machine can handle.

3. 🧙 Choose your team

The guide shows easy picks for smart helpers in areas like code, math, medicine, law, and everyday topics, tailored just for you.

4. 🚀 Start the chat

Open the chat screen and watch the main thinker wake up, ready to sort your questions smartly.

5. 💬 Ask your question

Type something like 'Help me code this' or 'What could be causing my symptoms?' and feel the magic as it hands your question to the right expert.

6. Enjoy quick experts

Get spot-on answers fast, and follow-up questions zip along even quicker since the expert stays ready.

🎉 Expert AI at home

Celebrate having your own team of specialized thinkers working right on your computer, smarter and simpler than ever.
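The routing flow in the steps above can be sketched in a few lines. Note the hedge: MELLM's actual router is a small persistent model, so the keyword lookup below is a hypothetical stand-in used only to keep the sketch self-contained and runnable.

```python
# Minimal sketch of the question-routing flow described above.
# MELLM's real router is a small LLM; this keyword table is a
# hypothetical stand-in so the example runs on its own.

EXPERTS = {
    "code": ["code", "function", "bug", "python"],
    "math": ["math", "equation", "integral", "derivative"],
    "medical": ["health", "symptom", "medicine"],
    "legal": ["law", "contract", "legal"],
}

def route(question: str) -> str:
    """Pick the specialist domain for a question (falls back to 'general')."""
    q = question.lower()
    for domain, keywords in EXPERTS.items():
        if any(k in q for k in keywords):
            return domain
    return "general"

print(route("Help me code this"))                 # → code
print(route("What could cause these symptoms?"))  # → medical
```

A real router model replaces the keyword table, but the shape of the decision — classify the question, then hand it to one domain expert — stays the same.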

AI-Generated Review

What is MELLM?

MELLM is a Python-based routing engine that directs user queries to lightweight modular specialist LLMs—like code, math, medical, legal, or general—using a tiny persistent router model. Instead of loading massive generalist models that eat VRAM, it swaps in one small expert at a time via llama-cpp-python and GGUF files, delivering precise answers on consumer NVIDIA GPUs down to 6GB. Developers get a CLI for interactive sessions with efficiency dashboards and model preloading, plus a FastAPI REST endpoint for /query integration.
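The review mentions a FastAPI REST endpoint at /query. The field names below ("prompt", "domain", "answer") are assumptions for illustration, not MELLM's documented schema; the handler is simulated as a plain function so the sketch stays self-contained.

```python
import json

# Hedged sketch of a /query round trip. The request/response field
# names are assumptions, not MELLM's documented API schema.

def handle_query(body: str) -> str:
    """Simulate a /query handler: parse the JSON body, route, answer."""
    req = json.loads(body)
    prompt = req["prompt"]
    # A real deployment would run the router model and the chosen
    # expert here; we hard-code a placeholder domain and echo back.
    resp = {"domain": "general", "answer": f"(general expert) {prompt}"}
    return json.dumps(resp)

raw = handle_query(json.dumps({"prompt": "Explain GGUF files"}))
print(raw)
```

In practice a client would POST the same JSON body to the running FastAPI server instead of calling the function directly.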

Why is it gaining traction?

It crushes VRAM limits others ignore, running domain-tuned 1.5-7B models where 70B monoliths fail, with hot caching for instant same-domain follow-ups and multi-domain decomposition for complex asks. The setup wizard auto-detects hardware, and auto-downloads from Hugging Face make it plug-and-play, unlike heavier local-LLM alternatives. Streaming responses and session stats hook tinkerers tired of slow local inference.
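The "one expert resident at a time" pattern with hot caching can be sketched as a cache of size one: keep the current expert loaded and only swap when the routed domain changes, so same-domain follow-ups skip the reload. `load_expert` is a stand-in for loading a GGUF model with llama-cpp-python (e.g. `Llama(model_path=...)`); here it returns a tag so the sketch runs without model files.

```python
# Sketch of hot caching with a single resident expert: reloads only
# happen when the routed domain changes between queries.

class ExpertCache:
    def __init__(self):
        self.domain = None
        self.model = None
        self.loads = 0  # count how often we actually (re)load

    def load_expert(self, domain: str):
        self.loads += 1
        return f"model:{domain}"  # placeholder for a GGUF model load

    def get(self, domain: str):
        if domain != self.domain:      # cache miss: swap experts
            self.model = self.load_expert(domain)
            self.domain = domain
        return self.model              # cache hit: reuse the hot model

cache = ExpertCache()
cache.get("code")   # first query loads the code expert
cache.get("code")   # same-domain follow-up reuses it (no reload)
cache.get("math")   # domain change triggers exactly one swap
print(cache.loads)  # → 2
```

This is why follow-ups "zip along even quicker": the cost of unloading one expert and loading another is only paid on a domain switch.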

Who should use this?

Local AI experimenters on laptops with GTX 1060 or RTX 3050 GPUs building chat apps or prototypes. Backend devs needing a lightweight modular backend for domain-specific queries without cloud costs. Hobbyists prototyping mixture-of-experts routing before scaling to production.

Verdict

Grab it if you're on modest hardware chasing efficient local LLMs: docs are thorough, benchmarks transparent, and extensibility shines for custom specialists. At 16 stars with a 1.0 (100%) credibility score, it's raw but functional; test on non-critical projects first.

