patilyashvardhan2002-byte / lazy-moe
The GPU-free LLM inference engine. Combines lazy expert loading with TurboQuant KV compression to run models that shouldn't fit on your hardware. Built from scratch, fully local, zero cloud.
LazyMoE runs large language models on low-RAM, GPU-free machines (for example, 8 GB of RAM) by loading only the model parts each query needs and compressing the memory the KV cache uses.
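TurboQuant's internals aren't described here, so as a hedged sketch of the general idea behind KV-cache compression, this shows symmetric per-channel int8 quantization: each cached key/value tensor is stored as int8 codes plus one float scale per channel, cutting cache memory to a quarter of float32. Function names and shapes are illustrative, not LazyMoE's actual API.

```python
import numpy as np

def quantize_kv(kv: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-channel int8 quantization of a KV-cache tensor.

    kv: float32 array of shape (seq_len, head_dim).
    Returns int8 codes plus one float32 scale per channel.
    """
    scale = np.abs(kv).max(axis=0) / 127.0       # one scale per channel
    scale = np.where(scale == 0, 1.0, scale)     # avoid divide-by-zero
    codes = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return codes, scale.astype(np.float32)

def dequantize_kv(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate float32 tensor from codes and scales."""
    return codes.astype(np.float32) * scale

# A 1024-token cache slice for one 64-dim head:
kv = np.random.randn(1024, 64).astype(np.float32)
codes, scale = quantize_kv(kv)
approx = dequantize_kv(codes, scale)
# codes uses 1 byte per entry instead of 4, at a small accuracy cost.
```

Real schemes add refinements (grouping, outlier handling), but the storage trade-off is the same: int8 codes plus a small scale vector in place of full-precision floats.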
How It Works
LazyMoE lets everyday laptops run large mixture-of-experts models without specialized hardware.
Download the program and install its dependencies in a few simple steps.
Choose and download a model file that fits your computer's memory.
Run the starter script to launch the backend and open the web interface.
Your browser shows a control panel with live system stats and model status.
Click the system check to see which large models run well on your exact hardware.
Type a query and watch the engine analyze it, load the required experts, and stream back the answer.
You can now chat with powerful models smoothly on a regular computer.
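The "load the required experts" step above can be sketched as an on-demand cache: expert weights stay on disk until the router selects them, and the least-recently-used expert is evicted once a memory budget is hit. This is a minimal illustration of the lazy-loading idea, not LazyMoE's actual implementation; `load_fn` stands in for a hypothetical disk-read helper.

```python
from collections import OrderedDict
import numpy as np

class LazyExpertCache:
    """Load MoE expert weights on demand; evict the least-recently-used
    expert once the cache holds max_experts entries."""

    def __init__(self, load_fn, max_experts: int):
        self.load_fn = load_fn          # reads one expert's weights from disk
        self.max_experts = max_experts  # memory budget, in experts
        self.cache: OrderedDict[int, np.ndarray] = OrderedDict()

    def get(self, expert_id: int) -> np.ndarray:
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)   # mark as recently used
            return self.cache[expert_id]
        if len(self.cache) >= self.max_experts:
            self.cache.popitem(last=False)      # evict the LRU expert
        weights = self.load_fn(expert_id)       # only now does it touch RAM
        self.cache[expert_id] = weights
        return weights

# Usage: only experts the router actually picks are ever loaded.
loads = []
def fake_load(eid):                             # stands in for disk I/O
    loads.append(eid)
    return np.full((4, 4), float(eid))

cache = LazyExpertCache(fake_load, max_experts=2)
cache.get(0); cache.get(1)
cache.get(0)                                    # cache hit, no disk read
cache.get(2)                                    # evicts expert 1 (LRU)
```

The point of the design: RAM holds a small working set of experts, so a model whose total weights exceed memory can still serve queries, paying disk latency only on a cache miss.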