zimingttkx / QuantumFlow
QuantumFlow - Distributed LLM inference scheduling framework with multi-backend support (vLLM, TGI, SGLang), adaptive scheduling strategies, and cluster management.
QuantumFlow is a distributed AI inference platform that lets you run large language models across multiple GPU machines. It provides a central server to manage requests, worker nodes that actually run the models, and tools to load models, monitor performance, and interact with them through chat or text generation. The system automatically distributes work across the available workers, making responses faster and the cluster easier to scale.
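The repository doesn't spell out its scheduling strategies here, but the "automatically distributes work" idea can be illustrated with a minimal least-loaded dispatcher. This is a hypothetical sketch for intuition only; the `Worker` and `LeastLoadedScheduler` names are made up and are not QuantumFlow's API:

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Illustrative stand-in for a GPU worker node."""
    name: str
    active_requests: int = 0


class LeastLoadedScheduler:
    """Toy adaptive strategy: send each request to the worker
    currently handling the fewest in-flight requests."""

    def __init__(self, workers):
        self.workers = list(workers)

    def dispatch(self) -> str:
        # Pick the least-loaded worker (ties break by registration order).
        worker = min(self.workers, key=lambda w: w.active_requests)
        worker.active_requests += 1
        return worker.name

    def complete(self, name: str) -> None:
        # Called when a worker finishes a request, freeing capacity.
        for w in self.workers:
            if w.name == name:
                w.active_requests -= 1
                return
```

A real scheduler would track richer signals (queue depth, KV-cache pressure, model placement), but the routing decision follows the same shape.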
How It Works
You hear about a platform that lets you run AI language models across multiple computers, making inference faster and easier to scale.
With one simple command, you launch the brain of the system that will manage all your AI requests and coordinate your workers.
You connect additional machines with graphics cards to your network, and they automatically join the team to share the AI workload.
You choose a language model and watch as it automatically downloads and prepares itself on your connected computers.
You open the interactive terminal, pick your model, and start having conversations or generating text.
The AI responds with intelligent text, and your distributed system has handled everything smoothly behind the scenes.