noonghunna

Community recipes for serving LLMs on RTX 3090. Multi-engine (vLLM, llama.cpp, SGLang) and model-agnostic. Currently shipping Qwen3.6-27B configs for 1× and 2× cards.

Found Apr 28, 2026 at 15 stars.
AI Summary

A collection of validated configurations and recipes for running large language models, with vision and tool-calling support, locally on one or two RTX 3090 GPUs behind OpenAI-compatible servers.

How It Works

1
🔍 Discover Home AI Power

You hear about a way to run powerful AI chatbots right on your gaming computer with RTX 3090 graphics cards, without needing the internet.

2
📥 Grab the Setup Guide

Download the simple recipe folder that has everything you need to get started at home.

3
🧠 Download the AI Brain

Fetch the smart AI model files (about 20GB) that your computer will use to think and chat.
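As a sketch, fetching the weights might look like the following. The repository and file names are hypothetical placeholders, not the project's actual ones; substitute whatever the recipe specifies.

```shell
# Download a quantized model (roughly 20GB) into ./models with the
# Hugging Face CLI. Repo id and filename below are placeholders.
mkdir -p models
huggingface-cli download some-org/some-model-GGUF \
  some-model-q4_k_m.gguf --local-dir ./models
```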

4
Pick Your Setup
🖥️
Single Card

Perfect for one RTX 3090 – quick setup for personal chatting and tools.

💻
Dual Cards

Use two RTX 3090s for faster responses and handling bigger conversations.

5
🚀 Launch Your AI Server

Start everything with one easy command – watch it boot up in a couple of minutes and get ready to chat.
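Based on the quickstart description, the launch step might look like this. The script and service names are assumptions drawn from the page's own summary, not verified against the repo.

```shell
# From the recipe folder: run first-time setup, then start the
# serving stack in the background with Docker Compose.
./setup.sh
docker compose up -d

# Follow the engine logs until the server reports it is ready.
docker compose logs -f
```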

6
💬 Test Your AI

Send a message like 'What's the capital of France?' and see it reply instantly, just like online services.
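A minimal smoke test against the OpenAI-compatible endpoint could look like this; port 8020 comes from the page's quickstart description, and the model name is a placeholder.

```shell
# Send one chat completion request to the local server.
curl -s http://localhost:8020/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.6-27b",
        "messages": [
          {"role": "user", "content": "What is the capital of France?"}
        ]
      }'
```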

🎉 Enjoy Private Super AI

Now you have a full-featured AI assistant at home – chat, analyze images, use tools, all fast and private on your own hardware.

AI-Generated Review

What is club-3090?

Club-3090 delivers community recipes for running modern LLMs such as Qwen3.6-27B on single or dual RTX 3090s, using shell scripts and Docker Compose setups with engines like vLLM, llama.cpp, and SGLang. It spins up an OpenAI-compatible API at localhost:8020 that supports chat, vision, tool calls, streaming, and up to 262K of context, which suits homelabs and dev backends alike. Download a model, run setup.sh, bring the stack up with docker compose, and test with curl, all in minutes.

Why is it gaining traction?

These configs are battle-tested on 24 GB consumer GPUs, with patches that dodge common OOMs and engine bugs, benchmarks (51-89 tokens/s), and verify scripts, going well beyond the generic engine docs. Multi-card setups work over PCIe alone, with no NVLink required, and the model-agnostic design extends to new quantizations. Developers like the drop-in API for swapping cloud LLMs out for local ones, and the community scripts feel like a 3090 owners' club trading reliable recipes.
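As an illustration of the dual-card case: vLLM's standard mechanism for splitting a model across two GPUs is tensor parallelism, which runs over plain PCIe. The model path below is a placeholder, and this is a sketch rather than the repo's actual launch command.

```shell
# Hypothetical sketch: serve one model across two 3090s with vLLM.
# --tensor-parallel-size 2 shards the weights over both GPUs;
# PCIe is sufficient, no NVLink bridge needed.
docker run --gpus all -p 8020:8000 \
  vllm/vllm-openai:latest \
  --model /models/some-model \
  --tensor-parallel-size 2 \
  --max-model-len 32768
```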

Who should use this?

Homelab operators with one or two 3090s building local AI agents, RAG pipelines, or code assistants. Devs prototyping LLM backends without AWS bills, especially those hitting context or throughput walls on vLLM/llama.cpp. RTX 3090 holders in the community recipes scene tired of tweaking configs manually.

Verdict

Grab it if you have 3090s: the docs, benchmarks, and scripts make it production-ready, even though 15 stars and a 1.0% credibility score signal early days. Low maturity means watching for upstream breaks, but the verification suite catches them fast. Solid for local serving today.

