hang-in / tunaLlama

tunaLlama: using a local LLM with Claude Code and Codex

85% credibility
Found May 17, 2026 at 16 stars.
AI Analysis
Python
AI Summary

tunaLlama is a smart assistant for people who use AI coding tools like Claude Code or Codex CLI. It acts like a helpful middle manager: when you ask for code, it lets your main AI focus on planning and reviewing while delegating the actual code writing to a local or cheaper AI model you control. This saves your subscription credits for the high-value work. The tool also keeps a memory of all your coding tasks, so it can recall past solutions and follow your project's conventions. It works with local AI services like Ollama or LM Studio, supports Korean language, and runs as a plugin that integrates directly into your coding assistant.

How It Works

1. 💡 You discover you're burning through credits too fast

While using an AI coding assistant, you notice your monthly bill is climbing because every coding task consumes tokens, even the simple, repetitive ones.

2. 🛠️ You connect your own AI model to handle the heavy lifting

You install tunaLlama and link it to a local AI model you already have running on your computer, or a cheaper cloud service you prefer.

3. Your AI assistant learns to delegate

Now when you ask for code, your assistant automatically splits the work: it plans the approach itself, then hands off the bulk of the coding to your local AI.

4. Different tasks flow differently

Quick tasks

You ask for a JSON parser, and your assistant immediately delegates to your local AI with the right context

📋 Bigger projects

You write a brief spec document, and your assistant runs a generate-review-fix loop until the code is solid

5. 🔍 Your assistant reviews and verifies everything

The local AI generates code, your main assistant checks it for bugs and correctness, and if something's wrong, it asks the local AI to fix it automatically.
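
The generate-review-fix cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions, not tunaLlama's actual code: `delegate_with_review`, `Review`, `fake_generate`, and `fake_review` are hypothetical names, and the two stub functions stand in for the local model and the main assistant.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    ok: bool
    feedback: str = ""


def delegate_with_review(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    review: Callable[[str, str], Review],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Run generate -> review -> fix until the reviewer accepts.

    Returns the accepted code and the number of rounds used;
    raises RuntimeError if no round is accepted.
    """
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        code = generate(task, feedback)  # local model writes (or fixes) the code
        verdict = review(task, code)     # main assistant checks the result
        if verdict.ok:
            return code, round_no
        feedback = verdict.feedback      # passed into the next generation
    raise RuntimeError(f"no accepted result after {max_rounds} rounds: {feedback}")


def fake_generate(task: str, feedback: Optional[str]) -> str:
    # Hypothetical stand-in for the local model: the first draft has a bug.
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def fake_review(task: str, code: str) -> Review:
    # Hypothetical stand-in for the main assistant: run the code, check one case.
    scope: dict = {}
    exec(code, scope)
    ok = scope["add"](2, 3) == 5
    return Review(ok, "" if ok else "add(2, 3) should be 5")


code, rounds = delegate_with_review("write add(a, b)", fake_generate, fake_review)
```

Capping the rounds keeps a stubborn failure from looping forever; what a real reviewer checks, and how feedback is phrased, is up to the tool.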

6. 🧠 Your project develops a memory

Every delegation call is saved. Your assistant remembers your coding style, conventions, and past decisions, so future work fits naturally with what you've already built.
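
As a rough picture of what "every delegation call is saved" could look like, here is a minimal sketch using Python's built-in `sqlite3` module (the review further down describes the memory layer as SQLite-backed). The table name, columns, and the `open_memory`, `remember`, and `recall` helpers are assumptions made for illustration; the real schema is not shown on this page.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema: one row per delegation call.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS delegations (
               id INTEGER PRIMARY KEY,
               task TEXT NOT NULL,
               output TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def remember(conn: sqlite3.Connection, task: str, output: str) -> int:
    # Store the task description together with what the local model produced.
    cur = conn.execute(
        "INSERT INTO delegations (task, output) VALUES (?, ?)", (task, output)
    )
    conn.commit()
    return cur.lastrowid


def recall(conn: sqlite3.Connection, keyword: str, limit: int = 5) -> list:
    # Naive keyword recall, newest first; LIKE is case-insensitive for ASCII.
    rows = conn.execute(
        "SELECT task, output FROM delegations "
        "WHERE task LIKE ? ORDER BY id DESC LIMIT ?",
        (f"%{keyword}%", limit),
    )
    return rows.fetchall()


conn = open_memory()
remember(conn, "parse JSON config", "def load(path): ...")
remember(conn, "format dates", "def fmt(d): ...")
hits = recall(conn, "json")
```

A plain `LIKE` match is the simplest possible recall; it is used here only to keep the sketch short.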

🎉 You get quality code while keeping costs under control

The heavy lifting happens on your own machine or cheap service, while your main assistant focuses on what it does best: planning, reviewing, and making sure everything works together.

AI-Generated Review

What is tunaLlama?

tunaLlama is a delegation runtime for Claude Code and Codex CLI that hands heavy code generation off to a local or low-cost LLM while keeping the expensive cloud model focused on decomposition, verification, and integration. Built in Python, it works as an MCP plugin that intercepts code tasks and routes them to Ollama, LM Studio, or Ollama Cloud, then feeds the results back to your Claude or Codex session for final review. The system maintains a SQLite-backed memory layer that tracks past delegations, making it easy to recall how similar problems were solved before.
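
To make the routing step concrete, the sketch below builds a request for a locally running Ollama server using only the standard library. The endpoint and payload fields follow Ollama's public `/api/generate` API; the model name is a placeholder and the function names are hypothetical. Nothing here is taken from tunaLlama's source.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
DEFAULT_MODEL = "qwen2.5-coder"  # placeholder; use whichever model you have pulled


def build_generate_request(
    prompt: str, model: str = DEFAULT_MODEL, base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    # stream=False asks Ollama for one JSON object instead of a chunked stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = DEFAULT_MODEL, timeout: float = 120.0) -> str:
    # Sends the request; requires an Ollama server listening on OLLAMA_URL.
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Building the request does not touch the network, so it is safe to inspect.
req = build_generate_request("Write a JSON parser in Python")
```

Calling `generate(...)` would return the model's text, which the plugin could then hand back to the Claude or Codex session for review.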

Why is it gaining traction?

The core hook is token savings without sacrificing quality. Claude Code Pro and Codex CLI users burning through quotas on repetitive code generation now have a way to offload that work to free local models while their premium session handles the architect role. The memory layer is sophisticated for a v0.5 project: Korean morphological tokenization via Kiwi, hybrid BM25 plus vector search with BGE-M3 embeddings, and automatic extraction of conventions and constraints from delegation outputs. The 507 tests and 90% coverage suggest the developer dogfooded it heavily before shipping.
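
For readers unfamiliar with hybrid search, here is a toy sketch of the idea: rank documents once by BM25 (lexical match) and once by vector similarity, then merge the two rankings. The merge uses reciprocal rank fusion, which is an assumption chosen for simplicity; the page does not say how tunaLlama combines the scores. Whitespace-style tokens and hand-written three-dimensional vectors stand in for Kiwi tokenization and BGE-M3 embeddings.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank(scores):
    """Map doc index -> 1-based rank, best score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {i: r for r, i in enumerate(order, start=1)}


def hybrid_search(query_tokens, query_vec, docs, doc_vecs, k=60):
    # Reciprocal rank fusion: each ranking contributes 1 / (k + rank).
    lexical = rank(bm25_scores(query_tokens, docs))
    semantic = rank([cosine(query_vec, v) for v in doc_vecs])
    fused = {i: 1 / (k + lexical[i]) + 1 / (k + semantic[i]) for i in range(len(docs))}
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    ["parse", "json", "config"],
    ["format", "date", "string"],
    ["json", "schema", "validate"],
]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.6, 0.6, 0.1]]  # toy embeddings
results = hybrid_search(["json", "parse"], [1.0, 0.0, 0.0], docs, doc_vecs)
```

`results` lists document indices from best to worst match. Rank fusion avoids having to put BM25 scores and cosine similarities on a common scale, which is why it is a common default for hybrid retrieval.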

Who should use this?

Claude Code Pro or Max subscribers watching their token quotas, Codex CLI users managing API costs, and anyone with a local Ollama or LM Studio setup who wants to keep their premium cloud model for high-value work. Korean-speaking developers get extra value since the memory search handles Korean morphemes natively. If you have neither a local LLM nor a low-cost hosted endpoint such as Ollama Cloud, this is not for you.

Verdict

At 15 stars with an 85% credibility score, this is a niche tool for a specific audience, but the test coverage and measured search quality data suggest the author is serious about correctness. Worth trying if you match the use case, but treat it as usable beta rather than production-stable.
