sd031

An agentic tool — with both a web UI and a CLI — that uses AI to diagnose and fix Kubernetes issues on any cluster (local Kind, remote EKS/GKE/AKS, bare-metal). Supports local Llama models via Ollama, AWS Bedrock Claude, and OpenAI GPT. Every troubleshooting session auto-generates a structured Markdown runbook.

10
7
69% credibility
Found May 18, 2026 at 11 stars.
AI Analysis
Python
AI Summary

This is an AI-powered assistant that helps you diagnose and fix problems in your Kubernetes cluster—the system that runs your applications. Instead of needing to be an expert and typing complex commands, you simply describe what's wrong in plain language. The assistant then investigates your cluster, checks logs and system events, identifies the root cause, and can apply fixes with your approval. Everything is automatically documented in a runbook so you have a record of what happened and how to prevent it in the future. You can use it through a friendly web interface or command line, and connect it to various AI services.

How It Works

1
🔧 You set up the assistant

You install the tool and connect it to your AI service of choice, whether that's a local AI running on your computer or a cloud-based AI.

2
📊 You connect to your cluster

The assistant connects to your Kubernetes environment where your applications run, ready to investigate any issues.

3
💬 You describe your problem

Instead of running complex commands, you simply tell the assistant what's wrong in plain English—like 'my website is down' or 'pods keep crashing.'

4
The AI investigates your cluster
🗣 Chat mode

You have a conversation with the AI, asking follow-up questions and guiding the investigation.

🚀 Scan mode

The AI performs a complete health check of your entire cluster automatically.

5
🎯 The AI finds the root cause

The assistant identifies exactly what's causing your issue—whether it's a crashed container, resource shortage, or configuration problem.

6
Fixes are applied with your approval

The AI can restart services, roll back changes, or adjust settings—but always asks for your confirmation before making any changes.

📝 A runbook is created for the future

Everything is documented in a clear report showing what was wrong, what was fixed, and how to prevent it from happening again.
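The approval step described above can be sketched as a small gate in front of any action that changes the cluster. This is a minimal illustration, not the project's actual code: the `Action` type, the `run_action` function, and the `auto_approve` flag are hypothetical names, and the prompt and executor are injected so the example runs without a terminal or a real cluster.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """A proposed step (hypothetical type, for illustration only)."""
    description: str
    command: List[str]   # argv the tool would run, e.g. a kubectl call
    mutating: bool       # True if the command changes cluster state


def run_action(
    action: Action,
    execute: Callable[[List[str]], str],
    ask: Callable[[str], str] = input,
    auto_approve: bool = False,
) -> str:
    """Run read-only actions directly; gate mutating ones behind a yes/no prompt."""
    if action.mutating and not auto_approve:
        answer = ask(f"Apply fix: {action.description}? [y/N] ")
        if answer.strip().lower() not in ("y", "yes"):
            return "skipped"
    return execute(action.command)


# Fake executor that only records commands, so nothing touches a cluster.
executed: List[List[str]] = []


def fake_execute(command: List[str]) -> str:
    executed.append(command)
    return "ok"


restart = Action(
    description="restart deployment web",
    command=["kubectl", "rollout", "restart", "deployment/web"],
    mutating=True,
)
declined = run_action(restart, fake_execute, ask=lambda _prompt: "n")
approved = run_action(restart, fake_execute, ask=lambda _prompt: "y")
```

Defaulting to "no" on anything other than an explicit yes keeps an accidental Enter keypress from applying a change.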

AI-Generated Review

What is AI-Powered-Kubernetes-Troubleshooting-Assistant?

This is a Python tool that puts an AI agent between you and your broken Kubernetes cluster. You describe a problem in plain English—either through a web interface or a CLI—and it autonomously runs kubectl commands, inspects pods, checks events, and applies fixes. Every troubleshooting session spits out a structured Markdown runbook so you have a paper trail for next time. It works with local Kind clusters, managed cloud offerings like EKS and GKE, and bare-metal. You can plug in local models via Ollama, Anthropic Claude through AWS Bedrock, or OpenAI GPT models.
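To make the "inspects pods" part concrete, here is a minimal sketch of how a tool could pull failure signals out of the JSON returned by `kubectl get pods -o json`. The function name `find_unhealthy_containers` and the sample data are illustrative assumptions, not the project's code. The field names (`containerStatuses`, `state.waiting.reason`, `lastState.terminated.reason`, `restartCount`) follow the Kubernetes Pod status schema.

```python
import json
from typing import Dict, List


def find_unhealthy_containers(pod_list: Dict) -> List[Dict]:
    """Report containers that are waiting or that have a previous termination."""
    findings = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        for cs in pod.get("status", {}).get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            last_term = cs.get("lastState", {}).get("terminated")
            if waiting is None and last_term is None:
                continue  # running with no recorded prior termination
            findings.append({
                "pod": pod_name,
                "container": cs.get("name"),
                "waiting_reason": waiting.get("reason") if waiting else None,
                "last_exit_reason": last_term.get("reason") if last_term else None,
                "restarts": cs.get("restartCount", 0),
            })
    return findings


# Hand-written sample shaped like `kubectl get pods -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "web-1"},
   "status": {"containerStatuses": [
     {"name": "app", "restartCount": 7,
      "state": {"waiting": {"reason": "CrashLoopBackOff"}},
      "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}]}},
  {"metadata": {"name": "db-0"},
   "status": {"containerStatuses": [
     {"name": "postgres", "restartCount": 0,
      "state": {"running": {"startedAt": "2026-05-18T00:00:00Z"}},
      "lastState": {}}]}}
]}
""")
findings = find_unhealthy_containers(sample)
```

In this sample only `web-1` is reported: it is waiting in `CrashLoopBackOff` and its last termination reason was `OOMKilled`, which is the kind of evidence an agent could pass to the model as context.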

Why is it gaining traction?

The killer feature is the runbook auto-generation. Most AI debugging tools give you a chat transcript; this one captures root cause, fix applied, verification steps, and prevention advice in a reusable document. The multi-provider flexibility means you can run it entirely offline with Ollama or go cloud-native with Bedrock. The confirmation-before-destructive-action pattern keeps it safe for production clusters while still allowing an auto-fix mode for CI pipelines.
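As a rough picture of what such a runbook could look like in code, the sketch below renders the four parts named above (root cause, fix applied, verification steps, prevention advice) into Markdown. The `Runbook` class, its field names, and the section headings are assumptions made for illustration; the project's real output format may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Runbook:
    """Hypothetical container for one troubleshooting session's outcome."""
    title: str
    root_cause: str
    fix_applied: str
    verification_steps: List[str] = field(default_factory=list)
    prevention: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the runbook as a Markdown document with one section per part."""
        lines = [
            f"# {self.title}", "",
            "## Root cause", self.root_cause, "",
            "## Fix applied", self.fix_applied, "",
            "## Verification",
        ]
        # Verification steps are ordered, so number them.
        lines += [f"{i}. {step}" for i, step in enumerate(self.verification_steps, 1)]
        lines += ["", "## Prevention"]
        lines += [f"- {item}" for item in self.prevention]
        return "\n".join(lines) + "\n"


runbook = Runbook(
    title="web pods in CrashLoopBackOff",
    root_cause="Container exceeded its memory limit and was OOMKilled.",
    fix_applied="Raised the memory limit and restarted the deployment.",
    verification_steps=["Check that all pods are Running", "Confirm restart count stays flat"],
    prevention=["Set memory requests and limits from observed usage"],
)
markdown = runbook.to_markdown()
```

Keeping the session result in a structured object and rendering it at the end means the same data could also be exported in other formats later.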

Who should use this?

Platform engineers drowning in "pod CrashLoopBackOff" tickets will get the most value, along with SREs who want to document their incident response without extra effort and small teams without dedicated Kubernetes expertise who need a guided hand through common failure modes. It is not ideal for teams already invested in Datadog or New Relic's built-in troubleshooting workflows.

Verdict

With only 10 stars and a credibility score of 69%, this is an early-stage project that shows promise but lacks community validation. The code is well-structured and the feature set is genuinely useful, but treat it as a starting point rather than production-ready tooling. Worth watching, but audit thoroughly before pointing it at production clusters.
