jia-gao/kube-gpu-top

htop for GPU pods on Kubernetes — per-pod GPU utilization, memory, temperature, power, and waste detection

48 stars · 100% credibility
Found Apr 13, 2026 at 48 stars
AI Analysis · Language: Go

AI Summary

A command-line tool that shows real-time GPU utilization, memory usage, temperature, power draw, and pod attribution across Kubernetes cluster nodes, plus idle GPU waste detection with cost estimates.

How It Works

1
🕵️ Discover the tool

You learn about a command-line tool for checking how the GPUs across your Kubernetes cluster are actually being used.

2
🚀 Deploy monitoring agents

You deploy lightweight agents (as a DaemonSet) to every node that has a GPU, so each node can report usage details back.

3
📥 Install the CLI plugin

You install the kubectl gpu-top plugin (for example via Krew), adding a simple command that surfaces all the GPU info.

4
👀 See live GPU stats

You run the command and get a live table showing utilization, memory, temperature, and power for each GPU, along with which pod is using it.

5
💡 Check for waste

You run the waste check to spot idle GPUs and see USD/hour estimates of how much money they burn while sitting unused.

6
🎉 Optimize your setup

Now you can make sure every GPU is doing useful work, cut costs, and keep the cluster running smoothly.

AI-Generated Review

What is kube-gpu-top?

kube-gpu-top brings htop-style monitoring to Kubernetes GPU clusters, showing per-pod utilization, memory, temperature, power, and waste detection across all nodes. Run `kubectl gpu-top` for a live table of NODE, POD, GPU model, UTIL%, MEM used/total, TEMP, and POWER, or `kubectl gpu-top waste` to sample over time and flag idle GPUs with USD/hour estimates. Built in Go, it deploys a lightweight agent DaemonSet on GPU nodes and installs via Krew.

Why is it gaining traction?

Unlike plain nvidia-smi or scattered per-node htop alternatives, this plugin ties GPU metrics directly to pods for true cluster accountability, and its waste detection quantifies the cost of low-utilization GPUs. No per-node configuration is needed: the CLI queries the agents over gRPC, supports filtering by namespace, and prints a familiar htop-style table instantly. For AI/ML ops it fills the gap that `kubectl top` leaves around GPUs.

Who should use this?

Kubernetes admins debugging GPU sharing in multi-tenant clusters, ML engineers hunting idle A100s/H100s during training runs, or SREs optimizing costs on NVIDIA-heavy Ubuntu nodes. Perfect for teams that want htop-style GPU visibility without logging into each node.

Verdict

Solid early pick at 48 stars with a 100% credibility score: the docs shine, the CLI builds cross-platform (linux/darwin, amd64/arm64), and tests pass, but low adoption means you should watch for edge cases such as AMD/Intel GPU support. Install the agent and CLI if GPUs are your bottleneck; it will save hours versus manual checks.


