reacher-z

Lightweight NVIDIA GPU monitor — alerts on Slack/Discord/Telegram/20 channels when training crashes or GPU overheats. Zero dependencies. Single file.

47
0
100% credibility
Found Mar 08, 2026 at 30 stars -- GitGems finds repos before they trend. Get early access to the next one.
Sign Up Free
AI Analysis
Python
AI Summary

A lightweight monitor for NVIDIA GPUs that detects crashes, overheating, memory leaks, and other issues, sending instant alerts via 20+ messaging channels.

How It Works

1
😩 Training crashes overnight

You wake up to find your long-running project stopped without warning, wasting hours of computer power.

2
Discover gpu-monitor

You learn about this simple tool that watches your graphics cards and alerts you instantly to problems.

3
📦 Grab the tool

Download or install it in seconds with a single easy command—no complicated setup needed.

4
🔗 Connect your messenger

Link it to your favorite chat app like Slack, Discord, or phone notifications so alerts reach you anywhere.

5
🚀 Turn it on

Launch the watcher, and it quietly runs in the background, keeping an eye on temperatures, memory, and activity.

6
📱 Get smart alerts

Your phone buzzes right away if cards overheat, run out of memory, or suddenly stop working.

🎉 Save time and power

You fix issues fast, restart before big losses, and keep your projects running smoothly.

Sign up to see the full architecture

5 more

Sign Up Free

Star Growth

See how this repo grew from 30 to 47 stars Sign Up Free
Repurpose This Repo

Repurpose is a Pro feature

Generate ready-to-use prompts for X threads, LinkedIn posts, blog posts, YouTube scripts, and more -- with full repo context baked in.

Unlock Repurpose
AI-Generated Review

What is gpu-monitor?

gpu-monitor is a single-file Python tool for NVIDIA GPU monitoring on Linux, alerting you via Slack, Discord, Telegram, or 18 other channels when training crashes, GPUs overheat, memory leaks, or ECC errors strike. It polls nvidia-smi to detect idle GPUs, OOM risks, and fan failures, preventing wasted compute on overnight jobs. Pip install gpuwatch, set a webhook, run gpu-monitor—zero dependencies, instant coverage for gpu monitor linux on Ubuntu or Mint.

Why is it gaining traction?

Unlike gpustat or nvitop's interactive views, it runs silently in background with crash detection and 20-channel fanout, plus Prometheus metrics, InfluxDB export, and Kubernetes DaemonSets. CLI shines: --watch for live tables, --web for sparklines dashboard, --test-notify to verify alerts. Free GitHub Pages multi-machine view beats paid dashboards, hooking devs tired of manual checks in gpu monitoring tools.

Who should use this?

ML researchers firing PyTorch runs at 3AM on local clusters, lab admins juggling shared A100s across users, MLOps folks wiring gpu monitor github into prod pipelines. Suits self-hosting LLMs on gpu monitor ubuntu rigs or lightweight discord github alerts for remote teams—no bloat for cpu gpu monitor github needs.

Verdict

Strong pick for unsupervised GPU workloads; deploy via Docker or systemd, integrate Grafana in minutes. 19 stars and 1.0% credibility signal early maturity—docs excel, but probe with --once first before clusters.

(198 words)

Sign up to read the full AI review Sign Up Free

Similar repos coming soon.