spark-history-cli (by yaooqinn)

CLI tool for querying the Apache Spark History Server REST API

Found Mar 19, 2026 at 10 stars.
AI Summary

A user-friendly tool for browsing and analyzing the history of past Apache Spark data processing jobs, including applications, jobs, stages, executors, SQL queries, and event logs.

How It Works

1. 🔍 Discover the tool: you learn about a handy way to review your past data processing runs when troubleshooting slow jobs.

2. 📥 Set it up: install the tool on your machine so it's ready to use.

3. 🔗 Connect to history: point the tool at the server that holds your collection of past job records.

4. 📋 See your past runs: a list of your completed jobs appears, and you pick one to explore.

5. Choose your way:
   - 🚀 Quick check: ask for specific details like jobs or logs in one go.
   - 💬 Interactive browse: work with the tool step by step, jumping between jobs, stages, and executors.

6. 📊 Dig into details: uncover what happened in jobs, stages, and executors, and even download full event logs.

🎉 Unlock insights: you spot slowdowns, fix issues, and plan faster data jobs ahead.
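The steps above map onto the Spark History Server's REST API, which is served under `/api/v1`. A minimal Python sketch of steps 3 and 4, roughly what a tool like this does under the hood; the base URL is a hypothetical local server and the payload is a hand-written sample, not fetched from a real History Server:

```python
import json
from urllib.parse import urlencode

# Hypothetical History Server address; the REST API lives under /api/v1.
BASE_URL = "http://localhost:18080/api/v1"

def applications_url(base=BASE_URL, status=None, limit=None):
    """Build the URL for listing applications (step 4: see your past runs)."""
    params = {}
    if status:
        params["status"] = status   # e.g. "completed" or "running"
    if limit:
        params["limit"] = limit
    query = f"?{urlencode(params)}" if params else ""
    return f"{base}/applications{query}"

# Sample data shaped like the /applications response (values invented).
sample = json.loads("""[
  {"id": "app-20240101120000-0001", "name": "etl-job",
   "attempts": [{"completed": true, "duration": 95000}]},
  {"id": "app-20240101130000-0002", "name": "streaming",
   "attempts": [{"completed": false, "duration": 0}]}
]""")

# Keep only applications whose every attempt finished.
completed = [a["id"] for a in sample
             if all(att["completed"] for att in a["attempts"])]

print(applications_url(status="completed"))
print(completed)
```

From here, per-application endpoints such as `/applications/{app-id}/jobs` follow the same URL-building pattern.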

AI-Generated Review

What is spark-history-cli?

This Python CLI tool queries the Apache Spark History Server REST API straight from your terminal, pulling details on applications, jobs, stages, executors, SQL executions, RDDs, and configuration without opening a browser. Fire up REPL mode for interactive digging (listing completed apps, switching contexts with `use `, or downloading event logs as ZIP archives), or run one-shot commands like `spark-history-cli apps --status completed` with JSON output for scripting. It covers all 20 API endpoints and runs on Linux (including Ubuntu), macOS, and Windows.
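The one-shot JSON mode makes the output easy to post-process in a script. A sketch of that downstream step, using a hand-written sample shaped like the History Server's `/applications/{app-id}/executors` response; the field names follow the documented REST API, the numbers are invented:

```python
import json

# Sample payload shaped like /applications/{app-id}/executors
# (hand-written illustration, not fetched from a real server).
executors = json.loads("""[
  {"id": "driver", "totalTasks": 0,   "totalDuration": 0,       "failedTasks": 0},
  {"id": "1",      "totalTasks": 420, "totalDuration": 918000,  "failedTasks": 3},
  {"id": "2",      "totalTasks": 380, "totalDuration": 1204000, "failedTasks": 0}
]""")

def slowest_executor(execs):
    """Return the executor id with the highest cumulative task time (ms)."""
    return max(execs, key=lambda e: e["totalDuration"])["id"]

def failure_rate(execs):
    """Fraction of tasks that failed across all executors."""
    total = sum(e["totalTasks"] for e in execs)
    failed = sum(e["failedTasks"] for e in execs)
    return failed / total if total else 0.0

print(slowest_executor(executors))  # executor "2" accumulated the most task time
print(failure_rate(executors))
```

The same pattern applies to any of the endpoints: dump JSON once, then slice it however the postmortem demands.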

Why is it gaining traction?

Unlike the clunky web UI, it offers a snappy REPL with persistent app context, pretty tables for metrics, and seamless log downloads, making it well suited to quick postmortem analysis. JSON mode feeds data into pipelines or AI tools, and the Copilot CLI skill install (`spark-history-cli install-skill`) lets agents invoke it via prompts. As a lightweight pip-installable CLI, it beats manual API calls or heavier Spark clients.
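The "quick postmortem" use case often reduces to sorting stage metrics. A hedged sketch against a hand-written sample shaped like the `/applications/{app-id}/stages` endpoint; field names follow the REST API documentation, values are invented:

```python
import json

# Sample shaped like /applications/{app-id}/stages (illustrative values).
stages = json.loads("""[
  {"stageId": 0, "name": "load parquet", "status": "COMPLETE", "executorRunTime": 12000},
  {"stageId": 1, "name": "shuffle join", "status": "COMPLETE", "executorRunTime": 845000},
  {"stageId": 2, "name": "write output", "status": "FAILED",   "executorRunTime": 3000}
]""")

def slowest_stages(data, n=2):
    """Stage ids of the n completed stages with the most executor run time."""
    done = [s for s in data if s["status"] == "COMPLETE"]
    return [s["stageId"]
            for s in sorted(done, key=lambda s: -s["executorRunTime"])][:n]

def failed_stages(data):
    """Stage ids that ended in FAILED status."""
    return [s["stageId"] for s in data if s["status"] == "FAILED"]

print(slowest_stages(stages))  # the heavy shuffle join dominates
print(failed_stages(stages))
```

Sorting by `executorRunTime` surfaces the expensive shuffle first, which is typically where a slow job's time went.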

Who should use this?

Data engineers debugging failed Spark jobs on clusters, Spark developers inspecting stages and executors post-run, and DevOps engineers scripting application monitoring (for example, via GitHub Actions). Ideal for command-line users tired of hunting through the web UI, or anyone piping its output into other tools.

Verdict

Grab it if you live in Spark: solid docs and full API coverage make it useful now, despite its early-stage maturity (around 10 stars at the time of writing). Test it against a local History Server before pointing it at production, and pair it with GitHub Copilot for AI-assisted queries.
