ManoManoTech / bucket-scrapper

Public

High-performance S3 bucket content searcher. Downloads compressed objects, stream-decompresses line-by-line, filters by regex, and outputs to local zstd files or an HTTP API.

100% credibility

Found Mar 09, 2026 at 10 stars.

AI Analysis

Rust

AI Summary

A command-line tool for streaming through compressed log files in cloud storage buckets, filtering lines by patterns over date ranges, and saving matches to files or sending to an HTTP service.

How It Works

🔍 Need to hunt for errors in old logs?

You have giant zipped log files stored online in cloud folders and want to quickly find specific problems like errors without pulling down everything.

📝 Share your cloud folder details

You point it to your online storage areas and describe how your logs are sorted by dates and hours.

📅 Choose your search time window

You pick the start and end times, like scanning logs from last Tuesday morning until evening.

🔤 Type in what to look for

You enter words or patterns to match, like 'ERROR timeout', and decide if case matters.

Pick where to send the matches

💾 Save to your computer

Matching lines get saved into neat zipped files grouped by time, ready to open anytime.

📤 Send straight online

Matches stream right to your online log viewer or service automatically.

🚀 It races through the files

You watch live progress as it zips through huge files, pulling out only what matches, super speedy and smart.

✅ Treasure of filtered logs

You get exactly the key log lines you needed, fast and without the hassle, perfect for digging into issues.


AI-Generated Review

What is bucket-scrapper?

Bucket-scrapper is a high-performance Rust tool for searching compressed content in S3 buckets. Point it at buckets with date-partitioned paths like dt=YYYYMMDD/hour=HH, filter keys and lines by regex, and it streams downloads, decompresses gz/zst files line by line, then writes matches to local zstd files or an HTTP API. Because it never buffers a full object, it handles TB-scale logs without exploding memory.
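To make the date-partitioned layout concrete, here is a minimal sketch of how a time window could be expanded into one key prefix per hour. It uses only the standard library. The `Hour` type and the `hourly_prefixes` function are hypothetical names chosen for illustration; only the `dt=YYYYMMDD/hour=HH` pattern comes from the description above, and the tool's real implementation may differ.

```rust
/// One hour-sized partition (hypothetical type, for illustration only).
/// Field order matters: the derived ordering compares year, then month,
/// then day, then hour, which is chronological order.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Hour {
    year: u32,
    month: u32, // 1..=12
    day: u32,   // 1..=31
    hour: u32,  // 0..=23
}

/// Number of days in a month of the Gregorian calendar.
fn days_in_month(year: u32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        _ => {
            let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
            if leap { 29 } else { 28 }
        }
    }
}

impl Hour {
    /// The following hour, rolling over day, month, and year as needed.
    fn next(self) -> Hour {
        let mut n = self;
        n.hour += 1;
        if n.hour == 24 {
            n.hour = 0;
            n.day += 1;
            if n.day > days_in_month(n.year, n.month) {
                n.day = 1;
                n.month += 1;
                if n.month > 12 {
                    n.month = 1;
                    n.year += 1;
                }
            }
        }
        n
    }

    /// Key prefix in the dt=YYYYMMDD/hour=HH layout.
    fn prefix(self) -> String {
        format!(
            "dt={:04}{:02}{:02}/hour={:02}",
            self.year, self.month, self.day, self.hour
        )
    }
}

/// All hourly prefixes from `start` to `end`, both inclusive.
/// Returns an empty list when `start` is after `end`.
fn hourly_prefixes(start: Hour, end: Hour) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = start;
    while cur <= end {
        out.push(cur.prefix());
        cur = cur.next();
    }
    out
}
```

Calling `hourly_prefixes` with a start of 2026-02-28 22:00 and an end of 2026-03-01 01:00 yields four prefixes that cross the month boundary. Each prefix could then be passed to an S3 list call, which is left out here.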

Why is it gaining traction?

It stands out for 32-way concurrent S3 downloads, ripgrep-speed regex filtering, and AIMD throttling that backs off when an HTTP endpoint returns 429s. Developers pick it up for progress reports showing MB/s throughput and bottlenecks, plus range-resume retries, all faster than running grep over aws s3 cp output or jq hacks.
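AIMD stands for additive increase, multiplicative decrease: the sender raises its rate by a small fixed step while requests succeed and cuts it by a factor when the server pushes back. The sketch below shows the idea for a rate measured in requests per second. The `Aimd` struct, its field names, and the step and factor values (+1.0 on success, x0.5 on a 429) are assumptions made for illustration, not the tool's actual parameters.

```rust
/// Hypothetical AIMD rate controller (illustrative, not the tool's code).
#[derive(Debug)]
struct Aimd {
    rate: f64,     // current allowed requests per second
    min_rate: f64, // floor, so sending never stops entirely
    max_rate: f64, // ceiling
    increase: f64, // additive step applied on success
    decrease: f64, // multiplicative factor applied on HTTP 429
}

impl Aimd {
    /// Assumes min_rate <= max_rate; the initial rate is clamped into range.
    fn new(initial: f64, min_rate: f64, max_rate: f64) -> Self {
        Aimd {
            rate: initial.clamp(min_rate, max_rate),
            min_rate,
            max_rate,
            increase: 1.0, // assumed value
            decrease: 0.5, // assumed value
        }
    }

    /// Update the rate from one response status and return the new rate.
    fn on_response(&mut self, status: u16) -> f64 {
        if status == 429 {
            // Too Many Requests: back off multiplicatively.
            self.rate = (self.rate * self.decrease).max(self.min_rate);
        } else if (200..300).contains(&status) {
            // Success: probe for more capacity additively.
            self.rate = (self.rate + self.increase).min(self.max_rate);
        }
        // Other statuses leave the rate unchanged in this sketch.
        self.rate
    }
}
```

Starting at 10 requests per second with a ceiling of 12, two successes reach the ceiling, and a single 429 then halves the rate to 6. The asymmetry is the point of the design: recovery is gradual, while backing off is immediate.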

Who should use this?

SREs sifting ERROR lines from hourly JSON.zst logs in support buckets. Data engineers bulk-exporting filtered content from archived S3 files before loading it into MySQL pipelines or Spark jobs.
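The core of a use case like sifting ERROR lines is a streaming filter that reads one line at a time and keeps only the matches, so memory use does not grow with object size. The sketch below illustrates that shape with the standard library only. Two simplifications are assumptions of this example: plain substring matching stands in for regex, and the `reader` argument stands in for an already-decompressed stream. `filter_lines` is a hypothetical name.

```rust
use std::io::{self, BufRead, Write};

/// Copy every line containing `needle` from `reader` to `out`.
/// Returns the number of matching lines. Only one line is held in
/// memory at a time. (Hypothetical helper: substring matching is a
/// stand-in for the regex filtering described in the text.)
fn filter_lines<R: BufRead, W: Write>(
    reader: R,
    mut out: W,
    needle: &str,
    ignore_case: bool,
) -> io::Result<usize> {
    // Normalize the needle once, outside the loop.
    let needle_cmp = if ignore_case {
        needle.to_lowercase()
    } else {
        needle.to_string()
    };
    let mut matched = 0;
    for line in reader.lines() {
        let line = line?; // propagate read and UTF-8 errors
        let hit = if ignore_case {
            line.to_lowercase().contains(&needle_cmp)
        } else {
            line.contains(&needle_cmp)
        };
        if hit {
            out.write_all(line.as_bytes())?;
            out.write_all(b"\n")?;
            matched += 1;
        }
    }
    Ok(matched)
}
```

Because the function is generic over `BufRead` and `Write`, the same loop could sit behind a decompressing reader and in front of a compressing writer; those layers need external crates and are not shown here.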

Verdict

Promising at 10 stars and a 100% credibility score. CLI flags like --max-parallel 64 and a YAML config cover real workflows, and the docs detail tuning. Test it on your buckets; it's mature enough for scripts but needs more stars for long-term trust.


Base stars: 12 stars

Bonus: AI verified quality (100%)

Account age: 3,344 days

Repo age: 8 days

License: ISC

Updated: Mar 12, 2026