zhongzhx / literature-harvest

Public

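A local skill that automatically harvests and downloads literature for any keyword. It batch-searches research literature through PubMed, Europe PMC, Crossref, and OpenAlex, automatically builds a table of candidate papers, and downloads PDFs or full texts that can be legally accessed.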
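To illustrate the search step, the sketch below builds search URLs for the public REST endpoints of the four services. The base URLs and query parameters are the ones each service documents. The function name `build_search_url` and the per-source parameter choices are assumptions for illustration, not the repo's own code. The sketch stops before sending requests or parsing results.

```python
from urllib.parse import urlencode

# Public search endpoints of the four services named above. The repo's own
# request code may use different endpoints or parameters (assumption).
ENDPOINTS = {
    "pubmed": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    "europepmc": "https://www.ebi.ac.uk/europepmc/webservices/rest/search",
    "crossref": "https://api.crossref.org/works",
    "openalex": "https://api.openalex.org/works",
}


def build_search_url(source: str, query: str, limit: int = 25) -> str:
    """Return a GET URL that searches `source` for `query`, asking for up to `limit` hits."""
    if source == "pubmed":
        # NCBI E-utilities ESearch: JSON list of matching PMIDs.
        params = {"db": "pubmed", "term": query, "retmax": limit, "retmode": "json"}
    elif source == "europepmc":
        params = {"query": query, "format": "json", "pageSize": limit}
    elif source == "crossref":
        params = {"query": query, "rows": limit}
    elif source == "openalex":
        params = {"search": query, "per-page": limit}
    else:
        raise ValueError(f"unknown source: {source}")
    # urlencode escapes spaces as "+" and joins the parameters with "&".
    return f"{ENDPOINTS[source]}?{urlencode(params)}"
```

For example, `build_search_url("openalex", "marine fungi", 10)` returns `https://api.openalex.org/works?search=marine+fungi&per-page=10`.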

100% credibility

Found Apr 26, 2026 at 36 stars.

AI Analysis

Python

AI Summary

A portable toolkit that searches public scholarly databases for papers matching any keywords, downloads accessible full texts, and organizes them without duplicates.

How It Works

📚 Discover the literature gatherer

You find this handy tool that collects research papers on any topic you choose, like a personal librarian for science.

📥 Get the tool ready

Download the folder to your computer and open it; everything you need is already inside.

✏️ Pick your keywords

Open the simple JSON settings file and type in the words that describe your research interest, like 'marine fungi' or whatever excites you.

🚀 Start the harvest

Start the harvest with a single command and let the tool search PubMed, Europe PMC, Crossref, and OpenAlex for freely available full texts.

⏳ Wait and check progress

Give it time to build lists of candidate papers and download what it can; check the logs to follow progress as papers come in.

🔄 Finish and tidy up

If a run is interrupted, run the resume command to fetch the remaining files; the same step removes duplicates so the collection stays tidy.

🎉 Enjoy your papers!

You now have tables of candidates, priority lists, and folders full of PDFs and full texts, ready for your reading.
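The restart step works by resuming instead of starting over. Here is a minimal sketch of that idea. It assumes each candidate is a dict with hypothetical `id` and `pdf_url` keys and that downloads are saved as `<id>.pdf`. The sketch skips any non-empty file that already exists and writes new files under a temporary name before renaming them, so an interrupted run never leaves a truncated file that looks finished. The network call is passed in as `fetch`, so you can try the logic offline. This is not the tool's actual resume command.

```python
from pathlib import Path
from typing import Callable, Iterable


def resume_downloads(
    candidates: Iterable[dict],
    out_dir: Path,
    fetch: Callable[[str], bytes],
) -> list[str]:
    """Download each candidate's PDF unless a non-empty file for it already exists.

    `candidates` use hypothetical "id" and "pdf_url" keys; `fetch` is any
    function returning the bytes at a URL. Returns the ids fetched in this run.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for cand in candidates:
        target = out_dir / f"{cand['id']}.pdf"
        # Skip work already finished by an earlier, interrupted run.
        if target.exists() and target.stat().st_size > 0:
            continue
        data = fetch(cand["pdf_url"])
        # Write to "<id>.pdf.part" first, then rename, so a crash mid-write
        # never leaves a partial file under the final name.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(data)
        tmp.replace(target)
        fetched.append(cand["id"])
    return fetched
```

Running it a second time over the same candidates fetches nothing, which is what makes repeated restarts safe.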


AI-Generated Review

What is literature-harvest?

Literature-harvest is a Python tool that automates searching for and downloading research papers on any keyword across PubMed, Europe PMC, Crossref, and OpenAlex. You edit a simple JSON config with your queries and include/exclude terms, then run two PowerShell CLI commands. The first starts a fresh harvest: it builds candidate CSV tables and downloads open PDFs or full texts. The second resumes downloads, follows HTML pages to locate PDFs, and deduplicates files. The result is organized folders of metadata CSVs, logs, summaries, and a deduplicated set of PDFs, well suited to quick literature harvests without the hassle of writing scrapers.
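The include/exclude filtering could work roughly as follows. The config keys `queries`, `include_terms`, and `exclude_terms`, the record fields, and the CSV columns are all hypothetical. The repo's actual config schema and output tables may differ. In this sketch, a record is kept when its title or abstract mentions at least one include term (if any are given) and no exclude term, ignoring case.

```python
import csv
import io
import json


def load_config(text: str) -> dict:
    """Parse the JSON config; these key names are hypothetical."""
    cfg = json.loads(text)
    cfg.setdefault("include_terms", [])
    cfg.setdefault("exclude_terms", [])
    return cfg


def keep(record: dict, cfg: dict) -> bool:
    """True if title+abstract contain an include term (when any are set) and no exclude term."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    includes = [t.lower() for t in cfg["include_terms"]]
    excludes = [t.lower() for t in cfg["exclude_terms"]]
    if includes and not any(t in text for t in includes):
        return False
    return not any(t in text for t in excludes)


def candidates_csv(records: list[dict], cfg: dict) -> str:
    """Return CSV text for the kept records, with a fixed (hypothetical) column order."""
    buf = io.StringIO()
    # extrasaction="ignore" drops fields such as "abstract" that are not columns.
    writer = csv.DictWriter(buf, fieldnames=["doi", "title", "source"], extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        if keep(rec, cfg):
            writer.writerow(rec)
    return buf.getvalue()
```

Plain substring matching keeps the sketch short. A real filter might need word boundaries so that, for example, an exclude term like "review" does not also reject "reviewed".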

Why is it gaining traction?

It stands out as a portable "skill" ready for GitHub skill directories or agent setups such as GitHub Copilot or Claude. Everything is bundled, so you can clone it and run it without extra dependencies. Bilingual English/Chinese docs cover workflows, common pitfalls such as full texts that are available only as HTML, and resuming large runs. Deduplication matches files by DOI, title, and hash and prefers PDFs. There is no cloud lock-in: it runs locally on Python 3.13 on Windows for reliable, legal open-access downloads.
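Deduplication by DOI, title, and hash with PDFs preferred can be sketched as follows. This is an illustration under stated assumptions, not the repo's implementation. Each file is a dict with hypothetical `doi`, `title`, `kind`, and `content` keys. Titles are normalized by dropping case and punctuation, and content is hashed with SHA-256. Because PDFs are sorted first, a PDF wins whenever it and an HTML full text describe the same paper.

```python
import hashlib
import re


def norm_title(title: str) -> str:
    """Lowercase and strip non-alphanumerics so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", "", title.lower())


def dedupe(files: list[dict]) -> list[dict]:
    """Keep one file per paper (hypothetical keys: doi, title, kind, content).

    Two files count as the same paper if they share a DOI, a normalized
    title, or identical content bytes (SHA-256).
    """
    seen_keys: set[str] = set()
    kept = []
    # sorted() is stable: PDFs come first, otherwise the original order is kept.
    for f in sorted(files, key=lambda f: f["kind"] != "pdf"):
        keys = {"sha:" + hashlib.sha256(f["content"]).hexdigest()}
        if f.get("doi"):
            keys.add("doi:" + f["doi"].lower())
        if f.get("title"):
            keys.add("title:" + norm_title(f["title"]))
        duplicate = bool(keys & seen_keys)
        # Record a duplicate's identifiers too, so later copies that match
        # only one of them are still caught.
        seen_keys |= keys
        if not duplicate:
            kept.append(f)
    return kept
```

DOIs are compared case-insensitively here because DOI names are case-insensitive.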

Who should use this?

Academic researchers doing lit reviews on niche topics like marine fungi or genome mining. Devs building personal knowledge bases or feeding RAG systems with fresh papers. Bioinformaticians needing bulk open PDFs without manual Zotero hunts.

Verdict

Grab it for lightweight literature harvesting: solid docs and an MIT license make it easy to share, but 36 stars and a 1.0% credibility score signal early maturity, so test it on small queries first. Worth starring if you harvest literature regularly.


Base stars: 36 stars

Penalty: New account (9d): -70%

Penalty: Very new repo (2d): -70%

Penalty: New account with popular repo: -90%

Bonus: AI verified quality (100%)

Account age: 9 days

Repo age: 3 days

License: MIT

Updated: Apr 26, 2026