PaperGuru-AI

Lifecycle-Aware Memory for long-horizon LLM agents — 66.05% on PaperBench, 94.66% on SurveyBench, 10 peer-reviewed acceptances at FSE/ICML/TOSEM/AEI/ICoGB

AI Summary

A benchmark repository demonstrating PaperGuru's state-of-the-art performance on paper reproduction and literature survey tasks, including code submissions and generated reports.

How It Works

1
🔍 Discover PaperGuru

You stumble upon PaperGuru, a lifecycle-aware memory system that helps LLM agents retain research context and reproduce papers better than expert human baselines.

2
📈 See Stunning Results

You're struck by charts showing it succeeding on 20 of 23 tough paper reproductions and generating rich surveys complete with figures and code.

3
🖥️ Regenerate the Figures

Run the included figure scripts to rebuild the charts from the shared aggregate scores and confirm your output matches the published figures.
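To make this step concrete, here is a minimal sketch of the idea in Python. The file name results/aggregate_scores.json and its schema are assumptions for illustration; the repo's own TeX-based figure scripts and actual file layout may differ.

```python
# Minimal sketch: rebuild a PaperBench score chart from shared numbers.
# ASSUMPTION: a results/aggregate_scores.json mapping paper IDs to
# percentage scores; the repo's real file names and schema may differ.
import json

import matplotlib.pyplot as plt

with open("results/aggregate_scores.json") as f:
    scores = json.load(f)["paperbench"]  # {"paper_id": score_percent, ...}

papers = sorted(scores)
values = [scores[p] for p in papers]

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(papers, values)
ax.axhline(66.05, linestyle="--", label="reported mean (66.05%)")
ax.set_ylabel("PaperBench score (%)")
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
ax.legend()
fig.tight_layout()
fig.savefig("paperbench_scores.png")  # compare against the published chart
```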

4

Explore Ready-Made Work

🔬 Paper Reproductions

Open folders with working code trees that turn each benchmark paper into runnable results.

📄 Survey Reports

Browse the finished PDFs, web pages, and editable LaTeX sources of the generated literature surveys. A sketch for taking inventory of these folders follows.
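For orientation, a hypothetical sketch of walking those folders. The directory names reproductions/ and surveys/ and the per-paper reproduce.sh entry point are assumptions about the layout, not the repo's documented structure.

```python
# Hypothetical inventory of the repo's artifacts. Folder names and the
# per-paper entry script are ASSUMPTIONS; adjust to the actual layout.
from pathlib import Path

repo = Path(".")

repro_dir = repo / "reproductions"
if repro_dir.is_dir():
    for submission in sorted(p for p in repro_dir.iterdir() if p.is_dir()):
        entry = submission / "reproduce.sh"  # hypothetical entry script
        status = "runnable" if entry.exists() else "no entry script found"
        print(f"{submission.name}: {status}")

survey_dir = repo / "surveys"
if survey_dir.is_dir():
    for artifact in sorted(survey_dir.rglob("*")):
        if artifact.suffix in {".pdf", ".html", ".tex"}:
            print("survey artifact:", artifact.relative_to(repo))
```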

Trust the Results

Convinced by the matching charts and working reproductions, you can see why PaperGuru is pitched as a game-changer for AI research.


AI-Generated Review

What is PaperGuru-Benchmark?

PaperGuru-Benchmark collects reproducibility artifacts for PaperGuru, a lifecycle-aware memory system boosting long-horizon LLM agents on tasks like paper-to-code reproduction and literature surveys. Developers get runnable code submissions scoring 66.05% on PaperBench (beating human ML-PhD baselines across 23 papers) and generated surveys hitting 94.66% on SurveyBench, plus PDFs, LaTeX sources, aggregate scores, and TeX-based figure scripts. It solves the pain of reinventing memory for agents handling versioned docs, multi-hop citations, and provenance tracking.
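If you want to sanity-check the headline number yourself, here is a small sketch, assuming 66.05% is an unweighted mean over per-paper scores stored in a hypothetical results/aggregate_scores.json; the repo's actual aggregation method and file layout may differ.

```python
# Sanity check: does the mean of the per-paper scores match the reported
# 66.05%? ASSUMES an unweighted mean and a hypothetical JSON layout.
import json
import statistics

with open("results/aggregate_scores.json") as f:
    scores = json.load(f)["paperbench"]  # {"paper_id": score_percent, ...}

mean = statistics.mean(scores.values())
print(f"{len(scores)} papers, mean PaperBench score {mean:.2f}%")
assert len(scores) == 23, "expected one score per benchmark paper"
assert abs(mean - 66.05) < 0.01, "aggregate does not match reported value"
```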

Why is it gaining traction?

It stands out with peer-reviewed acceptances at FSE/ICML/TOSEM/AEI/ICoGB, evidence that PaperGuru's memory primitive works in real research pipelines, not just toy evals. Users report bounded query costs on growing archives and provenance-grounded outputs, with no per-task tweaks. The full benchmark kit lets developers verify the SOTA claims hands-on, which is rare among LLM agent repos.
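Nothing in this repo documents PaperGuru's internals, but to make "bounded query costs with provenance-grounded outputs" concrete, here is a purely illustrative sketch. Every name in it (BoundedMemory, MemoryEntry, the toy scoring rule) is hypothetical and not PaperGuru's API.

```python
# Purely illustrative: a store whose queries return at most k entries
# (bounding downstream prompt size as the archive grows) and whose
# results each carry provenance. These names are NOT PaperGuru's API.
import heapq
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    text: str
    source: str  # provenance: document/version the memory came from


class BoundedMemory:
    def __init__(self, k: int = 5):
        self.k = k
        self.entries: list[MemoryEntry] = []

    def add(self, text: str, source: str) -> None:
        self.entries.append(MemoryEntry(text, source))

    def query(self, question: str) -> list[MemoryEntry]:
        # Toy relevance: count of question words appearing in the entry.
        # A real system would use an index to avoid the linear scan.
        words = question.lower().split()
        scored = [(sum(w in e.text.lower() for w in words), e)
                  for e in self.entries]
        top = heapq.nlargest(self.k, scored, key=lambda t: t[0])
        return [e for score, e in top if score > 0]


mem = BoundedMemory(k=2)
mem.add("Transformers use attention.", source="vaswani2017:v1")
mem.add("Memory systems track provenance.", source="survey2025:v3")
for hit in mem.query("how does attention work"):
    print(hit.text, "<-", hit.source)
```

The point of the sketch is that query() returns at most k entries no matter how large the archive grows, so per-query prompt cost stays bounded and every retrieved fact can be traced back to its source.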

Who should use this?

LLM agent builders tackling long-horizon tasks like software engineering sessions or survey synthesis from citation graphs. Researchers validating memory systems against PaperBench or SurveyBench. ML PhDs reproducing papers via agent-generated code trees.

Verdict

Grab it if you're prototyping agent memory: the reproducibility artifacts make it a solid benchmark reference despite the repo's modest traction (19 stars) and early-stage docs. Maturity lags (the focus is on results over starter code), but the peer-reviewed acceptances make it worth forking for custom evals.

