PaperGuru-AI

Lifecycle-Aware Memory for long-horizon LLM agents — 66.05% on PaperBench, 94.66% on SurveyBench, 10 peer-reviewed acceptances at FSE/ICML/TOSEM/AEI/ICoGB

AI Summary

A benchmark repository demonstrating PaperGuru's state-of-the-art performance on paper reproduction and literature survey tasks, including code submissions and generated reports.

How It Works

1
🔍 Discover PaperGuru

You stumble upon PaperGuru, a lifecycle-aware memory system that helps LLM agents retain research context and reproduce papers better than expert human baselines.

2
📈 See Stunning Results

You're struck by charts showing it succeeding on 20 of 23 tough paper reproductions and generating rich surveys complete with figures and code.

3
🖥️ Regenerate the Figures

Run the included figure scripts to rebuild the charts from the shared aggregate scores and confirm your output matches the published figures.
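To make this step concrete, here is a minimal sketch of the idea in Python. The file name results/aggregate_scores.json and its schema are assumptions for illustration; the repo's own TeX-based figure scripts and actual file layout may differ.

```python
# Minimal sketch: rebuild a PaperBench score chart from shared numbers.
# ASSUMPTION: a results/aggregate_scores.json mapping paper IDs to
# percentage scores; the repo's real file names and schema may differ.
import json

import matplotlib.pyplot as plt

with open("results/aggregate_scores.json") as f:
    scores = json.load(f)["paperbench"]  # {"paper_id": score_percent, ...}

papers = sorted(scores)
values = [scores[p] for p in papers]

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(papers, values)
ax.axhline(66.05, linestyle="--", label="reported mean (66.05%)")
ax.set_ylabel("PaperBench score (%)")
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
ax.legend()
fig.tight_layout()
fig.savefig("paperbench_scores.png")  # compare against the published chart
```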

4

Explore Ready-Made Work

🔬 Paper Reproductions

Open folders with working code trees that turn each benchmark paper into runnable results.

📄 Survey Reports

Browse the finished PDFs, web pages, and editable LaTeX sources of the generated literature surveys. A sketch for taking inventory of these folders follows.
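For orientation, a hypothetical sketch of walking those folders. The directory names reproductions/ and surveys/ and the per-paper reproduce.sh entry point are assumptions about the layout, not the repo's documented structure.

```python
# Hypothetical inventory of the repo's artifacts. Folder names and the
# per-paper entry script are ASSUMPTIONS; adjust to the actual layout.
from pathlib import Path

repo = Path(".")

repro_dir = repo / "reproductions"
if repro_dir.is_dir():
    for submission in sorted(p for p in repro_dir.iterdir() if p.is_dir()):
        entry = submission / "reproduce.sh"  # hypothetical entry script
        status = "runnable" if entry.exists() else "no entry script found"
        print(f"{submission.name}: {status}")

survey_dir = repo / "surveys"
if survey_dir.is_dir():
    for artifact in sorted(survey_dir.rglob("*")):
        if artifact.suffix in {".pdf", ".html", ".tex"}:
            print("survey artifact:", artifact.relative_to(repo))
```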

Trust the Results

Convinced by the matching charts and working reproductions, you can see why PaperGuru is pitched as a game-changer for AI research.


AI-Generated Review

What is PaperGuru-Benchmark?

PaperGuru-Benchmark collects reproducibility artifacts for PaperGuru, a lifecycle-aware memory system boosting long-horizon LLM agents on tasks like paper-to-code reproduction and literature surveys. Developers get runnable code submissions scoring 66.05% on PaperBench (beating human ML-PhD baselines across 23 papers) and generated surveys hitting 94.66% on SurveyBench, plus PDFs, LaTeX sources, aggregate scores, and TeX-based figure scripts. It solves the pain of reinventing memory for agents handling versioned docs, multi-hop citations, and provenance tracking.
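If you want to sanity-check the headline number yourself, here is a small sketch, assuming 66.05% is an unweighted mean over per-paper scores stored in a hypothetical results/aggregate_scores.json; the repo's actual aggregation method and file layout may differ.

```python
# Sanity check: does the mean of the per-paper scores match the reported
# 66.05%? ASSUMES an unweighted mean and a hypothetical JSON layout.
import json
import statistics

with open("results/aggregate_scores.json") as f:
    scores = json.load(f)["paperbench"]  # {"paper_id": score_percent, ...}

mean = statistics.mean(scores.values())
print(f"{len(scores)} papers, mean PaperBench score {mean:.2f}%")
assert len(scores) == 23, "expected one score per benchmark paper"
assert abs(mean - 66.05) < 0.01, "aggregate does not match reported value"
```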

Why is it gaining traction?

It stands out with peer-reviewed acceptances at FSE/ICML/TOSEM/AEI/ICoGB, evidence that PaperGuru's memory primitive works in real research pipelines, not just toy evals. Users report bounded query costs on growing archives and provenance-grounded outputs, with no per-task tweaks. The full benchmark kit lets developers verify the SOTA claims hands-on, which is rare among LLM agent repos.
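Nothing in this repo documents PaperGuru's internals, but to make "bounded query costs with provenance-grounded outputs" concrete, here is a purely illustrative sketch. Every name in it (BoundedMemory, MemoryEntry, the toy scoring rule) is hypothetical and not PaperGuru's API.

```python
# Purely illustrative: a store whose queries return at most k entries
# (bounding downstream prompt size as the archive grows) and whose
# results each carry provenance. These names are NOT PaperGuru's API.
import heapq
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    text: str
    source: str  # provenance: document/version the memory came from


class BoundedMemory:
    def __init__(self, k: int = 5):
        self.k = k
        self.entries: list[MemoryEntry] = []

    def add(self, text: str, source: str) -> None:
        self.entries.append(MemoryEntry(text, source))

    def query(self, question: str) -> list[MemoryEntry]:
        # Toy relevance: count of question words appearing in the entry.
        # A real system would use an index to avoid the linear scan.
        words = question.lower().split()
        scored = [(sum(w in e.text.lower() for w in words), e)
                  for e in self.entries]
        top = heapq.nlargest(self.k, scored, key=lambda t: t[0])
        return [e for score, e in top if score > 0]


mem = BoundedMemory(k=2)
mem.add("Transformers use attention.", source="vaswani2017:v1")
mem.add("Memory systems track provenance.", source="survey2025:v3")
for hit in mem.query("how does attention work"):
    print(hit.text, "<-", hit.source)
```

The point of the sketch is that query() returns at most k entries no matter how large the archive grows, so per-query prompt cost stays bounded and every retrieved fact can be traced back to its source.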

Who should use this?

LLM agent builders tackling long-horizon tasks like software engineering sessions or survey synthesis from citation graphs. Researchers validating memory systems against PaperBench or SurveyBench. ML PhDs reproducing papers via agent-generated code trees.

Verdict

Grab it if you're prototyping agent memory: the reproducibility artifacts make it a solid benchmark reference despite the repo's modest traction (19 stars) and early-stage docs. Maturity lags (the focus is on results over starter code), but the peer-reviewed acceptances make it worth forking for custom evals.

