GasolSun36 / PyRAG

Public

Retrieval is Cheap
Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation

18 stars · 69% credibility
Found May 17, 2026 at 18 stars
AI Analysis (Python)

AI Summary

PyRAG is a research framework that transforms complex question-answering into a step-by-step code execution process. Instead of having an AI answer questions directly, it breaks complex questions into simpler parts, generates Python code to search Wikipedia and answer each part, executes that code automatically, and combines the results. The system can fix its own errors and retry failed steps. There are two versions: a training-free version that works out of the box, and a reinforcement-learning-trained version that requires more setup but performs better. This is legitimate academic research with an associated arXiv paper, though users should be aware it executes AI-generated code.

How It Works

1
💬 You ask a complex question

You type a multi-part question like 'Who is older, Jed Hoyer or John William Henry II?' that requires reasoning across multiple facts.

2
🔍 Your question gets broken into pieces

The system automatically splits your complex question into simpler sub-questions that can each be answered with a search.

3
💻 Python code writes itself to solve your problem

The system writes executable Python code that will search for information and combine the results to answer your original question.

4
▶️ The code runs step by step

Each line of the generated code executes in order, searching Wikipedia for relevant facts and extracting answers from the retrieved documents.

5
🔧 Mistakes get fixed automatically

If the code hits an error or returns incomplete information, the system rewrites and retries that part automatically.

6
✅ You receive your answer

The system combines all the retrieved facts and reasoning steps to give you a clear, accurate answer to your original question.

AI-Generated Review

What is PyRAG?

PyRAG is a Python framework that transforms complex multi-hop questions into executable Python programs instead of relying on free-form reasoning. When you ask a question like "Who was born earlier, the director of Inception or Jurassic Park?", it breaks this into sub-queries, generates a Python script using two primitives—retrieve() and answer()—and executes it step-by-step in a Python interpreter. The system includes self-repair: if the generated code crashes, the error feeds back to the planner for automatic correction. It also adapts retrieval dynamically, re-executing steps with boosted top-k when answers come back insufficient.

Why is it gaining traction?

The hook is transparency. Unlike black-box RAG pipelines where reasoning is hidden in model weights, PyRAG produces inspectable traces: you see the generated code, each retrieval call, and every intermediate answer. The compiler-grounded self-repair mechanism replaces unreliable LLM self-reflection with deterministic runtime errors. There's also an RL-trained variant (PyRAG-RL) using GRPO fine-tuning through VERL for users who want to specialize the agents further.

Who should use this?

Researchers working on multi-hop QA benchmarks (2WikiMQA, MuSiQue, Bamboogle) who need interpretable reasoning traces. Teams evaluating retrieval-augmented generation for complex queries where vanilla RAG underperforms. ML engineers exploring agentic retrieval patterns who want to inspect every step of the reasoning process.

Verdict

PyRAG is a novel approach with a solid paper backing it, but the 18-star count and 69% credibility score signal early-stage work. The architecture is sound, the README is thorough, and the training-free baseline runs without any RL infrastructure. However, production readiness requires significant setup: running vLLM servers, a dense retrieval endpoint, and the VERL pipeline for RL training. Start with the training-free variant to evaluate the approach, but budget engineering time for the infrastructure dependencies before committing.

