THU-KEG / LongTraceRL
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
LongTraceRL is a research framework from Tsinghua University that improves how AI models reason about long documents. It uses reinforcement learning: the model learns by solving complex multi-step questions. The key innovation is a 'rubric reward' system that scores the model not only on whether it got the right answer but also on whether it found the right evidence along the way. This makes it harder for the model to 'game' the test and encourages genuine reasoning. The project provides trained models (4B to 30B parameters) that perform well on long-context benchmarks, along with the training code and data so others can reproduce or extend the research.
How It Works
You're a researcher working on AI that reads long documents. You find this project that teaches AI to reason better across 160,000 tokens of context.
You pull a ready-made container that has everything installed. Your GPUs are detected and everything talks to each other across multiple machines.
You point to the base model (like Qwen or DeepSeek) that you want to improve. The project already knows how to work with these popular model families.
Your AI learns by solving tricky multi-hop questions. It gets scored not just on final answers but on whether it found the right clues along the way, like a teacher grading the thinking process.
You can run training on 8 GPUs in one machine, or coordinate it across 4 or more machines with 8 GPUs each.
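The page does not spell out how the rubric reward is computed, so the following is only a minimal, hypothetical Python sketch of the idea described in the scoring step: blend final-answer correctness with how much of the required evidence the model actually surfaced. It assumes each question comes with a gold answer and a set of gold evidence IDs; the names `Rubric`, `rubric_reward`, and the 50/50 `answer_weight` are illustrative assumptions, not LongTraceRL's actual API or reward formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rubric:
    """Hypothetical rubric: the gold answer plus the evidence a good trace should find."""
    gold_answer: str
    gold_evidence_ids: frozenset[str]


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences don't matter.
    return " ".join(text.lower().split())


def rubric_reward(
    predicted_answer: str,
    cited_evidence_ids: set[str],
    rubric: Rubric,
    answer_weight: float = 0.5,  # assumed weighting, chosen only for illustration
) -> float:
    """Blend exact-match answer correctness with recall of the gold evidence."""
    if not 0.0 <= answer_weight <= 1.0:
        raise ValueError("answer_weight must be in [0, 1]")

    answer_score = 1.0 if normalize(predicted_answer) == normalize(rubric.gold_answer) else 0.0

    # If the rubric lists no evidence, fall back to answer correctness alone.
    if not rubric.gold_evidence_ids:
        return answer_score

    # Fraction of required evidence items the trajectory actually cited.
    hits = len(rubric.gold_evidence_ids & cited_evidence_ids)
    evidence_score = hits / len(rubric.gold_evidence_ids)

    return answer_weight * answer_score + (1.0 - answer_weight) * evidence_score


if __name__ == "__main__":
    rubric = Rubric("Paris", frozenset({"doc3", "doc7"}))
    print(rubric_reward("paris", {"doc3", "doc7"}, rubric))  # right answer, all evidence
    print(rubric_reward("Paris", set(), rubric))             # right answer, no evidence
    print(rubric_reward("London", {"doc3"}, rubric))         # wrong answer, partial evidence
```

Under this sketch, a correct answer with no supporting evidence earns only half the reward (0.5), while a fully grounded correct answer earns 1.0. That gap is what discourages guessing the answer without doing the search, which is the "gaming" the rubric idea is meant to prevent.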
You see real-time progress as your AI gets better at finding information in long documents. Saved checkpoints let you go back to any earlier version.
After training, you have an AI that is much better at answering questions about long documents, like finding a specific fact in a 1000-page book.