akarinmoe / SRaR

Step-wise Rubric Rewards for LLM Reasoning

Found May 19, 2026 at 14 stars.
AI Summary

SRaR is an academic research project implementing a reinforcement learning framework for improving LLM reasoning through step-level rubric-based reward signals, with published arXiv paper and HuggingFace integration.

AI-Generated Review

What is SRaR?

SRaR (Step-wise Rubric Rewards for LLM Reasoning) is a Python framework for training language models to reason better. Standard reinforcement learning for LLMs typically rewards only the final answer, leaving intermediate reasoning steps unguided. SRaR addresses this by having an LLM judge evaluate each reasoning step against a rubric, then feeding those step-level rewards back into training. It runs on top of verl, ByteDance's RL framework, and supports distributed training via Ray across multiple GPUs or NPUs.
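To make the contrast concrete, here is a minimal sketch of outcome-only rewards versus step-level rubric rewards. It is not taken from the SRaR codebase: the function names are hypothetical, and `toy_judge` is a keyword-matching stand-in for the LLM judge, assumed here only so the example runs without a model.

```python
from typing import Callable, List


def outcome_only_rewards(steps: List[str], final_correct: bool) -> List[float]:
    """Outcome-only RL: every step inherits the single final-answer reward."""
    reward = 1.0 if final_correct else 0.0
    return [reward] * len(steps)


def step_rubric_rewards(steps: List[str], judge: Callable[[str], float]) -> List[float]:
    """Step-level rewards: each step is scored on its own by a judge."""
    return [judge(step) for step in steps]


def toy_judge(step: str) -> float:
    # Hypothetical stand-in for an LLM judge with a rubric:
    # a step tagged "[wrong]" fails, everything else passes.
    return 0.0 if "[wrong]" in step else 1.0


steps = ["set up equation", "divide by zero [wrong]", "state the answer"]
print(outcome_only_rewards(steps, final_correct=True))
print(step_rubric_rewards(steps, toy_judge))
```

With an outcome-only signal the flawed middle step is rewarded as long as the final answer happens to be right; the step-level signal singles it out.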

Why is it gaining traction?

The paper behind SRaR identifies a concrete problem: existing rubric-based methods aggregate rubric scores into a single trajectory-level reward, and it reports that this causes 18% of wrong steps to be positively rewarded and 49% of correct steps to be penalized. SRaR's three-part solution (step-attributed judging, per-step normalization, and decoupled advantage estimation) targets this misalignment directly. The approach is backed by an arXiv paper, and the repo includes both SRaR and the simpler RaR baseline for comparison. Docker images are provided for quick setup across different CUDA and framework versions.
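The misalignment described above can be reproduced on toy data. The sketch below is an illustration under stated assumptions, not the paper's formulation: step scores are binary, the trajectory-level baseline averages each rollout's scores and normalizes across the group, and the per-step variant simply normalizes every step score against all step scores pooled over the group. All function names are hypothetical, and decoupled advantage estimation is not modeled.

```python
from statistics import fmean, pstdev
from typing import List, Tuple

EPS = 1e-8  # guards against division by zero when all scores are equal


def trajectory_level_advantages(group: List[List[float]]) -> List[List[float]]:
    """Aggregate each rollout to one reward, normalize across the group,
    then broadcast that single advantage to every step of the rollout."""
    totals = [fmean(scores) for scores in group]
    mu, sigma = fmean(totals), pstdev(totals)
    return [[(t - mu) / (sigma + EPS)] * len(scores)
            for t, scores in zip(totals, group)]


def per_step_advantages(group: List[List[float]]) -> List[List[float]]:
    """Assumed per-step variant: normalize each step score against the
    pooled step scores of the whole group."""
    pooled = [s for scores in group for s in scores]
    mu, sigma = fmean(pooled), pstdev(pooled)
    return [[(s - mu) / (sigma + EPS) for s in scores] for scores in group]


def count_misaligned(group, advantages) -> Tuple[int, int]:
    """Return (wrong steps rewarded, correct steps penalized)."""
    wrong_rewarded = correct_penalized = 0
    for scores, advs in zip(group, advantages):
        for s, a in zip(scores, advs):
            wrong_rewarded += (s == 0 and a > 0)
            correct_penalized += (s == 1 and a < 0)
    return wrong_rewarded, correct_penalized


# Three rollouts for one prompt; 1 = step judged correct, 0 = judged wrong.
group = [[1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
print(count_misaligned(group, trajectory_level_advantages(group)))
print(count_misaligned(group, per_step_advantages(group)))
```

In this toy group the trajectory-level scheme rewards the one wrong step in the mostly-correct first rollout and penalizes the one correct step in the mostly-wrong second rollout, while the per-step scheme gives every step an advantage whose sign matches its own score.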

Who should use this?

Researchers working on LLM reasoning and reinforcement learning fine-tuning will find this most useful. If you're training models on math, coding, or multi-step reasoning tasks and want finer control over what the model learns (beyond just "right" or "wrong"), this provides the tooling. Teams already using verl for RL training can adopt SRaR as an additional recipe with minimal changes.

Verdict

SRaR tackles a legitimate gap in LLM reasoning training, and the academic backing adds credibility. However, with only 14 stars, this is early-stage research code rather than production-ready infrastructure. The documentation is functional but thin on examples beyond the core training scripts. If you're evaluating this for serious research, the arXiv paper is worth reading first; if it matches your needs, the code is solid enough to experiment with, but expect to write your own integration layer.
