ljy2222 / Curriculum-RLAIF
Curriculum-RLAIF is a data-centric curriculum learning framework for reward model training in RLAIF-based LLM alignment.
A research tool that creates preference data at controlled difficulty levels and trains AI reward models in an easy-to-hard sequence to better align large language models.
How It Works
The framework trains a reward model on preference data ordered from easy to hard, so the model builds capability gradually instead of facing ambiguous comparisons from the start.
Seed conversation prompts are collected, and off-the-shelf LLMs are selected to generate candidate responses and preference labels.
A single launch script runs the full pipeline: generating response pairs, labeling preferences, and assembling the curriculum.
Easy examples pair responses with an obvious quality gap; medium-difficulty examples mix randomly sampled and guided pairs.
An LLM judge labels the randomly sampled pairs, deciding which response is preferred and reducing label noise.
The labeled examples are then sorted into curriculum stages from easiest to hardest.
The reward model is trained stage by stage, with each stage initialized from the one before it.
The resulting reward model outperforms baselines at identifying helpful, harmless responses.
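The staging-and-sequential-training loop above can be sketched in a few lines. This is a minimal illustration, not the repo's actual code: the field names (`difficulty`, `chosen`, `rejected`), the number of stages, and the `train_step` callback are all assumptions for the sketch.

```python
# Hypothetical sketch of an easy-to-hard curriculum loop for reward model
# training. Field names and the train_step interface are illustrative only.

def build_stages(examples, num_stages=3):
    """Sort labeled preference pairs by difficulty and split into equal stages."""
    ordered = sorted(examples, key=lambda ex: ex["difficulty"])
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]

def train_curriculum(model, examples, train_step, num_stages=3):
    """Train the reward model stage by stage, easiest examples first.

    Each stage continues from the weights learned in the previous stage,
    so later (harder) comparisons build on earlier (easier) ones.
    """
    for stage in build_stages(examples, num_stages):
        for ex in stage:
            # Each example is a (chosen, rejected) preference pair.
            train_step(model, ex["chosen"], ex["rejected"])
    return model
```

In practice `train_step` would apply a pairwise ranking loss (e.g. Bradley-Terry style, as is common in RLAIF reward modeling) to push the reward of the chosen response above the rejected one.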