Nebularaid2000 / rethink_sft_generalization
Public repo for the paper "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability"
AlpacaEval is a fast, low-cost automatic benchmark for ranking instruction-following AI models by comparing their outputs to a reference using LLM judges that agree highly with humans.
How It Works
You hear about a simple way to compare AI chatbots on how well they follow everyday instructions, like a fair race for smart assistants.
Download and set it up on your computer with a quick install, no complicated steps needed.
Collect responses from your AI model to simple questions, just like saving notes from a conversation.
Compare your AI's answers against a strong baseline, with an LLM judge deciding which response follows each instruction better.
Get a clear leaderboard showing win rates, like a scorecard revealing your AI's strengths.
Know exactly which AI excels at helpful responses, ready to use the best one confidently.
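As a rough illustration of the leaderboard step above, here is a minimal Python sketch of how win rates could be computed from per-question judge decisions. It is not the tool's actual implementation or API: the function names, the "model" / "baseline" / "tie" labels, the toy data, and the choice to count a tie as half a win are all assumptions made for this example.

```python
def win_rate(preferences):
    """Percent of comparisons won against the baseline.

    `preferences` is a list of hypothetical judge labels:
    "model" (model answer preferred), "baseline", or "tie".
    Assumption: a tie counts as half a win.
    """
    if not preferences:
        raise ValueError("need at least one comparison")
    points = {"model": 1.0, "tie": 0.5, "baseline": 0.0}
    score = sum(points[p] for p in preferences)
    return 100.0 * score / len(preferences)


def leaderboard(judgments):
    """Sort models by win rate, best first.

    `judgments` maps a model name to its list of judge labels.
    """
    rows = [(name, win_rate(prefs)) for name, prefs in judgments.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy data, invented for illustration only.
judgments = {
    "model_a": ["model", "model", "baseline", "tie"],
    "model_b": ["baseline", "model", "baseline", "baseline"],
}

for name, rate in leaderboard(judgments):
    print(f"{name}: {rate:.1f}%")
# prints:
# model_a: 62.5%
# model_b: 25.0%
```

A real run would replace the toy labels with the judge's decisions for each question; the ranking logic stays the same.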