ZJU-OmniAI / GFT (Public)

GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

19 stars · 100% credibility
Found Apr 21, 2026 at 19 stars
AI Analysis
Python
AI Summary

GFT is an open-source framework that trains large language models to excel at math reasoning by combining imitation learning with reinforcement techniques in a single efficient stage.

How It Works

1. 🔍 Discover GFT

You find this project on GitHub while looking for ways to make AI better at solving math problems, and get excited about its simple training method.

2. 📖 Read the guide

You explore the paper and instructions, learning how the method mixes imitating good answers with exploring new ones to teach AI math reasoning.

3. ⚙️ Set up your workspace

You install the training tools on your GPU machine, following the quick-start steps.

4. 📥 Gather math problems

You download ready-made collections of math questions and answers to use for training.

5. 🚀 Start training

You launch a training run and watch the model learn from groups of sampled answers, getting steadily better at math.

6. 🏆 Celebrate smarter AI

Your model now solves tougher math problems faster and more accurately, ready for real use.

AI-Generated Review

What is GFT?

GFT is a Python framework that fine-tunes large language models by moving from pure imitation learning to reward fine-tuning, using unbiased group advantages and dynamic coefficient rectification. It addresses known supervised fine-tuning weaknesses, such as single-path dependency that causes entropy collapse and unstable gradients that lead to forgetting, providing a stable base for downstream RL methods like GRPO or PPO. Developers get ready-to-run scripts for math reasoning on datasets such as NuminaMath-CoT, reportedly achieving strong results with just 10k samples.
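The page does not quote GFT's formulas, but the "unbiased group advantages" idea can be sketched in a GRPO-style way: sample several answers per problem, score each, and center every reward on its group's mean so the baseline is unbiased within the group. The function name and the 1/0 reward scheme below are illustrative assumptions, not GFT's actual API.

```python
import statistics

def group_advantages(rewards):
    """Center each sampled answer's reward on its group's mean reward.
    Subtracting the group mean yields zero-mean (unbiased-baseline)
    advantages per group. Illustrative GRPO-style sketch only, not
    GFT's exact formulation."""
    baseline = statistics.fmean(rewards)
    return [r - baseline for r in rewards]

# Example: 4 sampled solutions to one math problem, reward 1.0 if correct.
print(group_advantages([1.0, 0.0, 0.0, 1.0]))  # [0.5, -0.5, -0.5, 0.5]
```

Answers that beat their group's average get a positive advantage and are reinforced; below-average answers are discouraged, with no learned critic required.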

Why is it gaining traction?

Unlike standard SFT pipelines that hit ceilings on reasoning benchmarks, GFT boosts data efficiency: it beats 100k-sample SFT with only 10k samples while preserving policy entropy for better RL synergy. Its rectification stabilizes dynamic fine-tuning setups without sacrificing knowledge injection. Built on scalable tooling like verl, it offers plug-and-play recipes for Python-based LLM post-training.
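The page does not spell out "dynamic coefficient rectification", but the stabilizing intuition can be hedged like this: derive a per-sample mixing coefficient from the advantage and clamp (rectify) it to a bounded range, so no single sample lets the reward term swamp the imitation term. The blending rule, bounds, and function names below are hypothetical, for illustration only.

```python
def rectified_coeff(advantage, lo=0.0, hi=1.0):
    """Clamp an advantage-derived coefficient into [lo, hi].
    Clamping keeps one outlier sample from dominating the gradient;
    the exact rule GFT uses is in the paper."""
    return max(lo, min(hi, advantage))

def blended_loss(imitation_loss, reward_loss, advantage):
    """Interpolate between an imitation (SFT-style) objective and a
    reward (RL-style) objective with a per-sample rectified
    coefficient. Hypothetical blending rule, not GFT's actual loss."""
    c = rectified_coeff(advantage)
    return (1.0 - c) * imitation_loss + c * reward_loss

print(blended_loss(2.0, 4.0, 0.5))  # 3.0
```

Under this sketch, low-advantage samples fall back toward pure imitation while high-advantage samples lean on the reward signal, which is one plausible way a single training stage could cover both regimes.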

Who should use this?

ML engineers tuning LLMs for math or multi-step reasoning tasks, especially those frustrated with SFT's memorization pitfalls. AI researchers prototyping reward models or GRPO/PPO flows on Qwen models. Teams handling group-based imitation-to-reward transitions in production fine-tuning.

Verdict

Worth trying for efficient reward fine-tuning in Python, particularly if you're after unbiased advantages and dynamic rectification on reasoning data. With 19 stars and a 100% credibility score, it's early-stage: the paper (ACL 2026 Findings) and scripts look solid, but verify maturity with small runs before scaling.
