
xzxxntxdy / PEPO


Official repo for "Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought"

Found Mar 31, 2026 at 19 stars
AI Summary

This repository implements PEPO, a token-level reinforcement learning method to enhance multimodal chain-of-thought reasoning in vision-language models, with training and evaluation scripts for geometry and math datasets.

How It Works

1. 🔍 Discover PEPO

From a research paper, you hear about a smart way to teach AI to solve picture-based math puzzles better.

2. 🛠️ Set up your playground

You create a fresh space on your computer to experiment with AI learning.

3. 📥 Collect puzzle books and pictures

You download geometry problems with diagrams to use as teaching examples.

4. 🧠 Choose a starting AI brain

You pick a vision-savvy AI like Qwen to begin improving on math reasoning.

5. 🚀 Launch the learning session

With one command, you start training your AI to think step-by-step on visual puzzles.

6. 📈 Monitor progress

You watch as your AI gets better at explaining and solving geometry challenges.

7. 🧪 Test on new puzzles

You run checks on math and logic datasets to confirm the improved results.

🎉 AI masters visual reasoning

Your trained AI now excels at multimodal chain-of-thought puzzles, ready for more!


AI-Generated Review

What is PEPO?

PEPO implements token-level reinforcement learning for multimodal chain-of-thought reasoning in vision-language models, weighting tokens by visual grounding and uncertainty to outperform sequence-level RL. Developers get plug-and-play scripts to train on Geometry3K using Python with ms-swift and vLLM, plus evaluation on MathVista, MathVerse, and LogicVista benchmarks. It's the official GitHub repository tied to a recent arXiv paper, handling data prep and inference out of the box.
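The repo holds the paper's actual weighting scheme; the core idea of token-level credit assignment can nonetheless be sketched in plain Python. Everything below is an illustrative assumption, not PEPO's code: `uncertainty_weight` stands in for whatever grounding/uncertainty signal the method uses, and the loss simply scales each token's advantage by its weight instead of applying one sequence-level advantage to all tokens.

```python
import math

def uncertainty_weight(token_probs):
    """Illustrative per-token weight: normalized entropy of the model's
    distribution at that position, so uncertain (exploratory) tokens get
    more weight. This is a stand-in, not PEPO's actual weighting."""
    h = -sum(p * math.log(p) for p in token_probs if p > 0)
    return h / math.log(len(token_probs))

def token_weighted_pg_loss(logprobs, advantages, weights):
    """Policy-gradient loss with token-level weighting.

    logprobs, advantages, weights: equal-length lists of floats for one
    sampled response. Each token's advantage is scaled by its weight
    before multiplying the token log-probability; the negated mean is
    returned so minimizing the loss maximizes the weighted objective.
    """
    assert len(logprobs) == len(advantages) == len(weights)
    total = 0.0
    for lp, adv, w in zip(logprobs, advantages, weights):
        total += w * adv * lp  # per-token credit assignment
    return -total / len(logprobs)
```

With all weights equal to 1 this collapses back to ordinary sequence-averaged policy gradient, which is why such a scheme can slot into existing RL pipelines with little overhead.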

Why is it gaining traction?

It slots into GRPO or DAPO pipelines with minimal overhead and no extra labels, delivering better geometry/math scores on small VLMs like Qwen2.5-VL-3B. Bash scripts for training and eval make replication fast, while vLLM speeds up batch generation for avg@N scoring. Early adopters value the perception-exploration fusion that prioritizes key reasoning steps.
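The avg@N metric mentioned above is straightforward to state: sample N answers per problem, score each, and average the per-problem correctness rates. A minimal sketch, with hypothetical names and a caller-supplied correctness check (the repo's actual scoring harness may differ):

```python
def avg_at_n(samples_per_problem, is_correct):
    """avg@N: for each problem, the fraction of its N sampled answers
    that are correct, averaged over all problems.

    samples_per_problem: dict mapping problem id -> list of N answers.
    is_correct: callable (problem_id, answer) -> bool.
    """
    scores = []
    for pid, answers in samples_per_problem.items():
        correct = sum(is_correct(pid, a) for a in answers)
        scores.append(correct / len(answers))
    return sum(scores) / len(scores)
```

Batch engines like vLLM make this cheap in practice, since the N samples per problem can be generated in one batched pass rather than N sequential ones.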

Who should use this?

ML researchers tuning VLMs for visual math or geometry tasks, like improving CoT on diagrams. Teams experimenting with RLHF on 3B models without massive compute. Folks replicating paper results via official GitHub releases and CLI-driven workflows.

Verdict

Worth forking for multimodal RL experiments: the docs and recipes are crisp, though the low star count signals early maturity. Test on your own data before production; it lacks broad benchmarks and pre-trained weights so far.


