
XIAO4579 / PRISM


Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

45 stars · 100% credibility · Found May 06, 2026
AI Analysis · Python

AI Summary

PRISM is a research pipeline that enhances vision-language AI models through supervised fine-tuning, adversarial alignment using a Mixture-of-Experts discriminator, and reward-based reinforcement learning.

How It Works

1
📖 Discover PRISM

You find this exciting research project that helps make AI models better at understanding images and text together.

2
🛠️ Get ready

Set up a simple workspace on your computer by installing a few helper tools.

3
📥 Grab starting pieces

Download ready-to-use example conversations and a base AI model from a trusted sharing site.

4
🎓 Teach the basics

Run a quick lesson to help the AI learn from good examples, or skip ahead with our pre-trained version.

5
🎯 Supercharge alignment

Use a smart checker that spots good and bad responses to gently guide the AI back on track.
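In the full pipeline, the "smart checker" is a Mixture-of-Experts discriminator that scores the model's own sampled responses, so no teacher logits are needed. A toy, framework-free sketch of the idea (all names and values here are hypothetical, not the repo's actual implementation): the policy samples responses, the discriminator assigns each a scalar quality score, and low-scoring samples are down-weighted in the training loss.

```python
def discriminator_score(response: str) -> float:
    """Stand-in for the MoE discriminator: returns a quality score in [0, 1].
    Here a toy heuristic that prefers responses stating a final answer."""
    return 1.0 if "Answer:" in response else 0.2

def weighted_loss(samples):
    """Average per-sample loss, reweighted by discriminator scores,
    so drifting responses contribute less to the gradient."""
    total, weight = 0.0, 0.0
    for response, nll in samples:  # nll = policy's negative log-likelihood
        w = discriminator_score(response)
        total += w * nll
        weight += w
    return total / weight

batch = [
    ("Reasoning... Answer: 42", 1.3),  # good sample, full weight
    ("I think maybe 7?", 1.1),         # drifting sample, down-weighted
]
print(round(weighted_loss(batch), 3))
```

Because the discriminator only consumes the sampled text (a black box over the policy's outputs), this style of pre-alignment works even when the teacher model exposes no logits.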

6
🚀 Power up with rewards

Give the AI feedback on right answers and neat formatting to make it shine even brighter.
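"Right answers and neat formatting" is the essence of RL with verifiable rewards (RLVR): the reward is computed by a checker, not a learned model. A minimal sketch of such a reward function, assuming a hypothetical `\boxed{...}` answer convention (the repo's actual reward shaping may differ):

```python
import re

def verifiable_reward(response: str, gold: str) -> float:
    """Toy RLVR-style reward: 1.0 for a correct final answer,
    plus a small bonus for following the expected \\boxed{...} format."""
    m = re.search(r"\\boxed\{(.+?)\}", response)
    correct = 1.0 if m and m.group(1).strip() == gold else 0.0
    fmt = 0.1 if m else 0.0
    return correct + fmt

print(verifiable_reward(r"Thus \boxed{12}", "12"))  # correct and formatted
print(verifiable_reward("The answer is 12", "12"))  # unparsable, scores 0.0
```

Because the reward is a deterministic check against ground truth, it cannot be gamed the way a learned reward model can, which is part of why this stage is stable enough to run after pre-alignment.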

7
📊 Test on challenges

Check how well it does on tough math, science, and vision puzzles using built-in testers.

🏆 Your AI excels!

Celebrate as your improved assistant crushes multimodal tasks with top scores.

AI-Generated Review

What is PRISM?

PRISM inserts a pre-alignment stage between supervised fine-tuning (SFT) and reinforcement learning (RL) for multimodal models such as Qwen3-VL, using black-box on-policy distillation against a Mixture-of-Experts discriminator to correct policy drift on vision-reasoning tasks. Developers get ready-to-run Python scripts for the full three-stage pipeline (SFT via LLaMA-Factory integration, PRISM alignment, and RLVR with GRPO/DAPO/GSPO), plus 113K curated Gemini demonstrations, 1.26M public data samples, released checkpoints, and lmms-eval benchmarks such as MathVista, MMMU-Pro, and HallusionBench.
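Of the RL algorithms listed, GRPO is the simplest to illustrate: for each prompt it samples a group of responses and normalizes every response's reward against the group's own mean and standard deviation, so no separate value/critic model is needed. A minimal sketch of that advantage computation (this is the standard GRPO formula, not code from the repo):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: normalize each sampled
    response's reward by its group's mean and population std dev."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against zero-variance groups
    return [(r - mu) / sigma for r in rewards]

# Four responses to one prompt, rewarded 1.0 (correct) or 0.0 (wrong):
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

DAPO and GSPO refine this recipe (e.g., different clipping and sequence-level weighting), but all three share the same critic-free, group-normalized core.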

Why is it gaining traction?

It outperforms direct SFT-to-RL baselines across multimodal math and reasoning evals by disentangling perception and reasoning signals without requiring teacher logits, which makes the subsequent RL stage more stable. Full reproduction on 8x H100s with the verl framework, DeepSpeed, and multi-node Ray support lowers the barrier to entry: plug in your Qwen3-VL checkpoint and go.

Who should use this?

ML engineers fine-tuning vision-language models for math-heavy applications (e.g., diagram solvers, OCR reasoning) and researchers benchmarking RLVR on multimodal data. Ideal if you're past basic SFT but hitting policy drift in production RL pipelines.

Verdict

Promising for multimodal RL but early: 45 stars and 1.0% credibility signal unproven scale. Strong docs and released assets make it worth forking; test it on your cluster if RL stability matters.


