Simplified-Reasoning

SU-01: Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

78
5
89% credibility
Found May 19, 2026 at 80 stars -- GitGems finds repos before they trend. Get early access to the next one.
Sign Up Free
AI Analysis
Python
AI Summary

SU-01 is a compact artificial intelligence model trained to solve advanced mathematical and scientific olympiad problems at the level of gold medalists. The project includes the complete training pipeline (using supervised fine-tuning followed by two stages of reinforcement learning) and evaluation tools for measuring the model's performance on competition-style problems. At inference time, the model uses an internal generate-verify-revise loop that allows it to produce extremely long, coherent reasoning chains—sometimes exceeding 100,000 tokens—to tackle the most difficult problems. The model achieved 35 out of 35 possible points on both IMO 2025 and USAMO 2026 when using test-time scaling, matching gold-medal thresholds.

How It Works

1
🏆 Hear about a math-solving breakthrough

You discover that a compact AI model solved extremely difficult math competition problems at gold-medal level, matching the performance of top human competitors worldwide.

2
📚 Learn how it works

You read about the three-step training process that teaches the model to think step-by-step through proofs, verify its own work, and fix mistakes when needed.

3
🤖 Download the trained model

You grab the ready-to-use model from the public model library, so you don't have to wait days for training.

4
⚙️ Set up the inference server

You launch a simple server that hosts your model, connecting it to your computer so it can receive questions and send back answers.

5
📝 Ask the model a hard problem

You give the model a challenging olympiad problem, and it thinks deeply—sometimes generating over 100,000 words of reasoning to find the answer.

6
🔄 Watch it check and improve its own work

The model verifies its solution, identifies any gaps in logic, and revises its proof—repeating until it produces a complete, rigorous answer.

Get a complete, verified proof

You receive a detailed mathematical proof that has been checked for correctness, ready to be studied or submitted to a competition.

Sign up to see the full architecture

5 more

Sign Up Free

Star Growth

See how this repo grew from 80 to 78 stars Sign Up Free
Repurpose This Repo

Repurpose is a Pro feature

Generate ready-to-use prompts for X threads, LinkedIn posts, blog posts, YouTube scripts, and more -- with full repo context baked in.

Unlock Repurpose
AI-Generated Review

What is SU-01?

SU-01 is a compact 30B-A3B reasoning model built for solving mathematical and scientific olympiad problems at a gold-medal level. It uses a three-stage training pipeline: reverse-perplexity curriculum SFT to install proof-oriented reasoning, followed by two-stage reinforcement learning that first optimizes for answer correctness and then shifts to proof quality. At inference time, it runs a generate-verify-revise loop that produces coherent reasoning chains exceeding 100K tokens. The project releases both the trained model weights and the full training code, including Docker-based setup and scripts for each training stage.

Why is it gaining traction?

The headline numbers are striking: 35 points on both IMO 2025 and USAMO 2026, passing the gold medal threshold. This level of competition math performance from a 30B parameter model is notable. The test-time scaling approach is particularly interesting -- instead of relying on external tools like code executors or theorem provers, the model uses its own internal verification-and-refinement loop to catch and fix errors. The evaluation pipeline supports both direct decoding and the more compute-intensive test-time scaling mode, with batch processing and SGLang server helpers for serving.

Who should use this?

Researchers working on mathematical reasoning or AI for science will find the training recipes and evaluation code valuable. The benchmark suite covers answer-verifiable tasks (AIME, FrontierScience) and proof-level problems (IMO-ProofBench), making it useful for comparing reasoning approaches. Teams wanting to fine-tune their own reasoning models can use the three-stage training scripts as a reference implementation. However, the low star count and recent release date mean this is still early-stage research code rather than production-ready tooling.

Verdict

This is legitimate research with impressive benchmark results, but the 0.899% credibility score and 78 stars reflect a brand-new, niche project. The documentation is thorough for a research release, but expect to invest time understanding the training pipeline before adapting it. Worth exploring if you're serious about olympiad-level reasoning, but treat it as a research framework, not a drop-in solution.

Sign up to read the full AI review Sign Up Free

Similar repos coming soon.