yhavinga

Experiment with a coding agent to train a decoder-only transformer to perform addition

19 stars
100% credibility
Found Feb 24, 2026 at 15 stars
AI Analysis
Python
AI Summary

This project shares code, results, and reports from experiments training the smallest transformer models to accurately add 10-digit numbers.

How It Works

1. 🔍 Discover Tiny AI Addition
   You stumble upon this fun challenge to build the world's smallest smart assistant that adds huge 10-digit numbers perfectly.

2. 📥 Grab the Ready Code
   Download the simple files that let anyone recreate these amazing tiny math experiments.

3. ⚙️ Choose Model Recipes
   Pick from easy lists of super-small assistant designs to see which ones learn addition best.

4. 🚀 Launch Learning Sessions
   Hit go, and watch your tiny assistants train on powerful cloud helpers to master big-number math.

5. 📈 Track the Magic Moment
   Follow live charts as accuracy skyrockets from guessing to perfect – that's the 'aha' grokking thrill!

6. 🏆 Celebrate Your Math Champ
   Enjoy your tiniest-ever assistant – just 777 parts – crushing 10-digit additions with 99.7% wins!
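The workflow above rests on a simple framing: 10-digit addition becomes next-token prediction over digit sequences. As a hedged sketch (the repo's actual serialization format is not documented here, so the `a+b=` layout below is an assumption), a data generator for such a task might look like:

```python
import random

def make_example(n_digits=10):
    """Build one addition example as (prompt, target) strings.

    The 'a+b=' prompt / digit-string target format is a hypothetical
    illustration; the repo may tokenize differently (e.g. reversed
    digits or padding).
    """
    a = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    return f"{a}+{b}=", str(a + b)

prompt, target = make_example()
# The model is trained to emit `target` token by token after `prompt`,
# with cross-entropy loss on the target digits.
```

Training on examples like these is what the "learning sessions" in step 4 amount to; exact-match accuracy on held-out pairs is what jumps during the grokking moment in step 5.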


Star Growth

The repo grew from 15 stars at discovery to 19 stars.
AI-Generated Review

What is gpt-acc-jax?

This repo documents a coding-agent experiment that trains decoder-only transformers to perform exact 10-digit integer addition, using cross-entropy loss and autoregressive decoding. Built with Python, JAX, and Flax on TPU spot instances, it arrives at a 777-parameter model hitting 99.69% accuracy via curriculum learning, found across 47 hyperparameter sweeps. Developers get ready-to-run scripts for local training, full TPU orchestration, Weights & Biases tracking, and LaTeX reports with grokking curves.
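Curriculum learning on this task plausibly means starting from short sums and ramping up to the full 10 digits as training progresses. The repo's actual schedule is not described here, so the linear ramp below is purely an illustrative assumption:

```python
def curriculum_digits(step, total_steps, max_digits=10):
    """Map a training step to the number of digits to train on.

    Hypothetical linear ramp from 1-digit to max_digits addition;
    the repo's real curriculum schedule may be staged, adaptive,
    or accuracy-gated instead.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return max(1, round(frac * max_digits))
```

A schedule like this would be consulted by the data generator each step, so early batches contain easy 1- and 2-digit sums and late batches contain full 10-digit problems.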

Why is it gaining traction?

Unlike generic transformer trainers, it uncovers a sharp parameter cliff around 800 parameters where accuracy jumps from 0% to 99%, plus counterintuitive wins like single-layer models beating multi-layer models at equal scale. The TPU spot manager handles preemptions and parallel sweeps for roughly $20 of total compute, making the experiments cheap to reproduce and track. The autonomous-agent-driven workflow and clean grokking curves give JAX users chasing the grokking phenomenon plenty to dig into.
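To see how configurations cluster around a cliff like the ~800-parameter one, it helps to count parameters as a function of model width. The formula below is a rough sketch for a tiny decoder with tied embeddings and no biases; it is not the repo's exact 777-parameter architecture, just an illustration of how quickly size moves with `d_model`:

```python
def transformer_params(vocab, d_model, d_ff, n_layers):
    """Rough parameter count for a tiny decoder-only transformer
    (tied input/output embeddings, no biases, no layer-norm params).
    Illustrative only -- the repo's exact architecture may differ.
    """
    embed = vocab * d_model       # token embedding, tied with output head
    attn = 4 * d_model * d_model  # Q, K, V, O projections per layer
    mlp = 2 * d_model * d_ff      # up- and down-projections per layer
    return embed + n_layers * (attn + mlp)

# Sweeping width shows how sharply total size grows near a cliff:
for d in (4, 6, 8):
    print(d, transformer_params(vocab=14, d_model=d, d_ff=2 * d, n_layers=1))
```

A sweep harness only needs a function like this plus a training call per config; the single-layer-beats-multi-layer finding corresponds to spending the `n_layers` budget on width instead of depth at fixed total count.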

Who should use this?

JAX/TPU engineers scaling tiny-model experiments, ML researchers probing grokking on toy tasks like addition, and hobbyists who want to replicate the results without cloud hassle. It suits experiment-design workflows that need hyperparameter sweeps, validation curves, and failure analysis.

Verdict

Impressive proof of concept for minimal transformers; run the sweeps yourself for insight into decoding efficiency. With only 10 stars and a 1.0% credibility score, it's early-stage (solid README, no tests); fork it for production tweaks, but trust the results as research baselines.


