PrimeIntellect-ai / experiments-autonomous-speedrunning

autonomous nanogpt optimizer speedrun

47 stars · 100% credibility
Language: Python

AI Summary

This repository is a raw archive of autonomous AI agents (Claude Code/Opus 4.7 and Codex/GPT 5.5) competing on the track_3_optimization benchmark from modded-nanogpt, where the goal is to reach a validation loss of 3.28 in as few training steps as possible. The archive includes harness files, plans, threads, run logs, and generated variants, organized by experiment waves.

AI-Generated Review

What is experiments-autonomous-speedrunning?

This is a research project where autonomous coding agents compete to optimize neural network training as fast as possible. Think of it as a "speedrun" competition, but instead of video games, the agents are writing and iterating on training code for NanoGPT. The project explores a wide range of optimizer modifications—variants of Muon with different momentum schedules, adaptive learning rate mechanisms, and spectral normalization techniques—treating each experimental run as a data point in a massive hyperparameter search. Python serves as the implementation language, with training runs recorded as structured exports containing metrics, source snapshots, and console logs for later analysis.
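For context, here is a minimal sketch of the kind of Muon-style update these variants build on. It follows the publicly documented Muon recipe from modded-nanogpt (momentum accumulation followed by Newton-Schulz orthogonalization); the quintic coefficients below are the standard published ones, while the function names and the simplified step logic are illustrative rather than taken from this repo:

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Quintic Newton-Schulz iteration that approximately orthogonalizes G.

    The (a, b, c) coefficients are the standard ones from the public Muon
    implementation in modded-nanogpt; the archived variants tweak this recipe.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.bfloat16()                    # low precision suffices for this iteration
    X = X / (X.norm() + eps)            # scale so the spectral norm is <= 1
    transposed = G.size(0) > G.size(1)
    if transposed:                      # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    if transposed:
        X = X.T
    return X.to(G.dtype)


def muon_step(param: torch.Tensor, grad: torch.Tensor,
              momentum_buf: torch.Tensor, lr: float = 0.02,
              momentum: float = 0.95) -> None:
    # One simplified Muon-style step: fold the gradient into the momentum
    # buffer, orthogonalize the buffered direction, take a fixed-size step.
    momentum_buf.mul_(momentum).add_(grad)
    param.data.add_(newton_schulz(momentum_buf), alpha=-lr)
```

The experimental variants described above change pieces of exactly this recipe, such as the momentum schedule, adaptive learning rate behavior, or spectral-norm-based scaling layered on top of the base update.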

Why is it gaining traction?

The project taps into the growing interest in autonomous systems for research and development. Rather than manually tuning optimizers, researchers can observe how autonomous agents explore the optimization landscape, often finding unexpected combinations that outperform hand-tuned baselines. The speedrun framing makes the experimental process transparent and reproducible, since every run includes its exact source code and training logs. For developers interested in optimizer research or LLM training efficiency, the archive serves as a curated dataset of "what not to try" and, occasionally, surprising wins.
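As a hedged illustration of mining such an archive: the directory layout, the metrics.json filename, and the step/val_loss field names below are assumptions, since the actual export schema isn't documented here.

```python
import json
from pathlib import Path


def rank_runs(export_dir: str, target: float = 3.28) -> list[tuple[str, int]]:
    """Rank runs by how few steps they needed to reach the target val loss.

    Assumes each run directory holds a metrics.json containing a list of
    {"step": int, "val_loss": float} records; the real schema may differ.
    """
    reached = []
    for metrics_file in Path(export_dir).glob("*/metrics.json"):
        records = json.loads(metrics_file.read_text())
        hits = [r["step"] for r in records if r["val_loss"] <= target]
        if hits:
            reached.append((metrics_file.parent.name, min(hits)))
    return sorted(reached, key=lambda item: item[1])
```

Pointing rank_runs at one experiment wave's export directory would then surface the fastest variants first, which is essentially the speedrun leaderboard question.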

Who should use this?

ML researchers studying optimizer behavior will find the most value here, particularly those interested in second-order methods like Muon and Polar Express. Neural architecture researchers looking for benchmark comparisons against the NanoGPT baseline can reference the exported training curves. If you just need a working GPT implementation, look elsewhere—this is a tool for pushing optimization boundaries, not for building products.
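For the benchmark comparisons mentioned above, overlaying exported curves against the 3.28 target might look like the following sketch, using the same hypothetical metrics.json schema as before:

```python
import json
from pathlib import Path

import matplotlib.pyplot as plt


def plot_val_curves(run_dirs: list[str], target: float = 3.28) -> None:
    # Overlay validation-loss curves from several run exports against the
    # speedrun target; assumes the hypothetical metrics.json schema above.
    for run_dir in run_dirs:
        records = json.loads((Path(run_dir) / "metrics.json").read_text())
        plt.plot([r["step"] for r in records],
                 [r["val_loss"] for r in records],
                 label=Path(run_dir).name)
    plt.axhline(target, linestyle="--", color="gray", label=f"target {target}")
    plt.xlabel("training step")
    plt.ylabel("validation loss")
    plt.legend()
    plt.show()
```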

Verdict

With only 47 stars, this is early-stage research infrastructure rather than a polished library. The extensive run data is valuable for analysis, but documentation is sparse and the barrier to contribution is high. Treat it as an interesting experiment in autonomous research, not a tool you can drop into production today.

