
zhengkid / AutoTTS

Public

The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling"

12 stars · 0 forks
Found May 10, 2026 at 12 stars.
AI Analysis
Python
AI Summary

AutoTTS is a research tool in which AI agents automatically discover and refine efficient reasoning strategies for language models on math benchmarks, using offline replay simulations.

How It Works

1. 🔍 Discover AutoTTS

You find this clever project online, which shows how AI agents can teach themselves smarter ways to solve tough math problems.

2. 📖 Dive into the guide

The friendly readme explains how AI agents work in a safe offline sandbox to invent better reasoning strategies without any extra training.

3. 🛠️ Set up your playground

Follow the setup steps to prepare your environment so you can test and explore AI strategies right away.

4. 🧪 Test ready examples

Run the included evaluations on math benchmarks to compare how different reasoning strategies trade off tokens against accuracy.

5. 🚀 Spark AI discovery

Launch the discovery loop, where AI agents propose, refine, and perfect new reasoning controllers automatically.

6. 📊 Review the wins

Check the result charts showing up to 70% fewer reasoning tokens at matching accuracy on held-out problems.

🎉 Master efficient AI!

Celebrate your efficient AI reasoner, discovered hands-free and ready to tackle hard problems faster.
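The discovery step above (step 5) can be sketched as a propose-evaluate-keep loop. This is a toy illustration, not AutoTTS's actual code: the repo uses LLM agents (Claude/Codex) to propose edits to code-defined controllers and scores them on cached replay data, whereas here a random parameter perturbation and a made-up scoring landscape stand in for both.

```python
import random

def replay_score(threshold):
    # Stand-in for an offline replay evaluation: in AutoTTS a controller
    # is scored on cached model traces with zero LLM calls. Here we use
    # a toy landscape that peaks at threshold = 0.7.
    return 1.0 - abs(threshold - 0.7)

def discovery_loop(rounds=50, seed=0):
    """Toy agentic loop: propose a tweak to the current best controller
    parameter, keep it only if the replay score improves."""
    rng = random.Random(seed)
    best_param = 0.5
    best_score = replay_score(best_param)
    for _ in range(rounds):
        candidate = min(1.0, max(0.0, best_param + rng.gauss(0, 0.1)))
        score = replay_score(candidate)
        if score > best_score:
            best_param, best_score = candidate, score
    return best_param, best_score

param, score = discovery_loop()
print(f"best parameter {param:.2f}, replay score {score:.2f}")
```

Because every candidate is scored against cached traces rather than live model calls, each iteration is cheap; that is what makes running many proposal rounds affordable.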

AI-Generated Review

What is AutoTTS?

AutoTTS is the official Python repo for agentic discovery of test-time scaling strategies, where LLMs improve LLMs by automatically searching over code-defined controllers in an offline replay environment. It reframes adaptive inference from handcrafted heuristics to environment-driven evolution: build replay caches once from math benchmarks like AIME/HMMT, then evaluate controllers cheaply with zero LLM calls. Users get replay evals, baselines, and a workflow for running discovery to find token-saving policies that match full-budget accuracy.
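A minimal sketch of the replay-cache idea, assuming a simple early-stopping controller; all names and data here are hypothetical, not the repo's API. Samples are generated and cached once, and a code-defined controller is then evaluated purely by replaying them:

```python
from collections import Counter

# Hypothetical replay cache: for each problem, pre-generated samples
# (answer, token_count) recorded once from the model, plus the gold answer.
REPLAY_CACHE = [
    {"gold": "42", "samples": [("42", 310), ("41", 295), ("42", 350), ("42", 280)]},
    {"gold": "7",  "samples": [("7", 500), ("7", 420), ("3", 610), ("7", 390)]},
]

def early_stop_controller(samples, agree=2):
    """Toy code-defined controller: consume cached samples in order and
    stop as soon as `agree` samples share the same answer."""
    counts = Counter()
    tokens = 0
    for ans, toks in samples:
        counts[ans] += 1
        tokens += toks
        if counts[ans] >= agree:
            break
    answer, _ = counts.most_common(1)[0]
    return answer, tokens

def replay_eval(controller, cache):
    """Score a controller on the cache -- zero LLM calls needed."""
    correct = total_tokens = 0
    for item in cache:
        answer, tokens = controller(item["samples"])
        correct += answer == item["gold"]
        total_tokens += tokens
    return correct / len(cache), total_tokens

acc, toks = replay_eval(early_stop_controller, REPLAY_CACHE)
print(f"accuracy={acc:.2f} tokens={toks}")
```

Note the controller here never calls a model: it only decides how many cached samples to consume, which is why whole families of controllers can be compared for the cost of one generation pass.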

Why is it gaining traction?

Delivers 69.5% token savings vs baselines at beta=0.5, with held-out generalization across Qwen3 scales -- all for about $40 and 160 minutes per run. The hook: agentic AutoTTS search via Claude/Codex proposes controller refinements from traces and scaling curves, beating SC@64 and Parallel-Probe without any gradient updates. Devs love the replay-only eval for quick baselines and for extending to custom backbones.
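The beta parameter reads like a weight on the accuracy-versus-token-cost trade-off. A hedged guess at the shape of such an objective (this is not the paper's exact formula, and `controller_score` is a made-up name):

```python
def controller_score(accuracy, tokens_used, token_budget, beta=0.5):
    # Hypothetical objective: reward accuracy, penalize token spend,
    # with beta weighting the trade-off (not AutoTTS's exact formula).
    return accuracy - beta * (tokens_used / token_budget)

# At matching accuracy, a controller spending ~69.5% fewer tokens
# scores strictly higher under any beta > 0:
full = controller_score(0.80, 64_000, 64_000)
lean = controller_score(0.80, 19_520, 64_000)
print(full, lean)
```

Under this reading, a larger beta would push discovery toward cheaper controllers, while beta=0 would reduce the search to pure accuracy maximization.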

Who should use this?

LLM inference engineers tuning test-time compute for math/reasoning QA. Suited for researchers replicating scaling curves on Qwen replay data or running agentic discovery to beat handcrafted controllers.

Verdict

Solid repro tools in this low-maturity AutoTTS Python repo (12 stars, 1.0% credibility score) -- run evals out of the box, but expect some paper-reading for the full discovery workflow. Worth forking if agentic LLM test-time scaling fits your stack.

