
zhengkid / AutoTTS

Public

The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling"

12 stars · 0 forks
Found May 10, 2026 at 12 stars.
AI Analysis
Python
AI Summary

AutoTTS is a research tool in which AI agents automatically discover and refine efficient reasoning strategies for language models on math benchmarks, using offline replay simulations.

How It Works

1. 🔍 Discover AutoTTS

You find this clever project online, which shows how AI agents can teach themselves smarter ways to solve tough math problems.

2. 📖 Dive into the guide

The friendly readme explains how AI agents work in a safe offline sandbox to invent better reasoning strategies without any extra training.

3. 🛠️ Set up your playground

Follow the setup steps to prepare your environment so you can test and explore AI strategies right away.

4. 🧪 Test ready examples

Run the included evaluations on math benchmarks to compare how different reasoning strategies trade off tokens against accuracy.

5. 🚀 Spark AI discovery

Launch the discovery loop, where AI agents propose, refine, and perfect new reasoning controllers automatically.

6. 📊 Review the wins

Check the result charts showing up to 70% fewer reasoning tokens at matching accuracy on held-out problems.

🎉 Master efficient AI!

Celebrate your efficient AI reasoner, discovered hands-free and ready to tackle hard problems faster.
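The discovery step above (step 5) can be sketched as a propose-evaluate-keep loop. This is a toy illustration, not AutoTTS's actual code: the repo uses LLM agents (Claude/Codex) to propose edits to code-defined controllers and scores them on cached replay data, whereas here a random parameter perturbation and a made-up scoring landscape stand in for both.

```python
import random

def replay_score(threshold):
    # Stand-in for an offline replay evaluation: in AutoTTS a controller
    # is scored on cached model traces with zero LLM calls. Here we use
    # a toy landscape that peaks at threshold = 0.7.
    return 1.0 - abs(threshold - 0.7)

def discovery_loop(rounds=50, seed=0):
    """Toy agentic loop: propose a tweak to the current best controller
    parameter, keep it only if the replay score improves."""
    rng = random.Random(seed)
    best_param = 0.5
    best_score = replay_score(best_param)
    for _ in range(rounds):
        candidate = min(1.0, max(0.0, best_param + rng.gauss(0, 0.1)))
        score = replay_score(candidate)
        if score > best_score:
            best_param, best_score = candidate, score
    return best_param, best_score

param, score = discovery_loop()
print(f"best parameter {param:.2f}, replay score {score:.2f}")
```

Because every candidate is scored against cached traces rather than live model calls, each iteration is cheap; that is what makes running many proposal rounds affordable.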

AI-Generated Review

What is AutoTTS?

AutoTTS is the official Python repo for agentic discovery of test-time scaling strategies, where LLMs improve LLMs by automatically searching over code-defined controllers in an offline replay environment. It reframes adaptive inference from handcrafted heuristics to environment-driven evolution: build replay caches once from math benchmarks like AIME/HMMT, then evaluate controllers cheaply with zero LLM calls. Users get replay evals, baselines, and a workflow for running discovery to find token-saving policies that match full-budget accuracy.
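A minimal sketch of the replay-cache idea, assuming a simple early-stopping controller; all names and data here are hypothetical, not the repo's API. Samples are generated and cached once, and a code-defined controller is then evaluated purely by replaying them:

```python
from collections import Counter

# Hypothetical replay cache: for each problem, pre-generated samples
# (answer, token_count) recorded once from the model, plus the gold answer.
REPLAY_CACHE = [
    {"gold": "42", "samples": [("42", 310), ("41", 295), ("42", 350), ("42", 280)]},
    {"gold": "7",  "samples": [("7", 500), ("7", 420), ("3", 610), ("7", 390)]},
]

def early_stop_controller(samples, agree=2):
    """Toy code-defined controller: consume cached samples in order and
    stop as soon as `agree` samples share the same answer."""
    counts = Counter()
    tokens = 0
    for ans, toks in samples:
        counts[ans] += 1
        tokens += toks
        if counts[ans] >= agree:
            break
    answer, _ = counts.most_common(1)[0]
    return answer, tokens

def replay_eval(controller, cache):
    """Score a controller on the cache -- zero LLM calls needed."""
    correct = total_tokens = 0
    for item in cache:
        answer, tokens = controller(item["samples"])
        correct += answer == item["gold"]
        total_tokens += tokens
    return correct / len(cache), total_tokens

acc, toks = replay_eval(early_stop_controller, REPLAY_CACHE)
print(f"accuracy={acc:.2f} tokens={toks}")
```

Note the controller here never calls a model: it only decides how many cached samples to consume, which is why whole families of controllers can be compared for the cost of one generation pass.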

Why is it gaining traction?

Delivers 69.5% token savings vs baselines at beta=0.5, with held-out generalization across Qwen3 scales -- all for about $40 and 160 minutes per run. The hook: agentic AutoTTS search via Claude/Codex proposes controller refinements from traces and scaling curves, beating SC@64 and Parallel-Probe without any gradient updates. Devs love the replay-only eval for quick baselines and for extending to custom backbones.
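The beta parameter reads like a weight on the accuracy-versus-token-cost trade-off. A hedged guess at the shape of such an objective (this is not the paper's exact formula, and `controller_score` is a made-up name):

```python
def controller_score(accuracy, tokens_used, token_budget, beta=0.5):
    # Hypothetical objective: reward accuracy, penalize token spend,
    # with beta weighting the trade-off (not AutoTTS's exact formula).
    return accuracy - beta * (tokens_used / token_budget)

# At matching accuracy, a controller spending ~69.5% fewer tokens
# scores strictly higher under any beta > 0:
full = controller_score(0.80, 64_000, 64_000)
lean = controller_score(0.80, 19_520, 64_000)
print(full, lean)
```

Under this reading, a larger beta would push discovery toward cheaper controllers, while beta=0 would reduce the search to pure accuracy maximization.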

Who should use this?

LLM inference engineers tuning test-time compute for math/reasoning QA. Suited for researchers replicating scaling curves on Qwen replay data or running agentic discovery to beat handcrafted controllers.

Verdict

Solid repro tools in this low-maturity AutoTTS Python repo (12 stars, 1.0% credibility score) -- run evals out of the box, but expect some paper-reading for the full discovery workflow. Worth forking if agentic LLM test-time scaling fits your stack.

