collinear-ai

Your Company Bench: a long-horizon coherence benchmark in simulated time that tests an AI agent's ability to manage resources and maximize returns as a tech startup founder

Language: Python

AI Summary

YC-Bench is a simulation where AI agents run a virtual AI startup for 1-3 simulated years, managing employees, tasks, prestige, and cash flow to test long-term decision-making.

How It Works

1
🔍 Discover YC-Bench

You stumble upon this fun benchmark that lets an AI play CEO of a startup, testing whether it can grow the company without going broke.

2
📥 Get it running

Clone the repo and run uv sync - dependencies install automatically and the simulation is ready to go.

3
Pick your AI CEO

🧠 Fast AI: a speedy model for quick decisions.

💭 Thoughtful AI: a careful model that plans ahead.
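
The review below notes that model access goes through LiteLLM, so any provider's model can take the CEO seat. A minimal sketch of a provider-agnostic call - the model name is only an example, and yc-bench's own model wiring lives in its configs rather than in code like this:

    # Illustrative only: LiteLLM exposes one completion() call across providers,
    # so swapping the "CEO" model is a one-string change. Requires the matching
    # provider API key in the environment (e.g. OPENAI_API_KEY).
    import litellm

    response = litellm.completion(
        model="openai/gpt-4o-mini",  # example of a fast, cheap decision-maker
        messages=[{"role": "user", "content": "You are the CEO. Review the task market."}],
    )
    print(response.choices[0].message.content)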

4
⚙️ Choose difficulty

Start on an easy preset to learn the ropes, or pick the challenge config for a real test of endurance.
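
The difficulty presets ship as TOML configs such as challenge.toml (mentioned in the review below). A minimal sketch of inspecting one from Python - the path and keys shown are hypothetical placeholders, not the repo's actual schema:

    # Illustrative only: read a difficulty preset with Python 3.11+'s built-in TOML parser.
    # "configs/challenge.toml" and its contents are hypothetical, not yc-bench's real layout.
    import tomllib

    with open("configs/challenge.toml", "rb") as f:
        preset = tomllib.load(f)

    print(preset)  # e.g. simulated horizon, starting cash, market harshness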

5
▶️ Hit play

Watch the live dashboard as your AI CEO hires a team, picks projects, chases deadlines, and fights to stay in business.

6
📊 Track the adventure

Watch the cash-flow sparkline, team skills, and task progress bars update in real time - feel the tension as the runway shrinks.

7
🏆 Get your results

Celebrate survival with millions in the bank, prestige charts, and comparison plots - share how your AI stacked up!

AI-Generated Review

What is yc-bench?

YC-bench is a Python-based benchmark that tests AI agents' long-horizon coherence by casting them as CEOs of a tech startup over 1-3 simulated years. Agents manage resources - cash flow, employee skills, prestige across seven domains, and market tasks - through a CLI tool backed by a deterministic, SQLite-backed discrete-event simulation. You get quick evals with presets from tutorial to nightmare, live terminal dashboards tracking funds and runway, and JSON outputs for plotting multi-model benchmarking results.
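
Because each run emits JSON for downstream analysis, a natural next step is loading those files and charting funds over the run. A rough sketch, assuming a hypothetical results file with a per-turn funds list (the real output schema may differ):

    # Illustrative only: plot a funds curve from a run's JSON output.
    # "results.json" and the "funds" key are hypothetical stand-ins for the real schema.
    import json
    import matplotlib.pyplot as plt

    with open("results.json") as f:
        run = json.load(f)

    funds = run["funds"]  # per-turn cash balance (hypothetical key)
    plt.plot(range(len(funds)), funds)
    plt.xlabel("Simulated turn")
    plt.ylabel("Funds ($)")
    plt.title("Runway over the simulated run")
    plt.savefig("funds_curve.png")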

Why is it gaining traction?

Unlike short-task benches, yc-bench stresses compounding decisions - prestige specialization, deadline risks, payroll creep - forcing agents to sustain a strategy over hundreds of turns and revealing real coherence gaps. Setup is dead simple: uv sync installs everything, LiteLLM covers any model provider, and bundled scripts handle parallel runs plus funds curves and prestige radars. Devs dig CLI commands like market browse and task accept/assign/dispatch, plus a scratchpad for persistent notes that survive context truncation.
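
That scratchpad is the key long-horizon trick: a multi-hundred-turn run will outlive any context window, so notes have to persist outside the prompt. A toy concept sketch (not yc-bench's actual implementation):

    # Illustrative only: a persistent scratchpad so strategy notes survive context
    # truncation between turns. Concept sketch, not the repo's code.
    from pathlib import Path

    SCRATCHPAD = Path("scratchpad.txt")

    def append_note(note: str) -> None:
        """Durably record a decision or reminder for future turns."""
        with SCRATCHPAD.open("a") as f:
            f.write(note.rstrip() + "\n")

    def read_notes() -> str:
        """Re-inject accumulated notes into the next turn's prompt."""
        return SCRATCHPAD.read_text() if SCRATCHPAD.exists() else ""

    append_note("Turn 42: hired a senior engineer; payroll up, watch the runway.")
    print(read_notes())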

Who should use this?

AI researchers and eval teams benchmarking LLM agents on long-term planning, especially resource allocation and risk management under pressure. Also startup founders testing agent copilots for ops simulations, or devs building projects around autonomous CEO agents. Ideal when you need reproducible seeds and configs like challenge.toml for 3-year endurance tests.
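
For multi-model comparisons you want identical seeds across models, so differences come from the agent rather than the simulated world. A sketch of a seeded parallel sweep - run_one() here is a hypothetical stand-in for however the repo's parallel-run scripts launch a simulation:

    # Illustrative only: sweep fixed seeds in parallel so every model faces the same
    # simulated market. run_one() is a hypothetical stand-in, not the repo's API.
    from concurrent.futures import ProcessPoolExecutor

    SEEDS = [0, 1, 2, 3]

    def run_one(seed: int) -> dict:
        # A real sweep would launch a full company simulation with this seed
        # and return its summary metrics (final funds, prestige, survival).
        return {"seed": seed, "final_funds": 0.0}

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(run_one, SEEDS))
        print(results)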

Verdict

Grab it for agent evals - the docs shine with setup guides, CLI references, and plotting scripts, even though the low star count signals an early-stage project. Run a fast_test on your model today; low risk, high insight into whether it survives as CEO.

