ProgramBench is a benchmark that tests whether AI agents can recreate the functionality of open-source programs given only their compiled binaries and documentation.
How It Works
You find this benchmark on GitHub or its website, curious if AI can rebuild real programs just from their ready-to-run files and help docs.
You quickly set it up on your computer with a simple install, so everything is prepared to test AI creations.
You download bundles of real-world programs, each with hidden source code but full instructions on what they do.
You feed in your AI agent's rebuilt code and hit go; the benchmark then checks whether the rebuilt program matches the original program's behavior.
You get clear reports on pass/fail rates, warnings, and details for each test, showing exactly how well it did.
Your results help rank AI tools worldwide, advancing smarter software-building assistants for everyone.
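To make the "matches the original program's behavior" step concrete, here is a minimal sketch of one way such a check could work: run the original binary and the rebuilt program on the same inputs, then compare what they print and how they exit. This is an illustration only, not ProgramBench's actual harness. The names `run`, `compare`, and `pass_rate` are hypothetical, and comparing only standard output and exit code is an assumption about what "behavior" covers.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Outcome:
    """Observable behavior of one program run (assumed: stdout + exit code)."""
    stdout: str
    exit_code: int


def run(cmd, stdin_text=""):
    """Run a command with the given stdin and capture its stdout and exit code."""
    proc = subprocess.run(
        cmd, input=stdin_text, capture_output=True, text=True, timeout=10
    )
    return Outcome(proc.stdout, proc.returncode)


def compare(reference_cmd, candidate_cmd, cases):
    """Run both programs on each (args, stdin) case.

    Returns a list of (args, stdin, passed) tuples, where passed is True
    when the candidate's outcome equals the reference's outcome.
    """
    results = []
    for args, stdin_text in cases:
        ref = run(reference_cmd + args, stdin_text)
        cand = run(candidate_cmd + args, stdin_text)
        results.append((args, stdin_text, ref == cand))
    return results


def pass_rate(results):
    """Fraction of cases that passed; 0.0 when there are no cases."""
    if not results:
        return 0.0
    return sum(1 for _, _, passed in results if passed) / len(results)


if __name__ == "__main__":
    # Stand-ins for a real binary and a rebuilt program: two tiny Python
    # one-liners, so the sketch runs anywhere Python is installed.
    reference = [sys.executable, "-c",
                 "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    candidate = [sys.executable, "-c",
                 "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    cases = [([], "hello\n"), ([], "ProgramBench\n")]

    results = compare(reference, candidate, cases)
    for args, stdin_text, passed in results:
        print("PASS" if passed else "FAIL", repr(stdin_text))
    print(f"pass rate: {pass_rate(results):.0%}")
```

In practice the two commands would point at the original compiled binary and at the agent's rebuilt program, and the test cases would come from the benchmark rather than being written by hand.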