Hermes Bench is a self-hosted web application for benchmarking local large language models and AI agents with customizable tasks, automated judging, and result comparisons.
How It Works
Hermes Bench lets you test how well your local AI models handle real tasks such as coding, searching, and reasoning.
Run the starter script, then open your web browser to reach the dashboard.
The app automatically detects your local models, checks your machine's hardware, and lets you start test servers if needed.
Choose which models to test, select a ready-made task set, and start the run to watch the models work through the tasks live.
See side-by-side scores, pass/fail checks, tool usage logs, and automated judging that shows which model performed best.
Review detailed reports, export findings, and adjust your setups to improve results from your local models over time.
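As a rough illustration of the side-by-side comparison step, the sketch below turns per-task pass/fail results into a pass rate per model and ranks the models. It is not Hermes Bench's actual code or data format: the record layout, the model and task names, and the `pass_rates` and `rank` helpers are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical result records: (model name, task id, passed?).
# These names and this layout are assumptions, not Hermes Bench's format.
results = [
    ("model-a", "coding-1", True),
    ("model-a", "search-1", False),
    ("model-b", "coding-1", True),
    ("model-b", "search-1", True),
]


def pass_rates(records):
    """Return {model: fraction of its tasks that passed}."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for model, _task, ok in records:
        total[model] += 1
        passed[model] += int(ok)
    return {model: passed[model] / total[model] for model in total}


def rank(records):
    """Return (model, pass rate) pairs, best first; ties broken by name."""
    rates = pass_rates(records)
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))


for model, rate in rank(results):
    print(f"{model}: {rate:.0%}")
```

A plain pass rate treats every task as equally important; a real benchmark might weight tasks or combine the pass/fail checks with a judge's score.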