Siddharth-1001 / agent-eval-harness
Public
An open-source evaluation framework built specifically for agentic systems: it evaluates full agent behavior, not just LLM outputs.
An open-source evaluation tool that traces AI agent runs to measure tool success, latency, cost, and hallucinations across popular frameworks.
How It Works
You find a free, open-source tool for testing and tracking how well your AI agents handle real tasks.
You install the tool and set it up on your computer quickly.
You wrap your AI agent with the tracer so it records every action automatically.
You give the agent tasks to run, and the tracer captures details such as which tools it calls and how long each step takes.
You open a list of all your test runs, which shows success rates, latency, and cost at a glance.
You view comparison tables that highlight differences between runs, so you can spot improvements.
You open a dashboard with charts for deeper performance analysis.
You can now see your agent's strengths and weaknesses clearly, which shows you where to make it more reliable, faster, and cheaper to run.
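The wrap, record, summarize, and compare loop in these steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not agent-eval-harness's actual API: `RunTrace`, `ToolCall`, `wrap`, `summary`, and `compare` are hypothetical names, and the per-call cost is a fixed number you supply rather than anything measured from a model provider. It covers only tool success, latency, and cost; hallucination checks are not sketched here.

```python
import functools
import time
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Callable


@dataclass
class ToolCall:
    """One recorded tool invocation (hypothetical record shape)."""
    tool: str
    ok: bool
    latency_s: float
    cost_usd: float


@dataclass
class RunTrace:
    """All tool calls captured during one agent run (hypothetical tracer)."""
    name: str
    calls: list[ToolCall] = field(default_factory=list)

    def wrap(self, tool: Callable[..., Any], cost_usd: float = 0.0) -> Callable[..., Any]:
        """Return a version of `tool` that records success, latency, and an assumed fixed cost."""
        @functools.wraps(tool)
        def traced(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            ok = False
            try:
                result = tool(*args, **kwargs)
                ok = True
                return result
            finally:
                # Runs on success and on failure; exceptions still propagate to the agent.
                elapsed = time.perf_counter() - start
                self.calls.append(ToolCall(tool.__name__, ok, elapsed, cost_usd))
        return traced

    def summary(self) -> dict[str, float]:
        """Aggregate metrics for this run, like one row in the runs list."""
        n = len(self.calls)
        return {
            "calls": n,
            "success_rate": sum(c.ok for c in self.calls) / n if n else 0.0,
            "mean_latency_s": mean(c.latency_s for c in self.calls) if n else 0.0,
            "total_cost_usd": sum(c.cost_usd for c in self.calls),
        }


def compare(a: RunTrace, b: RunTrace) -> dict[str, float]:
    """Per-metric difference (b minus a), like a run-comparison table."""
    sa, sb = a.summary(), b.summary()
    return {k: sb[k] - sa[k] for k in sa}


# Usage: two runs of the same toy tools, then a run-to-run comparison.
def search(query: str) -> str:
    return f"results for {query}"


def lookup(key: str) -> str:
    raise KeyError(key)  # simulates a failing tool call


baseline = RunTrace("baseline")
baseline.wrap(search, cost_usd=0.002)("weather")
try:
    baseline.wrap(lookup, cost_usd=0.001)("missing")
except KeyError:
    pass

candidate = RunTrace("candidate")
candidate.wrap(search, cost_usd=0.002)("weather")

print(baseline.summary())   # success_rate 0.5 over 2 calls
print(candidate.summary())  # success_rate 1.0 over 1 call
print(compare(baseline, candidate))
```

Recording inside `finally` means a failed tool call is still logged as a failure with its latency, which is what a success-rate metric needs; a real tracer would also capture arguments, outputs, and token usage.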