catello09 / agent-eval-ts
Agent evaluation & benchmarking for TypeScript: test suites, LLM metrics, caching, OpenAI-compatible judge, JUnit/HTML/MD reports, Docker, GitHub Actions.
agent-eval-ts is a TypeScript framework for defining test suites to evaluate AI agents on metrics like accuracy, latency, cost, and tool usage, with reporting, caching, and model comparison features.
How It Works
You find a handy tool that lets you check how well your AI assistant handles everyday tasks.
Download it to your computer and get everything ready in a few minutes.
Write down simple questions and right answers to see what your AI should do.
Choose a pretend AI for quick tests or connect your real one if you want true results.
Press go and let it run all your checks automatically, watching the progress.
Open the colorful report showing pass rates, speeds, and smart insights.
Celebrate knowing exactly how good your assistant is and where to improve it.
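The steps above (write questions with expected answers, pick a pretend or real agent, run the checks, read pass rates and speeds) can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: every name here (`TestCase`, `Agent`, `makeMockAgent`, `runSuite`, `summarize`) is hypothetical and is not taken from agent-eval-ts's actual API, and the exact-match check stands in for whatever metrics the real framework computes.

```typescript
// Hypothetical types for illustration only; NOT the agent-eval-ts API.
interface TestCase {
  name: string;
  input: string;    // the question sent to the agent
  expected: string; // the answer we consider correct
}

// An agent is anything that turns an input into an answer asynchronously.
type Agent = (input: string) => Promise<string>;

interface CaseResult {
  name: string;
  passed: boolean;
  latencyMs: number;
}

interface Summary {
  total: number;
  passed: number;
  passRate: number;      // fraction in [0, 1]
  meanLatencyMs: number;
}

// A "pretend" agent: answers from a fixed lookup table, no network calls.
function makeMockAgent(answers: Record<string, string>): Agent {
  return async (input) => answers[input] ?? "";
}

// Simple accuracy check: case-insensitive exact match after trimming.
function matches(actual: string, expected: string): boolean {
  return actual.trim().toLowerCase() === expected.trim().toLowerCase();
}

// Run every case in order, recording pass/fail and wall-clock latency.
async function runSuite(agent: Agent, cases: TestCase[]): Promise<CaseResult[]> {
  const results: CaseResult[] = [];
  for (const c of cases) {
    const start = Date.now();
    const actual = await agent(c.input);
    const latencyMs = Date.now() - start;
    results.push({ name: c.name, passed: matches(actual, c.expected), latencyMs });
  }
  return results;
}

// Aggregate per-case results into the numbers a report would show.
function summarize(results: CaseResult[]): Summary {
  const total = results.length;
  const passed = results.filter((r) => r.passed).length;
  const totalLatency = results.reduce((sum, r) => sum + r.latencyMs, 0);
  return {
    total,
    passed,
    passRate: total === 0 ? 0 : passed / total,
    meanLatencyMs: total === 0 ? 0 : totalLatency / total,
  };
}

// Usage: two made-up cases against the mock agent, one of which it cannot answer.
const demoCases: TestCase[] = [
  { name: "arithmetic", input: "What is 2 + 2?", expected: "4" },
  { name: "capital", input: "Capital of France?", expected: "Paris" },
];
const mockAgent = makeMockAgent({ "What is 2 + 2?": "4" });
runSuite(mockAgent, demoCases).then((results) => console.log(summarize(results)));
```

Swapping `mockAgent` for a function that calls a real model is the only change needed to move from quick tests to real results, which is why the sketch keeps the agent behind a single function type.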