agentevals-dev

Collection of evaluators for agentevals

Python · 47 stars · 100% credibility · Found Mar 24, 2026

AI Summary

A community collection of standalone scoring tools designed to evaluate the performance of AI agents by analyzing their responses and actions.

How It Works

1. 🔍 Discover helpful checkers

You find a collection of simple tools to score how well your AI assistant performs on tasks.

2. 📋 Browse the options

Look through ready-made scorers that check whether answers contain key words, match expected patterns, or use helper tools properly (a minimal sketch of one such checker follows these steps).

3. Pick and add a checker

Choose the scorer that fits your needs and connect it to your AI tests with a quick setup.

4. ▶️ Run your tests

Launch your AI assistant's tasks, and the checker automatically reviews each response.

5. 📊 Review the scores

Get scores from 0 to 1 that show strengths and areas to improve, along with notes on specific issues.

6. Improve your AI

Use the insights to make your AI assistant smarter and more reliable on real tasks.
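
This page doesn't show the evaluators' actual interface, but the review below describes them as lightweight stdin/stdout Python programs that emit 0-to-1 scores. As a purely illustrative sketch under that assumption, a keyword checker of the kind step 2 mentions might look like this (the input and output field names are hypothetical, not the repo's real schema):

```python
#!/usr/bin/env python3
"""Hypothetical keyword checker, sketched from the description above.

Reads {"response": "...", "keywords": [...]} as JSON on stdin and writes
{"score": <0-1>, "note": "..."} to stdout. The real evaluators in the repo
may use a different input/output schema.
"""
import json
import sys

payload = json.load(sys.stdin)
response = payload.get("response", "").lower()
keywords = payload.get("keywords", [])

# Score is the fraction of required keywords found in the response.
hits = [kw for kw in keywords if kw.lower() in response]
score = len(hits) / len(keywords) if keywords else 0.0

json.dump(
    {
        "score": round(score, 3),
        "note": f"matched {len(hits)}/{len(keywords)} keywords: {', '.join(hits) or 'none'}",
    },
    sys.stdout,
)
```

Piped a payload such as {"response": "Paris is the capital of France", "keywords": ["Paris"]}, it would print a score of 1.0 plus a note listing which keywords matched.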

AI-Generated Review

What is evaluators?

This GitHub repo curates a collection of 9 ready-made evaluators for the agentevals framework, a Python-based tool for scoring AI agent traces built on Google ADK. Developers reference them in YAML configs as remote GitHub sources: the CLI command `agentevals evaluator list --source github` browses the options, and the evaluators auto-download for evals covering response quality, tool usage, JSON validity, string matches, regex patterns, and more. It removes the hassle of hand-coding basic agent metrics, letting you run evals like `agentevals run traces/my_trace.json` with zero local setup.
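
The YAML schema itself isn't shown on this page, but the two CLI commands quoted above can be driven from a script. A minimal sketch, assuming only that the `agentevals` CLI described here is installed and on PATH:

```python
import subprocess

# Browse the remote evaluators published from this repo
# (command quoted in the review above).
subprocess.run(["agentevals", "evaluator", "list", "--source", "github"], check=True)

# Score a saved agent trace; per the review, referenced evaluators
# auto-download, so no local setup is needed beyond the CLI itself.
subprocess.run(["agentevals", "run", "traces/my_trace.json"], check=True)
```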

Why is it gaining traction?

Unlike scattered custom scripts or heavy LLM-judged evals, these are lightweight, stdin/stdout Python programs with auto-validation, smoke tests, and an index.yaml for discovery: plug one in via config and get per-invocation scores instantly. Community contributions are streamlined with a scaffolding CLI and PR checks, making the repo a growing hub for agent metrics like tool coverage or Levenshtein similarity (a sketch of the latter appears below). Remote fetching appeals to developers tired of maintaining local dependencies.
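
The repo's own Levenshtein evaluator isn't reproduced here, but a self-contained sketch of that kind of metric, normalized to the 0-to-1 range these scorers report, could look like the following:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity_score(output: str, reference: str) -> float:
    """Normalize edit distance to a 0-1 score (1.0 = exact match)."""
    if not output and not reference:
        return 1.0
    distance = levenshtein(output, reference)
    return 1.0 - distance / max(len(output), len(reference))
```

For example, similarity_score("kitten", "sitting") comes out to roughly 0.57: three edits against a seven-character reference.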

Who should use this?

Agent builders on agentevals who evaluate tool-calling trajectories, response sanity (length, no echoes), or structured outputs. It's ideal for teams benchmarking LLM agents against ground-truth strings, regex patterns, or tool-call sequences, especially in Google ADK workflows that need quick, reproducible scores.
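
As a rough illustration of the trajectory case, the sketch below scores how much of a ground-truth tool-call sequence a trace actually followed. The trace format and tool names are hypothetical, not the repo's real schema; each expected step is treated as a regex so literal names and patterns both work:

```python
import re


def trajectory_score(tool_calls: list[str], expected: list[str]) -> float:
    """Fraction of expected steps matched in order; expected entries are regexes."""
    matched = 0
    pos = 0
    for pattern in expected:
        for i in range(pos, len(tool_calls)):
            if re.fullmatch(pattern, tool_calls[i]):
                matched += 1
                pos = i + 1
                break
    return matched / len(expected) if expected else 1.0


# Illustrative trace: these tool names are hypothetical, not taken from the repo.
calls = ["search_web", "fetch_page", "summarize"]
print(trajectory_score(calls, ["search_.*", "fetch_page", "summarize"]))  # -> 1.0
```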

Verdict

Grab it if you're on agentevals: solid docs, CLI integration, and built-in validation make the 47 stars and 100% credibility score punch above early-stage weight. Skip it for non-agentevals setups until more contributors fill the gaps.


