ServiceNow / eva

A New End-to-end Framework for Evaluating Voice Agents

Found Mar 24, 2026 at 47 stars.
AI Summary

EVA is an open-source framework that evaluates voice agents by simulating realistic multi-turn phone conversations between AI bots and scoring them on task accuracy and conversational experience.

How It Works

1
🔍 Discover EVA

You find EVA, a free, open-source tool from ServiceNow for testing phone-style voice assistants.

2
📥 Get EVA ready

Download it to your computer and prepare a simple configuration file with the connection details for your voice services.

3
🔗 Link voice services

Connect text-to-speech voices and language models so the bots can talk to each other like a real phone call.

4
✈️ Pick test scenarios

Choose everyday situations like rebooking flights to see how well the assistant handles them.

5
▶️ Run the tests

Two bots hold a natural conversation, one playing the customer and the other the helper, across many scenarios.

6

📊 See your scores

Get clear reports on how accurate and natural the assistant was, plus leaderboards to compare models.
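The two-bot loop in step 5 can be sketched in plain Python. This is a minimal illustration with made-up function names and canned replies, not Eva's actual API: a customer bot and an agent bot alternate turns until the customer signals the task is done.

```python
# Minimal sketch of a bot-to-bot evaluation loop (hypothetical names,
# not Eva's actual API). In the real framework each side would be an
# LLM driving STT/TTS; here both are canned for illustration.

def customer_bot(history):
    # Stand-in for an LLM-driven customer simulator following a scenario.
    script = ["I need to rebook my flight.", "Tomorrow morning, please.", "DONE"]
    turns_taken = sum(1 for speaker, _ in history if speaker == "customer")
    return script[turns_taken]

def agent_bot(history):
    # Stand-in for the voice agent under test.
    return "Sure, let me help with that."

def run_scenario(max_turns=10):
    history = []
    for _ in range(max_turns):
        utterance = customer_bot(history)
        if utterance == "DONE":  # customer considers the task complete
            break
        history.append(("customer", utterance))
        history.append(("agent", agent_bot(history)))
    return history

transcript = run_scenario()
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

In the real framework, the transcript and the accompanying audio would then be scored for task accuracy and conversational quality.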


AI-Generated Review

What is eva?

Eva is a Python end-to-end framework for evaluating voice agents through simulated phone calls between AI bots, scoring both task accuracy (EVA-A: completion, faithfulness) and conversational experience (EVA-X: turn-taking, conciseness). It runs full audio pipelines with real STT, LLM, and TTS components from providers like OpenAI, ElevenLabs, and Deepgram, using 50 airline scenarios for reproducible benchmarks on cascade or speech-to-speech systems. Output includes audio recordings, transcripts, and CSV/JSON metrics, driven by simple CLI commands like `eva --domain airline` or via Docker.
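A quick sketch of how CSV metric output like this might be post-processed into per-axis averages; the column names and values below are invented for illustration and are not Eva's real schema:

```python
import csv
import io
import statistics

# Hypothetical per-scenario metrics CSV (invented columns, not Eva's
# actual output format), aggregated into a mean score per metric.
raw = """scenario,eva_a_completion,eva_a_faithfulness,eva_x_turn_taking,eva_x_conciseness
rebook_flight,1.0,0.9,0.8,0.7
cancel_ticket,0.0,0.6,0.9,0.8
"""

rows = list(csv.DictReader(io.StringIO(raw)))
metric_names = [k for k in rows[0] if k != "scenario"]
means = {m: statistics.mean(float(r[m]) for r in rows) for m in metric_names}

for name, value in means.items():
    print(f"{name}: {value:.2f}")
```

The same pattern works for JSON output: load the records, then average each EVA-A and EVA-X field across scenarios.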

Why is it gaining traction?

Unlike isolated component tests, Eva captures compounded errors in live bot-to-bot interactions, exposing the accuracy-experience tradeoff in voice LLMs. Developers hook into its automated validators, leaderboards, and Hugging Face dataset for quick comparisons across 20+ models, with Docker Compose for one-command runs. The focus on multi-turn tool-calling in audio makes it a fresh alternative to text-only LLM benchmarks.
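The accuracy-experience tradeoff can be made concrete with a toy leaderboard; the model names, scores, and weighting below are made up and are not Eva's actual leaderboard logic:

```python
# Toy leaderboard (hypothetical models and scores): a cascade system
# that is more accurate but less natural versus a speech-to-speech
# system with the opposite profile, ranked by a weighted combination.

models = {
    "cascade-stt-llm-tts": {"eva_a": 0.82, "eva_x": 0.61},
    "speech-to-speech":    {"eva_a": 0.74, "eva_x": 0.79},
}

def combined(scores, weight_a=0.5):
    # Equal weighting of task accuracy (EVA-A) and experience (EVA-X);
    # the weight is a free parameter, not something Eva prescribes.
    return weight_a * scores["eva_a"] + (1 - weight_a) * scores["eva_x"]

leaderboard = sorted(models, key=lambda m: combined(models[m]), reverse=True)
for rank, name in enumerate(leaderboard, 1):
    print(rank, name, round(combined(models[name]), 3))
```

Shifting `weight_a` toward 1.0 reorders the ranking in favor of the accurate-but-stilted cascade, which is exactly the tradeoff the review highlights.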

Who should use this?

Voice AI engineers benchmarking speech-to-speech models against end-to-end test suites like VoiceAgentBench. Customer service teams validating airline-style agents for policy adherence and natural dialogue. LLM researchers probing tool-calling fidelity in noisy audio pipelines.

Verdict

Grab Eva if you're building voice agents: its end-to-end framework delivers actionable metrics out of the box, backed by solid docs and tests. At 47 stars, it's early but promising; fork and contribute to help mature it for production workflows.

