ServiceNow / eva

A New End-to-end Framework for Evaluating Voice Agents

Found Mar 24, 2026 at 47 stars.
AI Summary

EVA is an open-source framework that evaluates voice agents by simulating realistic multi-turn phone conversations between AI bots and scoring them on task accuracy and conversational experience.

How It Works

1
🔍 Discover EVA

You find EVA, a free, open-source tool from ServiceNow for testing phone-style voice assistants.

2
📥 Get EVA ready

Download it to your computer and prepare a simple configuration file with the connection details for your voice services.

3
🔗 Link voice services

Connect text-to-speech voices and language models so the bots can talk to each other like a real phone call.

4
✈️ Pick test scenarios

Choose everyday situations like rebooking flights to see how well the assistant handles them.

5
▶️ Run the tests

Two bots hold a natural conversation, one playing the customer and the other the helper, across many scenarios.

6

📊 See your scores

Get clear reports on how accurate and natural the assistant was, plus leaderboards to compare models.
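The two-bot loop in step 5 can be sketched in plain Python. This is a minimal illustration with made-up function names and canned replies, not Eva's actual API: a customer bot and an agent bot alternate turns until the customer signals the task is done.

```python
# Minimal sketch of a bot-to-bot evaluation loop (hypothetical names,
# not Eva's actual API). In the real framework each side would be an
# LLM driving STT/TTS; here both are canned for illustration.

def customer_bot(history):
    # Stand-in for an LLM-driven customer simulator following a scenario.
    script = ["I need to rebook my flight.", "Tomorrow morning, please.", "DONE"]
    turns_taken = sum(1 for speaker, _ in history if speaker == "customer")
    return script[turns_taken]

def agent_bot(history):
    # Stand-in for the voice agent under test.
    return "Sure, let me help with that."

def run_scenario(max_turns=10):
    history = []
    for _ in range(max_turns):
        utterance = customer_bot(history)
        if utterance == "DONE":  # customer considers the task complete
            break
        history.append(("customer", utterance))
        history.append(("agent", agent_bot(history)))
    return history

transcript = run_scenario()
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

In the real framework, the transcript and the accompanying audio would then be scored for task accuracy and conversational quality.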


AI-Generated Review

What is eva?

Eva is a Python end-to-end framework for evaluating voice agents through simulated phone calls between AI bots, scoring both task accuracy (EVA-A: completion, faithfulness) and conversational experience (EVA-X: turn-taking, conciseness). It runs full audio pipelines with real STT, LLM, and TTS components from providers like OpenAI, ElevenLabs, and Deepgram, using 50 airline scenarios for reproducible benchmarks on cascade or speech-to-speech systems. Output includes audio recordings, transcripts, and CSV/JSON metrics, driven by simple CLI commands like `eva --domain airline` or via Docker.
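A quick sketch of how CSV metric output like this might be post-processed into per-axis averages; the column names and values below are invented for illustration and are not Eva's real schema:

```python
import csv
import io
import statistics

# Hypothetical per-scenario metrics CSV (invented columns, not Eva's
# actual output format), aggregated into a mean score per metric.
raw = """scenario,eva_a_completion,eva_a_faithfulness,eva_x_turn_taking,eva_x_conciseness
rebook_flight,1.0,0.9,0.8,0.7
cancel_ticket,0.0,0.6,0.9,0.8
"""

rows = list(csv.DictReader(io.StringIO(raw)))
metric_names = [k for k in rows[0] if k != "scenario"]
means = {m: statistics.mean(float(r[m]) for r in rows) for m in metric_names}

for name, value in means.items():
    print(f"{name}: {value:.2f}")
```

The same pattern works for JSON output: load the records, then average each EVA-A and EVA-X field across scenarios.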

Why is it gaining traction?

Unlike isolated component tests, Eva captures compounded errors in live bot-to-bot interactions, exposing the accuracy-experience tradeoff in voice LLMs. Developers hook into its automated validators, leaderboards, and Hugging Face dataset for quick comparisons across 20+ models, with Docker Compose for one-command runs. The focus on multi-turn tool-calling in audio makes it a fresh alternative to text-only LLM benchmarks.
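The accuracy-experience tradeoff can be made concrete with a toy leaderboard; the model names, scores, and weighting below are made up and are not Eva's actual leaderboard logic:

```python
# Toy leaderboard (hypothetical models and scores): a cascade system
# that is more accurate but less natural versus a speech-to-speech
# system with the opposite profile, ranked by a weighted combination.

models = {
    "cascade-stt-llm-tts": {"eva_a": 0.82, "eva_x": 0.61},
    "speech-to-speech":    {"eva_a": 0.74, "eva_x": 0.79},
}

def combined(scores, weight_a=0.5):
    # Equal weighting of task accuracy (EVA-A) and experience (EVA-X);
    # the weight is a free parameter, not something Eva prescribes.
    return weight_a * scores["eva_a"] + (1 - weight_a) * scores["eva_x"]

leaderboard = sorted(models, key=lambda m: combined(models[m]), reverse=True)
for rank, name in enumerate(leaderboard, 1):
    print(rank, name, round(combined(models[name]), 3))
```

Shifting `weight_a` toward 1.0 reorders the ranking in favor of the accurate-but-stilted cascade, which is exactly the tradeoff the review highlights.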

Who should use this?

Voice AI engineers benchmarking speech-to-speech models against end-to-end test suites like VoiceAgentBench. Customer service teams validating airline-style agents for policy adherence and natural dialogue. LLM researchers probing tool-calling fidelity in noisy audio pipelines.

Verdict

Grab Eva if you're building voice agents: its end-to-end framework delivers actionable metrics out of the box, backed by solid docs and tests. At 47 stars, it's early but promising; fork and contribute to help mature it for production workflows.

