EvasionBench

A large-scale benchmark for detecting managerial evasion in earnings call Q&A.

Found Feb 07, 2026 at 27 stars.
AI Summary

EvasionBench is a benchmark dataset and toolkit for evaluating how well AI models detect evasive responses from managers during earnings-call question-and-answer sessions.

How It Works

1. 🔍 Discover EvasionBench: You hear about a helpful tool that spots when company leaders dodge tough questions during earnings calls.

2. 🌐 Visit the project page: You check out the main page to learn about the evasion types, such as direct answers, sidesteps, or full dodges.

3. 🖥️ Try the quick demo: You open an easy online playground to test it right away without any setup.

4. 💬 Pick a question and answer: You choose sample exchanges from real earnings calls to analyze.

5. ▶️ Run the check: You hit go and let it judge whether the answer is straightforward or evasive.

6. 📊 View the results: You see clear labels like 'direct', 'intermediate', or 'fully evasive', together with a confidence score for the classification.

Spot evasion easily

Now you can quickly tell when an answer is dodging the point in earnings calls and other financial discussions.

AI-Generated Review

What is EvasionBench?

EvasionBench is a large-scale benchmark for detecting managerial evasion in earnings call Q&A, with 84K training samples and a 1K gold-standard evaluation set. It defines a three-level taxonomy (direct, intermediate, fully evasive) and provides a fine-tuned Eva-4B model that outperforms larger LLMs such as Claude and GPT on Macro-F1. Developers get HuggingFace datasets and models, plus Python inference scripts and a Colab notebook for classifying Q&A pairs.
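The classification flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual inference script: the `score_fn` stands in for a real model call (e.g. a fine-tuned checkpoint like Eva-4B served via HuggingFace), and `dummy_scorer` is a made-up placeholder so the sketch runs standalone.

```python
from typing import Callable

# The three-level taxonomy defined by EvasionBench.
LABELS = ["direct", "intermediate", "fully evasive"]

def classify_qa(question: str, answer: str,
                score_fn: Callable[[str], list[float]]) -> str:
    """Classify a Q&A pair using an injected scoring function.

    score_fn takes the formatted prompt and returns one score per
    label (e.g. softmax probabilities from a fine-tuned model).
    """
    prompt = f"Question: {question}\nAnswer: {answer}"
    scores = score_fn(prompt)
    assert len(scores) == len(LABELS)
    # Pick the label with the highest score.
    return LABELS[max(range(len(scores)), key=scores.__getitem__)]

# Placeholder scorer standing in for a real model call: it crudely
# flags answers that never mention the question's key terms.
def dummy_scorer(prompt: str) -> list[float]:
    answer = prompt.split("\nAnswer: ", 1)[1].lower()
    if "margin" in answer or "guidance" in answer:
        return [0.8, 0.15, 0.05]   # looks direct
    return [0.1, 0.2, 0.7]         # looks fully evasive

label = classify_qa(
    "What is your margin guidance for next quarter?",
    "We remain focused on long-term value creation.",
    dummy_scorer,
)
print(label)  # → fully evasive
```

In a real setup you would replace `dummy_scorer` with a function that tokenizes the prompt and returns the model's class probabilities; the surrounding logic stays the same.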

Why is it gaining traction?

It stands out with a Multi-Model Consensus framework using frontier LLMs for robust annotations, plus a public leaderboard showing Eva-4B topping closed-source giants. Unlike generic NLP benchmarks, this targets real-world finance evasion, delivering balanced data and CLI-ready inference for quick experiments. The arXiv paper and Apache 2.0 license make it easy to benchmark custom models.

Who should use this?

NLP engineers building fraud detection for corporate transcripts, quant researchers analyzing earnings calls, or finance devs fine-tuning LLMs on evasion tactics. Ideal for teams validating models against a gold-standard set without scraping data yourself.

Verdict

Grab it if you're in finance NLP: strong results and HF integration make it a solid starting point for large-scale benchmarking. With 31 stars and a 1.0% credibility score, it's early-stage, with thin docs and no tests, so treat it as a research prototype rather than production-ready.

