youdotcom-oss

An Evaluation Framework for Web Search APIs

Found Apr 23, 2026 at 19 stars
AI Summary

An open-source Python framework for benchmarking and comparing the accuracy and speed of various AI web search APIs using standard datasets.

How It Works

1
🔍 Discover the evaluation tool

This open-source tool from You.com lets you compare different web search services side by side.

2
💻 Get it ready on your computer

Clone the repository and install its dependencies so it's ready for testing.

3
🌐 Connect search services

Link up the search providers you want to test, like You.com or others, using their API keys.

4
📋 Choose your tests

Pick from ready-made question sets (such as SimpleQA or FRAMES) that test the searches on factual accuracy, speed, and deep research.

5
▶️ Run the comparison

Hit start and let it quietly test each service on the questions you chose.

6
📊 Review the results

Check out clear tables showing which service answers best and fastest.

7
🎉 Pick the winner

Now you know exactly which search tool shines for your needs and can use it confidently.
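The steps above can be sketched as a minimal benchmarking loop. Everything here is illustrative: the sampler, synthesis, and judge functions are stand-ins for the framework's real provider calls and LLM steps, and the names are assumptions, not the project's actual API.

```python
import time
from statistics import mean

# Hypothetical provider samplers: each takes a query and returns
# retrieved web results as text (stand-ins for real search APIs).
def you_search(query: str) -> str:
    return f"results for {query!r} from you.com"

def other_search(query: str) -> str:
    return f"results for {query!r} from another provider"

def synthesize(query: str, context: str) -> str:
    # Placeholder for the LLM answer-synthesis step.
    return f"answer to {query!r} based on: {context}"

def judge(answer: str, gold: str) -> bool:
    # Placeholder for the LLM judge; here a naive substring check.
    return gold.lower() in answer.lower()

def run_eval(samplers, dataset):
    """Score each sampler on (query, gold_answer) pairs."""
    results = {}
    for name, sampler in samplers.items():
        correct, latencies = 0, []
        for query, gold in dataset:
            start = time.perf_counter()
            context = sampler(query)
            answer = synthesize(query, context)
            latencies.append(time.perf_counter() - start)
            correct += judge(answer, gold)
        results[name] = {
            "accuracy": correct / len(dataset),
            "avg_latency_s": mean(latencies),
        }
    return results

dataset = [("capital of France?", "Paris")]
scores = run_eval({"you_search": you_search, "other": other_search}, dataset)
```

The key design point mirrored here is fairness: the same synthesis and judging logic runs for every provider, so only the retrieval step varies between samplers.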

AI-Generated Review

What is web-search-api-evals?

This Python framework lets you benchmark web search APIs like You.com, Exa, Tavily, Parallel, and Google SERP against standard datasets such as SimpleQA, FRAMES, DeepSearchQA, and BrowseComp. It fetches results, synthesizes answers via LLM (GPT-4o nano by default), and grades accuracy plus latency using another LLM judge (GPT-4o mini or Gemini). Developers get CSV outputs with per-query details and aggregated metrics, solving the pain of manually comparing AI search providers for RAG or agentic apps.
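The review mentions CSV outputs with per-query details plus aggregated metrics. A minimal sketch of what producing and summarizing such output could look like, assuming hypothetical column names (the framework's real schema may differ):

```python
import csv
import io

# Hypothetical per-query records like those the framework might emit.
rows = [
    {"provider": "you_search", "query": "capital of France?",
     "answer": "Paris", "correct": 1, "latency_s": 0.42},
    {"provider": "you_search", "query": "author of Hamlet?",
     "answer": "Shakespeare", "correct": 1, "latency_s": 0.37},
]

def to_csv(rows):
    # Serialize per-query records to CSV text.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def aggregate(rows):
    # Roll up per-query rows into accuracy and mean latency.
    n = len(rows)
    return {
        "accuracy": sum(r["correct"] for r in rows) / n,
        "avg_latency_s": sum(r["latency_s"] for r in rows) / n,
    }

summary = aggregate(rows)
```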

Why is it gaining traction?

Unlike generic eval suites such as LangChain's evaluators or Ragas for RAG evaluation, it focuses on web search APIs with fair, standardized synthesis and judging: the same prompts across providers. CLI flags for samplers, datasets, limits, concurrency, and cleaning make quick runs painless (e.g., `--samplers you_search --datasets simpleqa --limit 100`), and README tables showcase real results. It's a plug-and-play evaluation-framework template for AI systems in the wild, including research endpoints.
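A minimal sketch of how such a CLI might parse those flags. The flag names come from the quoted example; the choices, defaults, and extra options are assumptions, not the project's actual interface:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flag names mirror the quoted example; defaults are illustrative.
    parser = argparse.ArgumentParser(description="Run web search API evals")
    parser.add_argument("--samplers", nargs="+", default=["you_search"],
                        help="search providers to benchmark")
    parser.add_argument("--datasets", nargs="+", default=["simpleqa"],
                        help="question sets to run, e.g. simpleqa, frames")
    parser.add_argument("--limit", type=int, default=None,
                        help="cap the number of questions per dataset")
    parser.add_argument("--concurrency", type=int, default=4,
                        help="parallel requests per provider")
    return parser

args = build_parser().parse_args(
    ["--samplers", "you_search", "--datasets", "simpleqa", "--limit", "100"]
)
```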

Who should use this?

AI engineers tuning RAG pipelines or LLM evaluation harnesses that need reliable web retrieval. Search product leads comparing APIs on latency-accuracy tradeoffs. Researchers prototyping agentic workflows, akin to HELM or Bedrock evaluation setups.

Verdict

Solid starter for web search API evals—great docs, MIT license, pre-commit hooks—but 19 stars and 1.0% credibility score signal early maturity; test thoroughly before production. Grab it if you're shopping providers today.

