austintgriffith

Eval suite for ethskills.com — measuring whether AI agents actually learn from Ethereum skill docs

Found Mar 12, 2026 at 19 stars
AI Summary

A testing suite that evaluates whether AI models improve their factual accuracy on Ethereum topics when given specific skill documentation from ethskills.com.

How It Works

1
🔍 Discover ethskills evals

You find this handy tool on GitHub that tests if AI really picks up Ethereum know-how from special skill guides.

2
📥 Grab the files

Download the project to your computer to get started with testing AI learning.

3
🔗 Link your AI helpers

Connect a couple of your favorite AI services (via provider API keys) so they can answer questions about Ethereum topics.

4
🎯 Choose what to test

Pick a specific Ethereum skill like gas costs or security tips, or test them all at once.

5
🚀 Run the tests

Hit go and watch as the tool asks your AI the same questions twice: once plain, and once with the skill guides loaded, for comparison.

6
📊 Review the results

Get a simple report showing pass, partial, or fail rates, plus how much better the AI does with the guides.

Prove the skills work

Celebrate seeing clear proof of whether the skill guides truly boost your AI's Ethereum smarts.
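The comparison in steps 5 and 6 boils down to an uplift calculation: how many more evals pass with the guides loaded. A minimal sketch in shell, with made-up pass counts (the real numbers come from the suite's own report):

```shell
# Hypothetical pass counts out of the suite's 64 evals (figures invented for illustration)
total=64
baseline_pass=31
skill_pass=52

# Uplift: percentage-point improvement in pass rate with skill docs loaded
uplift=$(awk -v b="$baseline_pass" -v s="$skill_pass" -v t="$total" \
  'BEGIN { printf "%.1f", (s - b) / t * 100 }')
echo "baseline: $baseline_pass/$total, with skills: $skill_pass/$total, uplift: ${uplift} pts"
```

A positive uplift means the guides are actually teaching the model something; near zero means the model already knew it, or the docs aren't helping.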

AI-Generated Review

What is ethskills-evals?

This Shell-based AI eval suite tests whether AI agents actually learn from the Ethereum skill docs on ethskills.com. It runs A/B comparisons: query an LLM such as Claude or GPT with and without the docs loaded, then have a judge model score factual accuracy across 64 evals on topics like gas costs, L2s, security patterns, and standards. You get JSON reports with pass rates and uplift metrics, showing whether your RAG setup boosts performance on real Ethereum knowledge.
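The A/B loop described above can be sketched in a few lines of shell. The `ask_model` and `judge` functions here are stand-in stubs, not the repo's actual interfaces, and the example question is invented; the real runner lives in `./runner/run.sh`:

```shell
# Stub: answer a question, optionally with a skill doc loaded (second argument)
ask_model() {
  if [ -n "$2" ]; then echo "answer-with-skill"; else echo "answer-baseline"; fi
}

# Stub judge: score an answer as pass or fail
judge() {
  [ "$1" = "answer-with-skill" ] && echo pass || echo fail
}

question="What does EIP-1153 change about transient storage?"  # invented example

baseline=$(judge "$(ask_model "$question")")
with_doc=$(judge "$(ask_model "$question" "skill-doc.md")")
echo "baseline=$baseline with_skill=$with_doc"
```

In the real suite, the judge is itself an LLM that returns structured reasoning alongside the verdict, which is what makes the failures diagnosable.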

Why is it gaining traction?

Unlike generic LLM eval suites, this tool focuses on Ethereum specifics: catching hallucinations about addresses, upgrades like Pectra and Fusaka, and agent infrastructure like ERC-8004. The CLI runner (./runner/run.sh) makes it dead simple: set OpenAI or Venice keys, filter by skill or model, skip baselines for speed, and parse results with jq. Devs like the YAML eval format for quick additions, and the structured judge reasoning exposes why agents fail.
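The jq-based result parsing mentioned above might look something like this. The report schema here is assumed for illustration (the suite's actual JSON layout may differ), and the eval names are invented:

```shell
# Write a sample report with an assumed schema (real field names may differ)
cat > /tmp/report.json <<'EOF'
{
  "skill": "gas",
  "results": [
    {"eval": "base-fee-calc", "verdict": "pass"},
    {"eval": "blob-gas",      "verdict": "partial"},
    {"eval": "calldata-cost", "verdict": "fail"}
  ]
}
EOF

# Tally verdicts by category
jq -r '.results | group_by(.verdict) | map("\(.[0].verdict): \(length)") | .[]' /tmp/report.json

# List the evals that failed outright
jq -r '.results[] | select(.verdict == "fail") | .eval' /tmp/report.json
```

One-liners like these are enough to diff a baseline report against a skill-loaded one without any extra tooling.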

Who should use this?

Ethereum devs building AI agents or RAG pipelines for onchain tasks, such as querying L2 DEXes or running security checks. Also LLM teams tuning models on blockchain docs, and Ethereum educators validating skill effectiveness. Skip it if you're not working on crypto agents.

Verdict

Solid starter for Ethereum-focused evals with strong docs and 64 ready-made tests, but low maturity (19 stars at discovery) means you should expect tweaks before production use. Grab it if you're prototyping agent knowledge, and run it once to benchmark your setup.
