paradigmxyz / evmbench

Public

A benchmark and harness for finding and exploiting smart contract bugs

paradigm.xyz · Topics: evmbench, agents, ai, audit, blockchain, blockchain-technology

322 stars

100% credibility

Found Feb 18, 2026 at 74 stars (about 4x growth since).

AI Analysis

Language: TypeScript

AI Summary

Evmbench is a self-hosted web app from Paradigm and OpenAI for benchmarking AI agents on detecting high-severity vulnerabilities in Ethereum smart contracts.

How It Works

🌐 Discover evmbench

Open the evmbench web UI. Its landing page is where you start testing AI auditors against your own smart contracts.

📁 Upload your code

Drag and drop a folder or ZIP archive of your Solidity contracts.

🤖 Pick an AI auditor

Choose a model, such as codex-gpt, to scan the code for vulnerabilities.

🔑 Connect AI helper

Provide your OpenAI API key so the selected model can analyze the code.

🚀 Launch the scan

Start the job. The audit runs on the backend, and you can follow its progress from your browser.

🔍 Explore findings

Browse the file tree, highlighted code issues, and detailed reports.

✅ Audit insights ready

Share results, compare models, and use the findings to harden your contracts.



AI-Generated Review

What is evmbench?

evmbench is a self-hosted benchmark harness for detecting high-severity bugs in Solidity smart contracts with AI models such as Codex. You upload a ZIP of your contracts through a TypeScript/Next.js web UI, pick a model, and add your OpenAI API key; the app then queues audit jobs that run in isolated Docker workers and delivers results as a file tree, a vulnerability list, and an annotated code viewer. It is built for systematic evaluation of LLMs on real contract bugs and handles everything from authentication to result sharing.
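The flow described above (upload, queued job, isolated worker, findings grouped by file) can be sketched as a small TypeScript model. This is a hypothetical, in-memory illustration and not evmbench's actual code or API: the names `JobQueue`, `AuditJob`, `Finding`, `Auditor`, and `findingsByFile` are invented for the example, the auditor is a stub rather than a model call, and a plain async function stands in for a Docker worker.

```typescript
// Hypothetical sketch: NOT evmbench's real types or API.
type JobStatus = "queued" | "running" | "succeeded" | "failed";

interface Finding {
  file: string;       // path inside the uploaded archive
  line: number;       // line an annotated code viewer would highlight
  severity: "high";   // the harness targets high-severity bugs
  title: string;
}

interface AuditJob {
  id: number;
  model: string;      // e.g. "codex-gpt" (illustrative)
  files: string[];    // Solidity sources kept from the upload
  status: JobStatus;
  findings: Finding[];
}

// Stand-in for the model call a real worker would make.
type Auditor = (job: AuditJob) => Promise<Finding[]>;

class JobQueue {
  private nextId = 1;
  private pending: AuditJob[] = [];
  readonly jobs = new Map<number, AuditJob>();

  // Register an upload as a queued job, keeping only .sol files.
  enqueue(model: string, uploadedPaths: string[]): AuditJob {
    const job: AuditJob = {
      id: this.nextId++,
      model,
      files: uploadedPaths.filter((p) => p.endsWith(".sol")),
      status: "queued",
      findings: [],
    };
    this.jobs.set(job.id, job);
    this.pending.push(job);
    return job;
  }

  // Simulates one worker: take the oldest job and run the auditor on it.
  async runNext(audit: Auditor): Promise<AuditJob | undefined> {
    const job = this.pending.shift();
    if (!job) return undefined;
    job.status = "running";
    try {
      job.findings = await audit(job);
      job.status = "succeeded";
    } catch {
      job.status = "failed"; // findings stay empty on failure
    }
    return job;
  }
}

// Group findings by file, as a file-tree view might display them.
function findingsByFile(job: AuditJob): Map<string, Finding[]> {
  const byFile = new Map<string, Finding[]>();
  for (const f of job.findings) {
    const list = byFile.get(f.file) ?? [];
    list.push(f);
    byFile.set(f.file, list);
  }
  return byFile;
}

// Usage with a stub auditor that flags one placeholder issue per file.
const queue = new JobQueue();
queue.enqueue("codex-gpt", ["src/Vault.sol", "README.md"]);
const stubAuditor: Auditor = async (job) =>
  job.files.map((file) => ({ file, line: 1, severity: "high" as const, title: "placeholder finding" }));
queue.runNext(stubAuditor).then((job) => {
  console.log(job?.status, findingsByFile(job!).size); // logs: succeeded 1
});
```

Tracking an explicit status on each job is one simple way to let a web UI poll for progress while the actual work happens elsewhere. A real deployment would persist jobs and run workers in separate containers rather than in the same process.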

Why is it gaining traction?

Unlike generic LLM playgrounds, this benchmark harness focuses solely on smart contract vulnerabilities, with secure API-key proxying and GitHub authentication for easy runs. Developers like its GitHub Action compatibility and public result sharing, which make it simple to compare models on exploits without setup hassle. Backing from Paradigm and OpenAI adds credibility to its contract benchmarks.

Who should use this?

Smart contract auditors testing AI detectors on their own repos, security researchers benchmarking LLMs against known Solidity bugs, and teams comparing Codex with rival models on smart contract audits. It suits anyone tired of manual audits or of juggling scattered benchmark tools on GitHub.

Verdict

Early but promising at 11 stars and a 1.0 credibility score. The docs nail local Docker and Kubernetes deployments, though its maturity shows in the limited model selection and the lack of a public test suite. Grab it for quick contract-bug benchmarks if you're willing to self-host the harness.



Stars: 322

Forks: (not listed)

Followers: 1,853

Base stars: 322

Bonus: AI verified quality (100%)

Account age: 1,513 days

Repo age: 12 days

License: Apache-2.0

Updated: Mar 01, 2026