paradigmxyz

A benchmark and harness for finding and exploiting smart contract bugs

322 stars
46
100% credibility
Found Feb 18, 2026 at 74 stars (since grown about 4x).
AI Analysis
TypeScript
AI Summary

Evmbench is a self-hosted web app from Paradigm and OpenAI for benchmarking AI agents on detecting high-severity vulnerabilities in Ethereum smart contracts.

How It Works

1. 🌐 Discover evmbench

Land on the evmbench homepage, a welcoming page for testing AI on your smart contracts.

2. 📁 Upload your code

Drag a folder or ZIP of your Solidity contracts into the page; the upload feels simple and secure.

3. 🤖 Pick an AI auditor

Choose a model such as codex-gpt to scan for vulnerabilities.

4. 🔑 Connect your AI service

Link your AI service so it can analyze the code in depth.

5. 🚀 Launch the scan

Click start and watch your contracts get audited by AI right in your browser.

6. 🔍 Explore findings

Browse the file tree, highlighted code issues, and detailed reports.

✅ Audit insights ready

Share results, benchmark models, and strengthen your contracts.
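Steps 2 and 6 (uploading a folder or ZIP, then browsing a file tree) imply turning the flat list of paths inside an upload into a navigable tree. The following is a minimal sketch, assuming the archive has already been unpacked into a list of relative paths; `TreeNode`, `buildSolidityTree` and `renderTree` are hypothetical names for illustration, not evmbench's actual API.

```typescript
// Hypothetical sketch: build a file tree of Solidity sources from upload paths.
interface TreeNode {
  name: string;
  isFile: boolean;
  children: Map<string, TreeNode>; // stays empty for files
}

function newNode(name: string, isFile: boolean): TreeNode {
  return { name, isFile, children: new Map<string, TreeNode>() };
}

function buildSolidityTree(paths: string[]): TreeNode {
  const root = newNode("", false);
  for (const raw of paths) {
    // Normalise Windows separators and strip leading "./" or "/" segments.
    const path = raw.replace(/\\/g, "/").replace(/^(\.\/|\/)+/, "");
    if (!path.endsWith(".sol")) continue; // keep only Solidity sources
    const parts = path.split("/").filter((p) => p.length > 0);
    let node = root;
    parts.forEach((part, i) => {
      const isFile = i === parts.length - 1;
      let child = node.children.get(part);
      if (!child) {
        child = newNode(part, isFile);
        node.children.set(part, child);
      }
      node = child; // descend into the (possibly new) child
    });
  }
  return root;
}

// Directories sort before files; names sort by code point within each group.
function byDirsThenName(a: TreeNode, b: TreeNode): number {
  if (a.isFile !== b.isFile) return a.isFile ? 1 : -1;
  return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
}

// Render the tree as indented lines, marking directories with a trailing "/".
function renderTree(node: TreeNode, depth = 0): string[] {
  const lines: string[] = [];
  for (const child of Array.from(node.children.values()).sort(byDirsThenName)) {
    lines.push("  ".repeat(depth) + child.name + (child.isFile ? "" : "/"));
    lines.push(...renderTree(child, depth + 1));
  }
  return lines;
}
```

For example, the paths `contracts/Vault.sol`, `contracts/lib/Math.sol`, `README.md` and `./test/Vault.t.sol` render as `contracts/` (containing `lib/Math.sol` and `Vault.sol`) and `test/` (containing `Vault.t.sol`), with `README.md` filtered out. Filtering on the `.sol` extension is a simplifying assumption; a real uploader might also keep config files needed to compile the project.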

Star Growth

The repo grew from 74 to 322 stars.

AI-Generated Review

What is evmbench?

evmbench is a self-hosted benchmark harness for detecting high-severity bugs in Solidity smart contracts using AI models like Codex. You upload a ZIP of your contracts through a TypeScript/Next.js web UI, pick a model, and add your OpenAI key; the app then queues audit jobs that run in isolated Docker workers and delivers the results as a file tree, a vulnerability list, and an annotated code viewer. It's built for systematic evaluation of LLMs on real contract bugs and handles everything from auth to result sharing.
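The upload, queue, worker, results flow described above can be sketched as a small in-memory job queue. This is a minimal sketch under stated assumptions: `AuditQueue`, `AuditJob`, `AuditRunner` and `Finding` are hypothetical names, and Docker isolation is replaced by an injected `runAudit` function so the example runs on its own. A real deployment would need durable job storage and actual container isolation.

```typescript
// Hypothetical sketch of the queue-and-worker flow; not evmbench's actual code.
type JobStatus = "queued" | "running" | "succeeded" | "failed";

interface Finding {
  file: string;
  line: number;
  title: string;
}

interface AuditJob {
  id: number;
  model: string;
  files: Record<string, string>; // path -> Solidity source
  status: JobStatus;
  findings: Finding[];
  error?: string;
}

// Stand-in for "run this audit inside an isolated Docker worker".
type AuditRunner = (model: string, files: Record<string, string>) => Promise<Finding[]>;

class AuditQueue {
  private jobs = new Map<number, AuditJob>();
  private pending: number[] = [];
  private nextId = 1;

  // Record a new job and put it at the back of the queue.
  enqueue(model: string, files: Record<string, string>): number {
    const id = this.nextId++;
    this.jobs.set(id, { id, model, files, status: "queued", findings: [] });
    this.pending.push(id);
    return id;
  }

  get(id: number): AuditJob | undefined {
    return this.jobs.get(id);
  }

  // Process queued jobs with up to `concurrency` workers running at once.
  async drain(runAudit: AuditRunner, concurrency = 2): Promise<void> {
    const worker = async (): Promise<void> => {
      for (;;) {
        const id = this.pending.shift(); // safe: JS runs this synchronously
        if (id === undefined) return; // queue empty, worker exits
        const job = this.jobs.get(id);
        if (!job) continue;
        job.status = "running";
        try {
          job.findings = await runAudit(job.model, job.files);
          job.status = "succeeded";
        } catch (err) {
          // A failing job is recorded, not allowed to stop the other workers.
          job.status = "failed";
          job.error = err instanceof Error ? err.message : String(err);
        }
      }
    };
    await Promise.all(Array.from({ length: concurrency }, () => worker()));
  }
}
```

Capturing each job's failure on the job record, rather than letting it reject `drain`, is a design choice that lets one broken upload fail without blocking the rest of the queue.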

Why is it gaining traction?

Unlike generic LLM playgrounds, this LLM benchmark harness focuses solely on smart contract vulnerabilities, with secure key proxying and GitHub auth for easy runs. Developers like its GitHub Action compatibility and public result sharing, which make it simple to compare models on exploits without setup hassle. The Paradigm/OpenAI backing adds credibility to its contract benchmark results.
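The review does not say how "secure key proxying" is implemented. One common pattern, shown here purely as an assumed design, keeps the user's API key on the server, hands the browser only an opaque handle, and attaches the key as an `Authorization: Bearer` header (the scheme the OpenAI API uses) when forwarding requests. `KeyVault` and its methods are hypothetical names; the handle generator is injected so the sketch needs no imports.

```typescript
// Assumed key-proxy pattern for illustration; not evmbench's actual code.
class KeyVault {
  private readonly keys = new Map<string, string>();
  private readonly newHandle: () => string;

  // `newHandle` must return unguessable IDs, e.g. () => crypto.randomUUID() in Node.
  constructor(newHandle: () => string) {
    this.newHandle = newHandle;
  }

  // Store the key server-side; only the opaque handle goes back to the browser.
  register(apiKey: string): string {
    const handle = this.newHandle();
    this.keys.set(handle, apiKey);
    return handle;
  }

  // Build headers for the upstream model request; the key never leaves the server.
  authHeaders(handle: string): Record<string, string> {
    const key = this.keys.get(handle);
    if (key === undefined) throw new Error("unknown or revoked key handle");
    return { Authorization: `Bearer ${key}` };
  }

  // Show only the last four characters in the UI or in logs.
  masked(handle: string): string {
    const key = this.keys.get(handle);
    if (key === undefined) return "(none)";
    return "*".repeat(Math.max(0, key.length - 4)) + key.slice(-4);
  }

  // Forget the key, e.g. when the audit finishes or the user logs out.
  revoke(handle: string): void {
    this.keys.delete(handle);
  }
}
```

Keeping keys out of the browser also matters for public result sharing: a shared result page can include the handle-free job output without ever touching the key.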

Who should use this?

Smart contract auditors testing AI detectors on their repos, security researchers running LLM harness benchmarks against known Solidity bugs, or teams evaluating Codex against rival models in a contract benchmark environment. Ideal for those tired of manual audits or scattered GitHub benchmark tools.
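Comparing models against known Solidity bugs comes down to scoring each model's reported findings against a ground-truth list. The sketch below uses an assumed matching rule, not evmbench's actual grading: a report is a hit if it names the same file within `tolerance` lines of a known bug, each known bug can be credited only once (greedy, in report order), and duplicate reports of an already-credited bug count against precision. `BugLocation`, `Score` and `scoreFindings` are hypothetical names.

```typescript
// Hypothetical scoring sketch for comparing models on a known-bug set.
interface BugLocation {
  file: string;
  line: number;
}

interface Score {
  detected: number; // known bugs matched by at least one report
  total: number; // known bugs in the ground truth
  recall: number; // detected / total
  precision: number; // matching reports / all reports
}

function scoreFindings(known: BugLocation[], reported: BugLocation[], tolerance = 3): Score {
  const matched = new Set<number>(); // indices of known bugs already credited
  for (const r of reported) {
    // Greedily credit the first not-yet-matched known bug close to this report.
    const idx = known.findIndex(
      (k, i) => !matched.has(i) && k.file === r.file && Math.abs(k.line - r.line) <= tolerance,
    );
    if (idx >= 0) matched.add(idx);
  }
  return {
    detected: matched.size,
    total: known.length,
    recall: known.length > 0 ? matched.size / known.length : 0,
    precision: reported.length > 0 ? matched.size / reported.length : 0,
  };
}
```

To compare Codex with a rival, run both over the same contracts and call `scoreFindings` with the same `known` list for each; the line-window rule is deliberately crude, and a stricter harness might match on vulnerability class as well as location.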

Verdict

Early but promising: the docs nail local Docker/K8s deploys, though its maturity shows in the limited model selection and the lack of a public test suite. Grab it for quick contract bug benchmarks if you're comfortable self-hosting the harness.
