RivalSecurity

Benchmark and evaluation framework for static application security testing (SAST) and vulnerability analysis agents.

Found May 05, 2026 at 10 stars.
AI Summary (Python)

SastBench is a benchmark dataset and framework for evaluating AI agents that triage security vulnerability findings from static analysis tools.

How It Works

1. 📚 Discover SastBench: You find a helpful benchmark for testing AI assistants that spot real security bugs in code.

2. 🧰 Grab the kit: Download the ready-made test data and example smart assistants.

3. Pick your assistant: Either use an example (pick one of the built-in assistants to get started fast) or 🛠️ build custom (adapt an example to make your own security checker).

4. 🚀 Launch the test: Start the evaluation and let it run checks on hundreds of code examples.

5. 🔍 Watch it work: See your assistant review findings and decide which bugs are real.

6. 📊 Get your scores: Receive a clear report with accuracy, precision, and how well it triages bugs.
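The scoring in the final step can be sketched as a comparison of the assistant's verdicts against ground-truth labels. This is a minimal illustration; the field names and label values are assumptions, not SastBench's actual schema:

```python
# Minimal sketch of triage scoring: compare agent verdicts against
# ground-truth labels. Field names here are hypothetical.

def score(findings):
    """findings: list of dicts with 'label' and 'verdict', each either
    'real' (true vulnerability) or 'false_positive'."""
    tp = sum(1 for f in findings
             if f["verdict"] == "real" and f["label"] == "real")
    fp = sum(1 for f in findings
             if f["verdict"] == "real" and f["label"] == "false_positive")
    correct = sum(1 for f in findings if f["verdict"] == f["label"])
    accuracy = correct / len(findings)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"accuracy": accuracy, "precision": precision}

report = score([
    {"label": "real", "verdict": "real"},
    {"label": "false_positive", "verdict": "real"},
    {"label": "false_positive", "verdict": "false_positive"},
    {"label": "real", "verdict": "real"},
])
# accuracy = 3/4 = 0.75, precision = 2/3
```

Precision here answers the practical triage question: of the findings the assistant escalated as real, how many actually were.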


AI-Generated Review

What is sastbench?

SastBench is a Python framework for benchmarking static application security testing (SAST) triage agents: it evaluates how well AI tools distinguish real vulnerabilities from false positives among code findings. It ships with a JSON dataset blending CVE fixes (true positives) and Semgrep scan results (false positives), plus Dockerized sample agents powered by LLMs such as Gemini or DeepSeek. Run evaluations via the CLI with `./quickrun.sh` to get JSON reports on accuracy, confidence, and processing times.
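Consuming such a run's output might look like the sketch below. The report keys are assumptions drawn from the description (accuracy, confidence, processing times), not the tool's actual schema:

```python
import json

# Hypothetical post-run step: parse the JSON report a benchmark run
# emits and print the headline metrics. Key names are illustrative.
raw = """
{"accuracy": 0.82, "mean_confidence": 0.74, "mean_processing_time_s": 3.1}
"""

report = json.loads(raw)
summary = ", ".join(f"{k}={v}" for k, v in report.items())
print(summary)
# accuracy=0.82, mean_confidence=0.74, mean_processing_time_s=3.1
```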

Why is it gaining traction?

Unlike generic LLM benchmarks, SastBench targets agentic SAST triage with real-world commit context, making the evaluation tangible for security workflows. Its generic analyzer interface makes it easy to plug in custom agents: copy a sample, tweak the analyze logic, and test via the REST API. It supports Vertex AI, OpenAI, and models like DeepSeek. The arXiv-backed dataset generation from NVD and PyDriller keeps tests reproducible and grounded.
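The "copy a sample, tweak the analyze logic" workflow could be sketched as a single class with an `analyze` method that takes a finding and returns a verdict with a confidence score. The class, method, and field names below are assumptions for illustration; SastBench's actual analyzer interface may differ:

```python
# Hypothetical custom analyzer for a generic triage interface.
# Names and schema are illustrative, not SastBench's real API.
class KeywordAnalyzer:
    """Toy triage heuristic: flag findings whose snippet contains
    obviously dangerous calls; treat everything else as noise."""

    DANGEROUS = ("eval(", "exec(", "os.system(", "pickle.loads(")

    def analyze(self, finding: dict) -> dict:
        snippet = finding.get("snippet", "")
        hit = any(marker in snippet for marker in self.DANGEROUS)
        return {
            "verdict": "true_positive" if hit else "false_positive",
            "confidence": 0.9 if hit else 0.6,
        }

agent = KeywordAnalyzer()
result = agent.analyze({"snippet": "os.system(cmd)"})
# result["verdict"] == "true_positive"
```

A real agent would replace the keyword heuristic with an LLM call over the finding plus its commit context, keeping the same input/output shape.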

Who should use this?

Security engineers building or tuning LLM-based vulnerability detectors get rigorous performance metrics. SAST tool developers can benchmark GitHub Copilot-style agents against Semgrep noise. Researchers evaluating LLM agents on code analysis will find the CVE/Semgrep split ideal for establishing triage baselines.

Verdict

Grab it if you're into agentic SAST triage: solid docs and a quickstart make it dev-friendly, though 10 stars and a 1.0% credibility score signal early maturity. Test your agents today, and contribute dataset expansions for broader coverage.


