mattersec-labs

Role-Specific Evaluation of LLMs for Security Vulnerability Detection

100% credibility
Found Mar 23, 2026 at 14 stars
AI Analysis
Python
AI Summary

SecLens is a benchmark tool that tests AI language models on detecting real-world security vulnerabilities in code, scoring them separately for different professional roles, such as CISOs and engineers.

How It Works

1
🔍 Discover SecLens

You hear about a tool that tests AI assistants on finding security flaws in real code, grading them from different job perspectives, like a security chief or a team lead.

2
📥 Set it up simply

Follow easy steps to get the tool ready on your computer, like adding a helpful app.

3
🔗 Connect an AI brain

Link to your chosen AI service so it can think and analyze code snippets.

4
🚀 Run a test

Pick an AI model and a collection of real code issues, then start the check to see how well it spots problems.

5
📊 See quick results

Get a summary of overall scores, like how accurate the model was and how much the run cost.

6
👓 Check role grades

View custom reports graded for roles like security chief or engineer, showing each model's strengths and how well it fits each role.

🎉 Choose your best AI

You now know exactly which AI works best for your needs, with shareable insights to decide confidently.
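The run-and-score loop in the steps above can be sketched in Python. Everything here is hypothetical (the actual SecLens CLI and internals are not shown on this page), but it illustrates the shape of a benchmark pass: call a model on labeled tasks, then summarize accuracy and cost.

```python
# Hypothetical sketch of the benchmark loop described above; the real
# SecLens CLI and API are not shown on this page, so all names are invented.
from dataclasses import dataclass

@dataclass
class Task:
    code: str            # snippet taken from a real CVE (vulnerable or patched)
    is_vulnerable: bool  # ground-truth label

def toy_model(code: str) -> bool:
    """Stand-in for an LLM call: flags any snippet containing eval()."""
    return "eval(" in code

def run_eval(tasks: list[Task], model, cost_per_call: float = 0.002) -> dict:
    """Score a model on labeled tasks and report accuracy plus spend."""
    correct = sum(model(t.code) == t.is_vulnerable for t in tasks)
    return {"accuracy": correct / len(tasks),
            "cost_usd": cost_per_call * len(tasks)}

tasks = [Task("eval(user_input)", True), Task("print('hi')", False)]
print(run_eval(tasks, toy_model))  # {'accuracy': 1.0, 'cost_usd': 0.004}
```

A real run would replace `toy_model` with an API call to the chosen provider and the two toy tasks with the full CVE task set.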


AI-Generated Review

What is seclens?

SecLens is a Python CLI tool for role-specific evaluation of LLMs on security vulnerability detection, using 406 real CVE tasks from 93 open-source repos across 10 languages. It benchmarks models like Claude or Gemini in code-in-prompt or sandboxed tool-use modes, scoring them through five stakeholder lenses—CISO, CAIO, researcher, engineering lead, AI actor—with A-F grades on 35 dimensions like recall, cost, and tool efficiency. Run evals, generate reports, or compare models to see divergences, like Qwen3-Coder acing engineering but flunking CISO.
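As a rough illustration of the A-F grading described above (the cutoffs below are invented for illustration, not SecLens's actual thresholds):

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 dimension score to an A-F grade.
    Cutoffs are illustrative; SecLens's real thresholds aren't shown here."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"

# One grade per dimension, e.g. recall, cost, tool efficiency.
scores = {"recall": 91.0, "cost": 62.5, "tool_efficiency": 78.0}
print({dim: letter_grade(s) for dim, s in scores.items()})
# {'recall': 'A', 'cost': 'D', 'tool_efficiency': 'C'}
```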

Why is it gaining traction?

Ditches generic leaderboards for tailored scores: top model for one role tanks for another, with up to 31-point gaps exposed by severity-weighted recall or MCC-per-dollar. Covers OWASP categories, post-patch negatives, and SAST false positives, plus breakdowns by language/category—users spot real strengths like Claude Haiku's CISO edge despite mid-pack ranking.
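The two metrics named here, severity-weighted recall and MCC-per-dollar, can be sketched as follows. The formulas are standard; the exact weighting and costs SecLens uses are assumptions.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def severity_weighted_recall(found: dict, weights: dict) -> float:
    """Recall where each CVE counts by a severity weight (e.g. its CVSS score).
    `found` maps cve_id -> detected?; `weights` maps cve_id -> severity."""
    total = sum(weights.values())
    hit = sum(w for cve, w in weights.items() if found.get(cve))
    return hit / total if total else 0.0

# Missing a critical CVE hurts more than missing a low-severity one:
print(round(severity_weighted_recall({"CVE-A": True, "CVE-B": False},
                                     {"CVE-A": 9.8, "CVE-B": 3.1}), 2))  # 0.76
# MCC per dollar, assuming a $1.25 eval run (cost figure is made up):
print(round(mcc(tp=40, tn=45, fp=5, fn=10) / 1.25, 3))  # 0.563
```

Dividing MCC by run cost is what lets a cheap mid-pack model beat a pricier top scorer, which is the kind of divergence the role-specific lenses surface.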

Who should use this?

CISOs validating LLMs for vuln scanning pipelines, engineering VPs prioritizing precision to boost velocity without noise, AI officers optimizing cost-capability tradeoffs, security researchers dissecting CWE mechanics, agent devs ensuring autonomous detection reliability.

Verdict

Practical Python package for LLM security evals: installs via uv, CLI-first with rich reports and paper-backed results. 14 stars and a 100% credibility score signal early maturity (solid docs/tests, but light adoption); prototype it now before betting on it for vulnerability detection in prod.


