mattersec-labs

Role-Specific Evaluation of LLMs for Security Vulnerability Detection

100% credibility
Found Mar 23, 2026 at 14 stars
AI Analysis
Python
AI Summary

SecLens is a benchmark tool that tests AI language models on detecting real-world security vulnerabilities in code, scoring them separately for different professional roles, such as CISOs and engineers.

How It Works

1
🔍 Discover SecLens

You hear about a tool that tests AI assistants on finding security flaws in real code, grading them from different job perspectives, like a security chief or a team lead.

2
📥 Set it up simply

Follow easy steps to get the tool ready on your computer, like adding a helpful app.

3
🔗 Connect an AI brain

Link to your chosen AI service so it can think and analyze code snippets.

4
🚀 Run a test

Pick an AI model and a collection of real code issues, then start the check to see how well it spots problems.

5
📊 See quick results

Get a summary of overall scores, like how accurate the model was and how much the run cost.

6
👓 Check role grades

View custom reports graded for roles like security chief or engineer, showing each model's strengths and how well it fits each role.

🎉 Choose your best AI

You now know exactly which AI works best for your needs, with shareable insights to decide confidently.
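The run-and-score loop in the steps above can be sketched in Python. Everything here is hypothetical (the actual SecLens CLI and internals are not shown on this page), but it illustrates the shape of a benchmark pass: call a model on labeled tasks, then summarize accuracy and cost.

```python
# Hypothetical sketch of the benchmark loop described above; the real
# SecLens CLI and API are not shown on this page, so all names are invented.
from dataclasses import dataclass

@dataclass
class Task:
    code: str            # snippet taken from a real CVE (vulnerable or patched)
    is_vulnerable: bool  # ground-truth label

def toy_model(code: str) -> bool:
    """Stand-in for an LLM call: flags any snippet containing eval()."""
    return "eval(" in code

def run_eval(tasks: list[Task], model, cost_per_call: float = 0.002) -> dict:
    """Score a model on labeled tasks and report accuracy plus spend."""
    correct = sum(model(t.code) == t.is_vulnerable for t in tasks)
    return {"accuracy": correct / len(tasks),
            "cost_usd": cost_per_call * len(tasks)}

tasks = [Task("eval(user_input)", True), Task("print('hi')", False)]
print(run_eval(tasks, toy_model))  # {'accuracy': 1.0, 'cost_usd': 0.004}
```

A real run would replace `toy_model` with an API call to the chosen provider and the two toy tasks with the full CVE task set.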


AI-Generated Review

What is seclens?

SecLens is a Python CLI tool for role-specific evaluation of LLMs on security vulnerability detection, using 406 real CVE tasks from 93 open-source repos across 10 languages. It benchmarks models like Claude or Gemini in code-in-prompt or sandboxed tool-use modes, scoring them through five stakeholder lenses—CISO, CAIO, researcher, engineering lead, AI actor—with A-F grades on 35 dimensions like recall, cost, and tool efficiency. Run evals, generate reports, or compare models to see divergences, like Qwen3-Coder acing engineering but flunking CISO.
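As a rough illustration of the A-F grading described above (the cutoffs below are invented for illustration, not SecLens's actual thresholds):

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 dimension score to an A-F grade.
    Cutoffs are illustrative; SecLens's real thresholds aren't shown here."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"

# One grade per dimension, e.g. recall, cost, tool efficiency.
scores = {"recall": 91.0, "cost": 62.5, "tool_efficiency": 78.0}
print({dim: letter_grade(s) for dim, s in scores.items()})
# {'recall': 'A', 'cost': 'D', 'tool_efficiency': 'C'}
```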

Why is it gaining traction?

Ditches generic leaderboards for tailored scores: top model for one role tanks for another, with up to 31-point gaps exposed by severity-weighted recall or MCC-per-dollar. Covers OWASP categories, post-patch negatives, and SAST false positives, plus breakdowns by language/category—users spot real strengths like Claude Haiku's CISO edge despite mid-pack ranking.
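The two metrics named here, severity-weighted recall and MCC-per-dollar, can be sketched as follows. The formulas are standard; the exact weighting and costs SecLens uses are assumptions.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def severity_weighted_recall(found: dict, weights: dict) -> float:
    """Recall where each CVE counts by a severity weight (e.g. its CVSS score).
    `found` maps cve_id -> detected?; `weights` maps cve_id -> severity."""
    total = sum(weights.values())
    hit = sum(w for cve, w in weights.items() if found.get(cve))
    return hit / total if total else 0.0

# Missing a critical CVE hurts more than missing a low-severity one:
print(round(severity_weighted_recall({"CVE-A": True, "CVE-B": False},
                                     {"CVE-A": 9.8, "CVE-B": 3.1}), 2))  # 0.76
# MCC per dollar, assuming a $1.25 eval run (cost figure is made up):
print(round(mcc(tp=40, tn=45, fp=5, fn=10) / 1.25, 3))  # 0.563
```

Dividing MCC by run cost is what lets a cheap mid-pack model beat a pricier top scorer, which is the kind of divergence the role-specific lenses surface.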

Who should use this?

CISOs validating LLMs for vuln scanning pipelines, engineering VPs prioritizing precision to boost velocity without noise, AI officers optimizing cost-capability tradeoffs, security researchers dissecting CWE mechanics, agent devs ensuring autonomous detection reliability.

Verdict

Practical Python package for LLM security evals: installs via uv, CLI-first with rich reports and paper-backed results. 14 stars and a 100% credibility score signal early maturity (solid docs/tests, but light adoption); prototype it now before betting on it for vulnerability detection in prod.


