gy15901580825

gy15901580825 / Argus

Public

Black-box, open-source red-team testing for AI agents. Point Argus at any HTTP, gRPC, or browser-using agent endpoint, run 500+ adversarial probes (OWASP LLM Top 10, MITRE ATLAS, NIST AI RMF, TAP/PAIR/GCG), get LLM-judged findings as SARIF, gate CI via GitHub Code Scanning. Ships with CLI + GH Action.

12
1
85% credibility
Found May 30, 2026 at 12 stars -- GitGems finds repos before they trend. Get early access to the next one.
Sign Up Free
AI Analysis
Python
AI Summary

Argus is a black-box security testing platform for AI agents. It allows security teams and developers to run automated adversarial tests against any AI agent endpoint (HTTP APIs, gRPC services, or browser-based agents), then generates detailed reports mapping findings to industry security standards. The platform includes 167 pre-built attack probes covering prompt injection, data leakage, and other LLM-specific vulnerabilities, with optional AI-powered analysis to judge attack success.

How It Works

1
🔍 Discover a security testing tool

A developer or security engineer learns about Argus as a way to test AI agents for vulnerabilities like prompt injection and data leakage.

2
🎯 Point it at your AI agent

You describe your AI agent with a simple configuration file — just tell Argus where your agent lives and how to talk to it.

3
Run the security tests

Argus automatically runs 167 different attack probes against your agent, testing for OWASP vulnerabilities, hidden instructions, and other threats.

4
🛡️ Get your security report

Within minutes, you receive a detailed report showing exactly which attacks succeeded, mapped to industry standards like OWASP LLM Top 10 and MITRE ATLAS.

5
📊 Review findings and fix issues

Each finding includes the attack that worked, what your agent revealed, and which security framework it relates to — so your team knows exactly what to fix.

Ship with confidence

You integrate the security report into your CI pipeline and can prove to stakeholders that your AI agent has been tested against real adversarial attacks.

Sign up to see the full architecture

4 more

Sign Up Free

Star Growth

See how this repo grew from 12 to 12 stars Sign Up Free
Repurpose This Repo

Repurpose is a Pro feature

Generate ready-to-use prompts for X threads, LinkedIn posts, blog posts, YouTube scripts, and more -- with full repo context baked in.

Unlock Repurpose
AI-Generated Review

What is Argus?

Argus is an open-source red-team testing framework for AI agents. It attacks your deployed agent from the outside, running 167+ adversarial probes that test for prompt injection, payload obfuscation, UI phishing, and other attack vectors drawn from OWASP LLM Top 10, MITRE ATLAS, and NIST AI RMF standards. You point it at any HTTP, gRPC, or browser-using agent endpoint, and it fires probes with an LLM-judged verdict on each one. Results come out as SARIF 2.1.0 for GitHub Code Scanning, JUnit XML for CI gates, or HTML for human review.

Why is it gaining traction?

Most AI testing tools score benchmark accuracy on individual prompts. Argus tests what actually ships: multi-turn agents that plan, call tools, and open browsers. It treats your agent as a black box, so you do not need source access or any instrumentation inside your target. The optional PromptGuard integration lets you measure how much your input-side defenses actually block, with real before/after numbers in their demo. The CLI ships as a PyPI package, and there is a bundled GitHub Action that blocks your pipeline on critical findings.

Who should use this?

Security teams evaluating AI agents before production deployment. Red-teamers building adversarial test suites for agentic systems. Developers who want automated probe coverage mapped to OWASP LLM Top 10 without hand-authoring test cases. Teams with GitHub Code Scanning workflows who need SARIF output they can gate CI on.

Verdict

At 12 stars with a 0.85% credibility score, this is early-stage and unproven at scale. The probe library is substantive (167 probes, garak wrappers, TAP/PAIR/GCG algorithms), and the multi-target adapter design is solid. However, the sparse community footprint means you will be an early adopter bearing the risk. Worth evaluating against a sandboxed demo target, but do not bet production security on it without thorough validation first.

Sign up to read the full AI review Sign Up Free

Similar repos coming soon.