aisa-group

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

18
1
100% credibility
Found Mar 04, 2026 at 18 stars -- GitGems finds repos before they trend. Get early access to the next one.
Sign Up Free
AI Analysis
Python
AI Summary

SKILL-INJECT is a benchmark for testing prompt injection vulnerabilities in LLM agent skill files across multiple AI coding agents and safety policy conditions.

How It Works

1
🔍 Discover the safety tester

You find a tool online that checks if AI helpers can be tricked by hidden bad instructions in their skill files.

2
🛠️ Get your computer ready

You install a few simple programs like a box for safe testing and basic tools to run checks.

3
🔗 Connect your AI friends

You link your favorite AI coding buddies so they can join the safety tests.

4
🚀 Launch the vulnerability hunt

With one click, you start running tests to see if sneaky instructions can fool the AIs.

5
📊 Watch the tests unfold

You monitor as different AIs face various tricky scenarios under safety rules.

6
📈 Review the safety scores

You get clear reports showing which AIs fell for tricks and how well they did.

Strengthen your AI setup

Now you know the weak spots and can make your AI agents much safer to use.

Sign up to see the full architecture

5 more

Sign Up Free

Star Growth

See how this repo grew from 18 to 18 stars Sign Up Free
Repurpose This Repo

Repurpose is a Pro feature

Generate ready-to-use prompts for X threads, LinkedIn posts, blog posts, YouTube scripts, and more -- with full repo context baked in.

Unlock Repurpose
AI-Generated Review

What is skill-inject?

Skill-inject is a Python benchmark for measuring agent vulnerability to skill file attacks, testing if LLM coding agents like Claude Code, Codex, or Gemini CLI execute hidden malicious instructions embedded in skill definitions. It runs isolated Docker experiments with 41 contextual injections (ambiguous harms) and 30 obvious ones (ransomware, exfiltration), across safety policies like warnings or legitimizing prompts. Developers get attack success rates (ASR), task utility baselines, and ablation results like best-of-N sampling or skill screening.

Why is it gaining traction?

It stands out with ready-to-run CLI experiments on your API keys—no custom setup beyond Docker build—and auto-evaluates via LLM judges, spitting out results in final_results folders. Unlike generic prompt injection tests, it targets real agent skills (documents, email, calendar, healthcare), mimicking alpha-skill-injector or HTB command injection skill assessments but for AI. The Docker sandbox prevents real harm while quantifying defenses.

Who should use this?

AI safety engineers benchmarking agent robustness before production. LLM agent builders evaluating skill injector risks in tools handling files, email APIs, or git. Security teams assessing vulnerability to attacks like hidden exfiltration in uploaded skills.

Verdict

Grab it if you're building secure LLM agents—strong docs, smoke tests, and HPC support via Apptainer make evaluation straightforward. But with 18 stars and 1.0% credibility score, it's early alpha; run your own baselines before trusting for high-stakes decisions.

(198 words)

Sign up to read the full AI review Sign Up Free

Similar repos coming soon.