eric-ai-lab

Official codebase for the paper "Auditing Agent Harness Safety"

32
1
100% credibility
Found May 21, 2026 at 33 stars.
AI Analysis
Python
AI Summary

HarnessAudit is an academic evaluation framework for auditing AI agent systems to verify they obey safety boundaries around tool usage, resource access, and information flow, complete with a published research paper, Hugging Face dataset, and detailed technical documentation.

AI-Generated Review

What is HarnessAudit?

HarnessAudit is a Python evaluation framework for testing whether AI agent systems respect safety boundaries. Rather than only checking whether an agent produces the right answer, it inspects the entire execution trace to see whether the agent used the wrong tools, touched protected resources, or leaked sensitive information to the wrong recipients. The system runs agents through realistic scenarios across domains like finance, healthcare, and e-commerce, then scores them on Safety Adherence Rate (SAR), Action Validity Score (AVS), and Task Completion Rate (TCR). It works with popular agent frameworks including ClawTeam, OpenClaw, Claude Code, Codex, and Google's ADK.
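To make the idea of trace-level auditing concrete, here is a minimal sketch of what checking an execution trace against a boundary policy could look like. It is not HarnessAudit's API: the `Step` and `Policy` schemas, the violation labels, and the `adherence_rate` function are all hypothetical names invented for illustration, and the rate computed here is only a stand-in, not the paper's definition of SAR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """One recorded action in an agent trace (hypothetical schema)."""
    agent: str
    tool: str
    resource: Optional[str] = None   # resource touched by the action, if any
    recipient: Optional[str] = None  # who received the output, if anyone
    sensitive: bool = False          # whether the payload is sensitive


@dataclass
class Policy:
    """Safety boundaries for a scenario (hypothetical schema)."""
    allowed_tools: dict        # agent name -> set of tool names it may call
    protected_resources: set   # resources no agent may touch
    cleared_recipients: set    # recipients allowed to see sensitive data


def audit_trace(trace, policy):
    """Return (step_index, violation_label) pairs for one trace."""
    violations = []
    for i, step in enumerate(trace):
        if step.tool not in policy.allowed_tools.get(step.agent, set()):
            violations.append((i, "unauthorized_tool"))
        if step.resource in policy.protected_resources:
            violations.append((i, "protected_resource"))
        if (step.sensitive and step.recipient is not None
                and step.recipient not in policy.cleared_recipients):
            violations.append((i, "information_leak"))
    return violations


def adherence_rate(traces, policy):
    """Fraction of traces with no violations (illustrative, not the paper's SAR)."""
    if not traces:
        return 0.0
    clean = sum(1 for trace in traces if not audit_trace(trace, policy))
    return clean / len(traces)


# Made-up example data: one clean trace and one that breaks all three boundaries.
policy = Policy(
    allowed_tools={"billing_agent": {"read_invoice"}},
    protected_resources={"patients.db"},
    cleared_recipients={"billing_agent"},
)
safe_trace = [Step("billing_agent", "read_invoice", resource="invoices.csv")]
unsafe_trace = [
    Step("billing_agent", "read_invoice", resource="patients.db"),
    Step("billing_agent", "send_message", recipient="support_agent", sensitive=True),
]

print(audit_trace(unsafe_trace, policy))
print(adherence_rate([safe_trace, unsafe_trace], policy))
```

The point of the sketch is that a trace can end in a correct final answer and still fail the audit, because every intermediate step is checked against the policy rather than only the outcome.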

Why is it gaining traction?

HarnessAudit fills a critical gap: most agent benchmarks measure only task success and ignore whether agents behave safely along the way. It catches subtle violations such as an agent using an unauthorized tool or sending customer data to the wrong agent. The framework also tests robustness against injection attacks and ambiguous instructions, both real risks in production agentic systems. Its three-layer evaluation (boundary compliance, execution fidelity, perturbation stability) gives a more complete picture than a single-metric benchmark can.

Who should use this?

Security researchers auditing AI agents, developers building multi-agent systems who need to verify safety properties, and academics studying agent behavior will find this most useful. It's less relevant for simple single-agent applications or projects that do not require strict safety boundaries.

Verdict

This is a solid research tool from an academic team, but it is early-stage with only 32 stars and limited documentation. The 1.0% credibility score reflects its novelty rather than quality. If you are evaluating agent safety seriously, it is worth exploring, though expect to invest time understanding the framework and adapting it to your use case.
