JiyangZhang

Unit Test for LLM Agent Skills

Found Apr 09, 2026 at 11 stars.
AI Summary

SkillTest is a testing tool for verifying AI agent instructions by running them in isolated environments against predefined scenarios and generating detailed pass/fail reports.

How It Works

1. 📖 Discover SkillTest

You hear about a handy tool that lets you test your AI helper instructions to make sure they always work as expected.

2. 🛠️ Set it up

You easily install the tool on your computer following simple steps.

3. ✍️ Describe your skill

You write a clear guide in a text file telling your AI helper exactly what to do.
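
Such a guide is plain Markdown. A minimal hypothetical skill file might look like this (the name and layout are illustrative; the exact structure SkillTest expects may differ):

```markdown
# Skill: merge-pdfs

Given two or more PDF paths, merge them into a single file
named `merged.pdf`, preserving page order.

## Steps
1. Read each input PDF in the order given.
2. Append all pages to one output document.
3. Write the result to `merged.pdf` in the working directory.
```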

4. 📝 Add test examples

You list out sample tasks and what good results should look like.
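
The review notes that test cases live in YAML files with prompts, inputs, and expectations. As a sketch, one case for a PDF-merging skill might look like this (all field names here are hypothetical, not SkillTest's actual schema):

```yaml
# Hypothetical test case -- field names are illustrative only
- name: merge two pdfs
  prompt: "Merge a.pdf and b.pdf into merged.pdf"
  inputs:
    - fixtures/a.pdf
    - fixtures/b.pdf
  expect:
    file_exists: merged.pdf
    judge: "The output contains all pages from both inputs, in order"
```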

5. 🚀 Run the tests

You press go, and the tool automatically checks your skill against every example.

6. 📊 See the results

A beautiful web page opens showing pass or fail for each test with helpful notes.

Skill proven!

You confidently use your reliable AI helper knowing it passes all checks.

AI-Generated Review

What is skilltest?

SkillTest is a Python unit-testing framework for LLM agent skills, bringing unit-test discipline to AI tools like Claude Code. You define skills in Markdown files alongside YAML test cases with prompts, inputs, and expectations, graded by pytest checks or an agent judge, all automated in Docker for reproducible runs. It produces coverage reports, JUnit XML for CI, and interactive HTML dashboards with pass rates and evidence.

Why is it gaining traction?

Unlike scattered eval tools, it colocates tests with their skills for fast feedback loops, mixing deterministic pytest checks (file existence, page counts) with LLM judgments for nuanced outputs. CLI commands for runs, diffs, and coverage ablation, plus test automation via GitHub Actions, make agent development feel like an ordinary unit-testing workflow.
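
As an illustration of a deterministic oracle like the ones described above, here is a minimal pytest-style check for a hypothetical PDF-merging skill. The function name and output directory are assumptions for this sketch, not SkillTest's actual API:

```python
from pathlib import Path

def check_merged_file(expected_name: str,
                      output_dir: Path = Path("agent_output"),
                      min_bytes: int = 1) -> bool:
    """Deterministic oracle: pass only if the agent wrote a non-empty
    file with the expected name. (Hypothetical example; the real
    framework's oracle interface may differ.)"""
    out = output_dir / expected_name
    return out.is_file() and out.stat().st_size >= min_bytes
```

A check like this would run alongside an LLM judge, which grades qualities a simple assertion cannot, such as whether the merged pages appear in the right order.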

Who should use this?

Developers building agent skills for document tasks (e.g., PDF merging, form filling), or for any LLM runtime that needs reliable verification. Teams using Claude Code (or awaiting the planned multi-model support) who are frustrated by informal prompting will appreciate the precision that pytest oracles bring to defining tests.

Verdict

A promising unit-test framework at 11 stars and 1.0% credibility. Its early maturity shows in limited runtime support, but strong docs and a demo lower the barrier to entry. Prototype your agent skills here, and integrate once it hits escape velocity.
