What is skill-up?
Skill-up is a Go-based CLI evaluation framework for developers building Agent Skills. It replaces ad-hoc testing with declarative YAML configs that describe environments, test cases, and grading strategies. Drop an `evals/eval.yaml` and `cases/*.yaml` files into your skill repo, then run evals locally or in GitHub Actions CI pipelines on Linux (including Ubuntu) or Windows. It supports engines like Qoder CLI, Claude Code, and Codex, and grades runs with rules, scripts, or agent judges. Reports include Anthropic-compatible `grading.json`, benchmarks, JUnit XML, and HTML.
Why is it gaining traction?
Unlike one-off scripts or engine-tied tools, skill-up automates the full loop: workspace setup, skill installs via GitHub clone or packages, multi-turn agent runs, and structured reporting. Its `--auto` mode detects Anthropic `evals.json` for quick migration, and CI-ready commands like `validate`, `list-cases`, and `report` make iteration fast without custom GitHub CLI scripting. Grading is flexible: rule-based checks can inspect files and tool usage, and agent judges can score runs against custom criteria.
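To make the idea of a rule-based file check concrete, here is a minimal Python sketch. It is not skill-up's implementation, and the rule fields (`path`, `contains`) and the report shape are hypothetical, chosen only for illustration; they are not the tool's actual YAML schema or its `grading.json` format.

```python
import json
import tempfile
from pathlib import Path


def grade_workspace(workspace, rules):
    """Apply simple file rules to a workspace directory and return a report dict.

    Each rule is a dict with a hypothetical shape:
      {"path": "relative/file", "contains": "optional substring"}
    """
    results = []
    for rule in rules:
        target = Path(workspace) / rule["path"]
        if not target.is_file():
            passed, reason = False, "file missing"
        elif "contains" in rule and rule["contains"] not in target.read_text(encoding="utf-8"):
            passed, reason = False, "expected text not found"
        else:
            passed, reason = True, "ok"
        results.append({"path": rule["path"], "passed": passed, "reason": reason})
    # The case passes only if every rule passed.
    return {"passed": all(r["passed"] for r in results), "results": results}


if __name__ == "__main__":
    # Build a throwaway workspace that stands in for an agent's output directory.
    with tempfile.TemporaryDirectory() as ws:
        (Path(ws) / "README.md").write_text("# Demo skill\n", encoding="utf-8")
        rules = [
            {"path": "README.md", "contains": "Demo skill"},
            {"path": "output.txt"},  # never created, so this rule fails
        ]
        print(json.dumps(grade_workspace(ws, rules), indent=2))
```

A deterministic check like this is cheap to run on every commit, which is presumably why a framework would pair rules with agent judges for criteria that cannot be expressed as file or tool assertions.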
Who should use this?
Agent skill developers, at Alibaba or elsewhere, who test LLM skills in GitHub repos and need multi-engine support. It is ideal for teams running evals in GitHub Actions on Linux or Windows, and for anyone exploring agent skill techniques picked up from Reddit or YouTube.
Verdict
Promising early tool (13 stars, 1.0% credibility) with strong docs, e2e tests, and an Apache 2.0 license, but still evolving, so expect breaking changes to the CLI and config. Grab it if you're building agent skills; otherwise, watch for stability.