alibaba / skill-up

A CLI evaluation framework to make your Agent Skill Up.

13 stars · 3 forks · 100% credibility
Found May 15, 2026 at 13 stars.
AI Analysis
Go
AI Summary

skill-up is a command-line tool for developers to test and evaluate AI agent skills using simple configuration files and generate detailed reports.

How It Works

1
🔍 Discover skill-up

You hear about a friendly tool that makes testing AI helpers straightforward and reliable.

2
📥 Set it up

Download the tool to your computer in moments, ready to use right away.

3
📝 Plan your tests

Describe simple checks for what your AI helper should do, like everyday tasks.

4
✅ Double-check

Quickly verify your test plans are clear and ready to go.

5
🚀 Launch tests

Start running the tests on your AI helper and see it in action.

6

📊 Celebrate results

Enjoy easy-to-read reports showing your AI's strengths and areas to improve.
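The "plan your tests" step above comes down to writing small YAML case files. A minimal sketch of what one might look like — the field names here are illustrative assumptions, not the tool's documented schema:

```yaml
# cases/greet-user.yaml -- hypothetical test case for an agent skill
# (field names are assumptions; consult the skill-up docs for the real schema)
name: greet-user
prompt: "Say hello to the user by name."
expect:
  # rule-based check: the agent's reply should mention the name
  contains: "Alice"
```

The idea, per the review, is that each case is a plain config file rather than custom test code, so cases stay readable and easy to diff.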


AI-Generated Review

What is skill-up?

Skill-up is a Go-based CLI evaluation framework for developers building Agent Skills, turning ad-hoc testing into declarative YAML configs for environments, test cases, and grading strategies. Drop an `evals/eval.yaml` file and `cases/*.yaml` files into your skill repo, then run evals locally or in GitHub Actions CI pipelines across Linux, Ubuntu, or Windows setups. It supports engines like Qoder CLI, Claude Code, and Codex, with judging via rules, scripts, or agent judges, and it outputs Anthropic-compatible reports such as `grading.json`, benchmarks, JUnit XML, and HTML.
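Concretely, the declarative setup described above might look something like the following `evals/eval.yaml` sketch. The engine names, judging strategies, and report formats come from this review; the exact keys are assumptions, not the tool's documented schema:

```yaml
# evals/eval.yaml -- hypothetical top-level eval config
# (keys are illustrative; engines and report formats are those named in the review)
engine: claude-code        # alternatives per the review: qoder-cli, codex
environment:
  os: ubuntu-latest
cases: cases/*.yaml
judging:
  strategy: rules          # or: script, agent judge
reports:
  - grading.json           # Anthropic-compatible grading output
  - junit.xml
  - report.html
```

Keeping environments, cases, and grading in one declarative file is what lets the same eval run unchanged both locally and in CI.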

Why is it gaining traction?

Unlike one-off scripts or engine-tied tools, skill-up automates the full loop: workspace setup, skill installs via GitHub clone or packages, multi-turn agent runs, and structured reporting for agent-skill evaluation workflows. Its `--auto` mode detects an Anthropic `evals.json` for quick migration, and CI-ready commands like `validate`, `list-cases`, and `report` make iteration fast without custom glue scripts. The judging layer is flexible: rule-based checks on files and tool calls, or agent judges with custom criteria.
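The CI-readiness described above could be wired into a minimal GitHub Actions job along these lines. The subcommand names (`validate`, `list-cases`, `report`) are the ones this review mentions, while the workflow structure and the assumption that `skill-up` is already on the runner are mine:

```yaml
# .github/workflows/evals.yml -- hypothetical workflow sketch
name: skill-evals
on: [push]
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes the skill-up binary is already installed on the runner
      - run: skill-up validate        # check eval.yaml and case files
      - run: skill-up list-cases      # enumerate the cases that will run
      - run: skill-up report          # emit grading.json / JUnit XML / HTML
```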

Who should use this?

Agent Skill developers at Alibaba or similar teams testing LLM skills in GitHub repos with multi-engine support. Ideal for teams running evals in GitHub Actions on Linux, Ubuntu, or Windows runners.

Verdict

Promising early tool (13 stars) with strong docs, e2e tests, and an Apache 2.0 license, but still evolving; expect CLI and config breaking changes. Grab it if you're building Agent Skills; otherwise, watch and wait for stability.

