sentient-agi

EvoSkill — An open-source framework that automatically discovers and synthesizes reusable agent skills from failed trajectories to improve coding agent performance on long-horizon tasks.

Found Mar 06, 2026 at 19 stars · 100% credibility
AI Analysis
Python
AI Summary

EvoSkill is a framework that automatically evolves AI agent skills and prompts through repeated testing and improvement on benchmark datasets.

How It Works

1. 🔍 Discover EvoSkill

You hear about EvoSkill, a way to make AI agents better at tough questions without hand-crafting instructions.

2. 📦 Get it ready

Download and set up the tools on your computer so everything works smoothly.

3. 📊 Add your questions

Put a list of challenging questions and answers in a simple folder for testing.

4. 🚀 Start the magic

Kick off the self-improvement cycle and watch it automatically test, learn from mistakes, and create better skills.

5. 📈 See it improve

Over a few rounds, check how the AI gets smarter on your questions as it picks up new abilities.

6. ✅ Test the winner

Run final checks to confirm your upgraded AI agent crushes the benchmarks.

🎉 AI supercharged

Celebrate: you now have a top-performing AI agent ready for real tasks, all improved automatically.
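The cycle in steps 4 and 5 can be sketched as a toy failure-driven evolution loop. Everything here (`evaluate`, `synthesize_skill`, `self_improvement_loop`, and the topic-based scoring) is an illustrative assumption, not EvoSkill's actual API:

```python
from collections import Counter

def evaluate(skills, dataset):
    # Toy scoring: a task counts as solved if a learned skill covers its topic.
    return sum(1 for task in dataset if task["topic"] in skills) / len(dataset)

def synthesize_skill(failures):
    # Toy synthesis: promote the most common failing topic to a new "skill".
    return Counter(task["topic"] for task in failures).most_common(1)[0][0]

def self_improvement_loop(dataset, rounds=3):
    # Each round: test, collect failures, synthesize a skill, re-evaluate.
    skills, history = set(), []
    for _ in range(rounds):
        failures = [t for t in dataset if t["topic"] not in skills]
        if not failures:  # nothing left to learn from
            break
        skills.add(synthesize_skill(failures))
        history.append(evaluate(skills, dataset))
    return skills, history
```

A real run would call an LLM agent on each task and mine full failure trajectories; the shape of the loop (test, learn from failures, add a skill, re-test) is the point.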


AI-Generated Review

What is EvoSkill?

EvoSkill is an open-source Python framework that automatically discovers and synthesizes reusable agent skills from failed trajectories to improve coding agent performance on long-horizon tasks. Users run self-improvement loops on benchmarks like DABStep, SEAL-QA, or LiveCodeBench: it tests agents, analyzes failures, proposes prompt tweaks or new skills, evaluates the variants, and tracks top performers as git branches. The result? Stronger agents without endless manual tuning.
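The "tracks top performers as git branches" step can be modeled without git as ranking candidate variants by eval score and keeping the best k under branch-like names. This is a sketch of the mechanism as described, not EvoSkill's code; the function name and the `evo/` prefix are made up:

```python
def keep_top_variants(variants, scores, k=2):
    # Rank candidate prompt/skill variants by eval score and keep the best k
    # under branch-style names (the real framework checkpoints git branches).
    ranked = sorted(zip(variants, scores), key=lambda pair: pair[1], reverse=True)
    return {f"evo/top-{i}": variant for i, (variant, _) in enumerate(ranked[:k])}
```

Keeping variants as branches means a top performer is always one `git checkout` away, and losing variants simply stop being updated.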

Why is it gaining traction?

It automates the drudgery of agent iteration—failure-driven evolution beats hand-crafted prompts, matching or beating tuned setups on real evals. Dead-simple Python API launches loops in one line, CLI handles evals with resume/cache, and extension hooks let you plug in custom tasks fast. Developers see quick wins on messy, multi-step coding problems.
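The "extension hooks" mentioned above suggest a task-registry pattern. This minimal sketch shows one common way such hooks are built; the `register_task` decorator, `TASKS` dict, and the toy benchmark are hypothetical, not EvoSkill's API:

```python
TASKS = {}

def register_task(name):
    # Hypothetical hook: register a custom benchmark eval under a name.
    def wrap(eval_fn):
        TASKS[name] = eval_fn
        return eval_fn
    return wrap

@register_task("toy-qa")
def toy_qa(agent):
    # Custom eval: fraction of QA pairs the agent answers correctly.
    pairs = [("2+2", "4"), ("capital of France", "Paris")]
    return sum(agent(q) == a for q, a in pairs) / len(pairs)
```

A harness can then iterate `TASKS` and score any agent callable against every registered benchmark without touching framework internals.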

Who should use this?

AI researchers benchmarking coding agents on data analysis or QA tasks. Agent builders tackling long-horizon workflows where failures compound—think evolving skills for web search, code gen, or analysis pipelines. Skip if you're not iterating LLMs daily.

Verdict

Grab it if agent perf is your bottleneck—solid API and benchmarks make early experiments cheap. But 19 stars and 1.0% credibility scream "alpha": light tests, nascent community. Prototype now, contribute later.


