GeniusHTX / SWE-Skills-Bench
The official repo of our paper, "SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?"
SWE-Skills-Bench is a dataset of 49 real-world software engineering tasks paired with skill documents to benchmark whether providing domain-specific knowledge improves AI agent performance.
How It Works
The benchmark collects 49 real-world software engineering tasks to test whether skill documents make AI agents better at them.
Download the starter files and connect your AI model service so the agent can run the tasks.
Browse the task list, which covers work such as fixing bugs, adding features, and improving code, and choose which tasks to run.
Run each task twice, once with the skill document and once without, to measure the difference it makes.
Review the charts and reports, which show pass rates, time used, and how much the skill document changed the results.
You come away with a measured view of whether skill documents improve your agent's coding performance, which you can apply to your own projects.
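The with-skill versus without-skill comparison described in the steps above can be sketched in a few lines of Python. This is only an illustration: the `RunResult` record, the `pass_rate` and `skill_delta` helpers, and the sample data are all hypothetical and do not reflect the repository's actual result format or reported numbers.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One agent run on one task (hypothetical record layout)."""
    task_id: str
    with_skill: bool   # True if the skill document was provided
    passed: bool       # True if the task's tests passed
    seconds: float     # wall-clock time for the run


def pass_rate(results, with_skill):
    """Fraction of passing runs within one condition (0.0 if no runs)."""
    selected = [r for r in results if r.with_skill == with_skill]
    if not selected:
        return 0.0
    return sum(r.passed for r in selected) / len(selected)


def skill_delta(results):
    """Pass rate with the skill document minus pass rate without it."""
    return pass_rate(results, True) - pass_rate(results, False)


# Made-up sample data, for illustration only.
results = [
    RunResult("task-a", True, True, 120.0),
    RunResult("task-a", False, False, 150.0),
    RunResult("task-b", True, True, 90.0),
    RunResult("task-b", False, True, 95.0),
]

print(f"with skill:    {pass_rate(results, True):.2f}")
print(f"without skill: {pass_rate(results, False):.2f}")
print(f"delta:         {skill_delta(results):+.2f}")
```

A positive delta means the skill document helped on this sample, and a negative one means it hurt. Running every task under both conditions keeps the two pass rates comparable.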