capgym

capgym / cap-x

Public

A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation

Found Mar 26, 2026 at 19 stars
Language: Python

AI Summary

CaP-X is an open framework for testing and training AI agents that generate code to control simulated robots in manipulation tasks like stacking and assembly.

How It Works

1
💡 Discover CaP-X

You stumble upon CaP-X, a fun playground where AI learns to guide robots in everyday tasks like stacking blocks or wiping spills.

2
🤖 Pick a robot adventure

Choose a challenge like lifting a cube or assembling nuts – it's like giving your robot a puzzle to solve with smart code.

3
📦 Set up your robot world

Download the tools to your computer and prepare simulated robot environments with simple steps.

4
🧠 Link an AI thinking partner

Connect a helpful AI like Gemini so it can watch the scene and write code to control the robot.

5
▶️ Launch and watch magic

Hit start to see the AI generate code, the robot move step-by-step, and learn from each try.

6
📈 Review and improve

Check videos and scores of what worked, tweak the AI's instructions, and run again for better results.

🎉 Robot masters the task!

Your AI-robot team stacks blocks perfectly or wipes spills clean – ready for real-world robot helpers.
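The six steps above boil down to a generate-execute-observe-refine loop. Here is a minimal, self-contained sketch of that loop; the toy environment, the stubbed code generator, and the success check are all inventions for illustration, not CaP-X's actual API.

```python
# Hypothetical sketch of the generate -> execute -> observe -> refine loop.
# None of these names come from CaP-X; the env and the "LLM" are stand-ins.

class ToyStackingEnv:
    """Stand-in for a simulated manipulation environment."""

    def __init__(self):
        self.block_height = 0

    def observe(self):
        return {"block_height": self.block_height}

    def execute(self, code: str):
        # The agent's generated code runs against a tiny action API.
        exec(code, {"stack_block": self._stack_block})

    def _stack_block(self):
        self.block_height += 1

    def success(self):
        return self.block_height >= 3


def fake_code_generator(observation):
    """Stand-in for an LLM: writes code based on what it 'sees'."""
    missing = 3 - observation["block_height"]
    return "\n".join("stack_block()" for _ in range(max(missing, 1)))


env = ToyStackingEnv()
for attempt in range(5):            # multi-turn: retry with fresh observations
    obs = env.observe()
    code = fake_code_generator(obs)
    env.execute(code)
    if env.success():
        print(f"task solved on attempt {attempt + 1}")
        break
```

A real run swaps the stubs for a simulator and a model endpoint, but the retry-with-new-observations shape stays the same.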

AI-Generated Review

What is cap-x?

CaP-X is a Python framework for benchmarking coding agents: LLMs and VLMs that generate Python code to control robots in simulation. It provides Gymnasium environments for 39 manipulation tasks across the Robosuite, LIBERO-PRO, and BEHAVIOR simulators, plus a tiered benchmark (S1-S4 single-turn, M1-M4 multi-turn) that tests abstraction, visual grounding, and interaction modes. Users run evaluations via a simple CLI, e.g. `uv run capx/envs/launch.py --config-path task.yaml --model gemini-pro`, with auto-launched perception servers and a web UI.
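Since the tasks are exposed as Gymnasium environments, driving one follows the standard reset/step contract. Below is a sketch with a stand-in environment: "LiftCubeEnv", its observations, and the 10 cm success threshold are invented; only the `(obs, info)` and `(obs, reward, terminated, truncated, info)` signatures mirror the real Gymnasium API.

```python
# Stand-in environment that follows the Gymnasium reset/step contract.

class LiftCubeEnv:
    def reset(self, seed=None):
        self.height = 0.0
        return {"cube_height": self.height}, {}

    def step(self, action):
        # action: commanded vertical displacement in metres
        self.height += action
        terminated = self.height >= 0.1       # success: cube lifted 10 cm
        reward = 1.0 if terminated else 0.0
        return {"cube_height": self.height}, reward, terminated, False, {}


env = LiftCubeEnv()
obs, info = env.reset(seed=0)
total_reward = 0.0
terminated = truncated = False
for _ in range(100):
    action = 0.05                  # a real agent's code would derive this from obs
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        break
```

Because the interface is standard Gymnasium, existing eval harnesses and RL tooling can wrap these tasks without adapters.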

Why is it gaining traction?

The framework stands out by composing vision models (SAM3, OWL-ViT), motion planners (cuRobo, PyRoKi), and control primitives into agent-generated code, enabling multi-turn visual differencing and parallel ensembling without custom training. Robotics developers pick it up for competitive code-as-policy benchmarking, plus RL tooling (GRPO/VeRL) that transfers simulated policies to real hardware with a low sim-to-real gap, going well beyond basic environment wrappers.
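The composition idea (perception, planning, and control exposed as functions that generated code can call) can be illustrated with stubs. Every function below is a stand-in invented for this sketch, not CaP-X's real primitive set or the actual SAM3/OWL-ViT/cuRobo interfaces.

```python
# Stubbed "code-as-policy" toolbox: the agent's generated code calls these.
# All names and behaviors are invented for illustration.

def detect(name):
    """Stand-in for an open-vocabulary detector returning an (x, y, z) pose."""
    scene = {"red_cube": (0.3, 0.1, 0.02), "blue_bowl": (0.5, -0.2, 0.0)}
    return scene[name]

trajectory = []

def move_to(pos):
    """Stand-in for a motion-planned reach to a target pose."""
    trajectory.append(("move", pos))

def grasp():
    trajectory.append(("grasp",))

def release():
    trajectory.append(("release",))

# Code an agent might generate for "put the red cube in the blue bowl":
agent_code = """
cube = detect("red_cube")
move_to(cube)
grasp()
bowl = detect("blue_bowl")
move_to(bowl)
release()
"""

exec(agent_code, {"detect": detect, "move_to": move_to,
                  "grasp": grasp, "release": release})
print(trajectory)
```

The point of the pattern is that the model never outputs torques or pixels, only short programs over a small, verifiable vocabulary of primitives.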

Who should use this?

Robotics researchers evaluating VLMs on dexterous tasks like nut assembly or cube restacking; AI agent builders testing code-generation reliability in physics simulators; and sim-to-real teams needing standardized benchmark scores before deploying to hardware.

Verdict

A solid start from NVIDIA/Berkeley/Stanford labs, with an arXiv paper and a CUDA-ready CLI, but at 19 stars this is still an early-stage project: the docs and regression tests are good, yet expect to tweak configs. Worth a spin for agent benchmarking if you have GPU-backed simulation.


