DeepGym


RL training environments with verifiable rewards for coding agents. Works with TRL, Unsloth, verl, OpenRLHF.

AI Summary

DeepGym offers sandboxed coding environments with automatic scoring for reinforcement learning training of AI coding agents.

How It Works

1. πŸ” Discover DeepGym

You hear about DeepGym, a tool for training AI models to write better code using verifiable reward scores.

2. πŸ“¦ Install easily

With one simple command, you add DeepGym to your setup and it's ready to go.

3. 🎯 Pick a challenge

Choose from ready-made coding tasks like making change with coins or sorting lists.

4. πŸ€– Generate code

Your AI model creates a solution for the challenge.

5. βœ… Get instant score

DeepGym runs the code in a sandbox and returns a clear score based on how many test cases pass.

6. πŸš€ Train your model

Use the scores as rewards in your RL loop so the model improves step by step (see the sketch after these steps).

πŸ† AI masters coding

Your model now solves coding problems reliably and gets top scores on benchmarks.
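Put together, the loop above might look like the sketch below. This is illustrative only: the package name deepgym and every call shown (make, reset, step, the seed argument) are assumptions based on the project's Gymnasium-style description, not confirmed APIs.

```python
# Hypothetical end-to-end sketch of steps 2-6; all names are assumed.
import deepgym  # assumed package name (step 2: one install command)

def generate_solution(prompt: str) -> str:
    """Stand-in for your policy model (step 4)."""
    return "def coin_change(coins, amount):\n    ...\n"

env = deepgym.make("coin-change")             # step 3: pick a built-in task
prompt, _ = env.reset(seed=42)                # deterministic seeding
completion = generate_solution(prompt)        # step 4: model writes code
_, reward, _, _, info = env.step(completion)  # step 5: sandboxed run + score
print(reward)  # step 6: feed this back into your RL trainer
```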

AI-Generated Review

What is deepgym?

DeepGym delivers RL training environments with verifiable rewards tailored for coding agents: you feed in model-generated Python code, it runs safely in a sandbox, gets verified against tests, and comes back with a precise score for your GRPO, PPO, or DAPO loops. It ships with 24 built-in environments like coin change and FizzBuzz, plus scripts to import 2,350+ benchmark tasks from HumanEval, MBPP, and BigCodeBench, all integrable with TRL, verl, OpenRLHF, and Unsloth. Developers get a Gymnasium-style API, a CLI for quick runs, and batch scoring for grading many completions at once.
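To make the framework integration concrete, here is a minimal sketch of plugging a DeepGym-style scorer into TRL's GRPOTrainer. The GRPOConfig/GRPOTrainer usage is standard TRL; deepgym.score_batch is a hypothetical name for the repo's batch-scoring entry point, and the model and dataset are placeholders you would swap for your own.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

import deepgym  # hypothetical import; score_batch is an assumed name

def deepgym_reward(completions, **kwargs):
    """TRL reward functions return one float per completion."""
    results = deepgym.score_batch("coin-change", completions, seed=42)  # assumed API
    return [r["score"] for r in results]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",      # placeholder policy model
    reward_funcs=deepgym_reward,             # drop-in, no glue code
    args=GRPOConfig(output_dir="grpo-deepgym"),
    train_dataset=load_dataset("trl-lib/tldr", split="train"),  # swap in coding prompts
)
trainer.train()
```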

Why is it gaining traction?

It stands out by tackling reward hacking head-on with per-test-case breakdowns and deterministic seeding, giving denser signals than the binary pass/fail rewards common in other RL training environments. Drop-in reward functions plug straight into popular RL frameworks without glue code, and execution modes ranging from local subprocesses to Daytona sandboxes scale from development to production. The verifier protocol outputs structured JSON with per-case results and seeds, making it dead simple to debug and iterate on agent training environments.
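For illustration, consuming such a verifier payload might look like the snippet below; every field name (seed, score, cases, passed) is an assumption drawn from the description above, not the repo's documented schema.

```python
import json

# Hypothetical verifier output with per-case results and a seed.
payload = json.loads("""
{
  "seed": 42,
  "score": 0.75,
  "cases": [
    {"id": "case_0", "passed": true},
    {"id": "case_1", "passed": true},
    {"id": "case_2", "passed": true},
    {"id": "case_3", "passed": false}
  ]
}
""")

# Dense reward: fraction of cases passed, rather than binary pass/fail.
dense_reward = sum(c["passed"] for c in payload["cases"]) / len(payload["cases"])
assert dense_reward == payload["score"] == 0.75
```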

Who should use this?

AI researchers fine-tuning coding agents via RLHF will find these verifiable environments useful for avoiding brittle rewards. Teams training GitHub Copilot-style models on private repos, or building reinforcement learning environments for cybersecurity agents, will appreciate the importable benchmarks and multi-turn support. Devs prototyping agentic workflows with tool-use or computer-use tasks get sandboxed execution without setup hassle.

Verdict

Grab it if you're doing RL on coding agents: solid integrations and extensible verifiers make it a practical starting point despite its modest 24 stars. It's still alpha, with room for more docs and tests, but the core delivers; test your own verifiers against its adversarial audits first.
