LY-DerekX

Toolkit for collecting, merging, auditing, visualizing, and publishing RGB/RGB-D LeRobot VLA datasets.

25
7
94% credibility
Found May 31, 2026 at 25 stars.
AI Analysis
Python
AI Summary

This is a comprehensive toolkit for creating and preparing robot training datasets with depth information. It enables users to record robot movements using Orbbec RGB-D cameras, merge multiple recording sessions into unified datasets, automatically check data quality for corruption or anomalies, remove problematic episodes, and publish the final cleaned dataset to Hugging Face. The project integrates with the LeRobot ecosystem from Hugging Face and supports the full lifecycle of robot data collection and preparation for training vision-language-action models.

How It Works

1
🔍 You discover a robot data toolkit

You find a project that helps you record robot movements with depth cameras and prepare the data for training robot brains.

2
📹 You connect your depth camera and record

You hook up an Orbbec depth camera to your robot arm and start recording yourself controlling it, capturing both video and depth information.

3
🔄 You combine all your recordings

After several recording sessions, you merge all your recordings into one organized dataset with consistent episode numbering.

4
🔎 You check your data quality

An automated checker reviews every video frame and depth image, flagging any corrupted recordings or suspicious content for your review.

5
You review flagged items
Keep the good recordings

Episodes that passed the quality checks are marked for keeping.

Remove bad recordings

Corrupted or low-quality episodes are marked for removal from your dataset.

6
You build a clean dataset

The tool creates a fresh copy of your dataset containing only the episodes you approved, optionally re-encoding videos to cut out the footage of removed episodes.

7

🚀 You share your dataset with the world

Your cleaned robot training dataset is uploaded to Hugging Face, ready for anyone to download and use for training robot policies.
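A resumable upload like the one in the last step can be sketched with the `huggingface_hub` client. This assumes `huggingface_hub` is installed and that you are logged in (for example via `huggingface-cli login`). `publish_dataset` and the repo id are hypothetical, and the toolkit's own upload script may work differently; `HfApi.create_repo` and `HfApi.upload_large_folder` are the library's own methods.

```python
from pathlib import Path


def publish_dataset(folder: str, repo_id: str, private: bool = False, api=None) -> None:
    """Create a dataset repo if needed, then upload the folder resumably.

    Hypothetical helper. `api` defaults to a real huggingface_hub.HfApi client;
    it is injectable so the call sequence can be checked without network access.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {root}")
    if api is None:
        from huggingface_hub import HfApi  # requires `pip install huggingface_hub`
        api = HfApi()
    # exist_ok=True makes it safe to re-run after an interrupted upload
    api.create_repo(repo_id, repo_type="dataset", private=private, exist_ok=True)
    # upload_large_folder splits the work into many commits and tracks progress
    # in a local cache inside the folder, so an interrupted run can resume
    api.upload_large_folder(repo_id=repo_id, folder_path=str(root), repo_type="dataset")
```

The `api` parameter exists only so the calls can be exercised with a stand-in object; in normal use, leave it as `None`.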

AI-Generated Review

What is lerobot-rgb-rgbd-vla-dataset-toolkit?

This Python toolkit fills a gap in the LeRobot ecosystem by adding depth sensing to robot dataset collection. It handles the complete pipeline from physical data capture with Orbbec RGB-D cameras, through merging and quality-checking multiple datasets, to final publishing on Hugging Face.

The project provides five main capabilities: (1) an Orbbec camera overlay that drops into an existing LeRobot installation to record RGB and depth side by side; (2) scripts that merge several LeRobot-format datasets into one physical folder with rewritten indices; (3) a conservative quality audit that flags corrupted videos and suspicious episodes and sorts them into keep, review, and drop lists; (4) tools that build a clean, training-ready dataset by removing bad episodes and optionally re-encoding videos; and (5) resumable uploads to Hugging Face with large-folder support.
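Merging with rewritten indices, and later dropping episodes, both come down to renumbering. Below is a minimal sketch of a planner for that step. `plan_merge` and `EpisodeRef` are hypothetical names, not the toolkit's API; the sketch assumes LeRobot-style numbering, where each episode has an `episode_index` and each frame a global `index`, and that source episodes are numbered from 0 without gaps. It only computes the new numbering; copying parquet files, videos, and depth sidecars is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeRef:
    source: str        # source dataset folder name
    old_episode: int   # episode_index in the source dataset
    new_episode: int   # episode_index in the merged dataset
    start_index: int   # global frame index of the episode's first frame
    length: int        # number of frames in the episode


def plan_merge(sources, drop=frozenset()):
    """Assign contiguous episode and frame indices across several datasets.

    sources: list of (name, [frame count per episode]), in merge order.
    drop: set of (name, old_episode) pairs to leave out entirely.
    """
    plan = []
    next_episode = 0
    next_index = 0
    for name, lengths in sources:
        for old_episode, n_frames in enumerate(lengths):
            if (name, old_episode) in drop:
                continue  # skipped episodes leave no gap in the new numbering
            plan.append(EpisodeRef(name, old_episode, next_episode, next_index, n_frames))
            next_episode += 1
            next_index += n_frames
    return plan


plan = plan_merge(
    [("session_a", [120, 95]), ("session_b", [110])],
    drop={("session_a", 1)},
)
for ref in plan:
    print(ref)
```

Here episode 1 of `session_a` is dropped, so the first episode of `session_b` becomes episode 1 and starts at global frame 120. Feeding an audit's drop list in as `drop` is one way the same planner could also serve the clean-build step.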

Depth data is stored as 16-bit (uint16) PNG sidecar files next to the RGB videos, which keeps depth lossless instead of passing it through a lossy video codec.
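To show why a uint16 PNG keeps depth lossless, here is a stdlib-only sketch that writes a row-major list of 16-bit depth values (for example millimetres, depending on the camera's depth scale) as a 16-bit grayscale PNG and reads it back bit-exact. `encode_depth_png` and `decode_depth_png` are illustrative names, not the toolkit's functions. A real pipeline would more likely use OpenCV or Pillow, and this decoder only handles the unfiltered scanlines its own encoder writes.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag: bytes, data: bytes) -> bytes:
    # A PNG chunk is: length, 4-byte tag, data, CRC32 over tag + data
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_depth_png(depth, width, height):
    """Encode row-major uint16 depth values as a 16-bit grayscale PNG."""
    assert len(depth) == width * height
    raw = bytearray()
    for y in range(height):
        raw.append(0)  # filter type 0 (None) at the start of each scanline
        raw += struct.pack(f">{width}H", *depth[y * width:(y + 1) * width])
    # IHDR: width, height, bit depth 16, color type 0 (grayscale), no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 16, 0, 0, 0, 0)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(raw))) + _chunk(b"IEND", b""))


def decode_depth_png(png):
    """Decode a PNG produced by encode_depth_png back into a list of uint16 values."""
    assert png[:8] == PNG_SIGNATURE
    pos, idat, width, height = 8, b"", 0, 0
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        tag, data = png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + tag + data + CRC
        if tag == b"IHDR":
            width, height, bit_depth, color_type = struct.unpack(">IIBB", data[:10])
            assert bit_depth == 16 and color_type == 0
        elif tag == b"IDAT":
            idat += data
    raw = zlib.decompress(idat)
    stride = 1 + 2 * width  # filter byte + two bytes per sample
    out = []
    for y in range(height):
        line = raw[y * stride:(y + 1) * stride]
        assert line[0] == 0, "this sketch only handles filter type 0"
        out.extend(struct.unpack(f">{width}H", line[1:]))
    return out
```

Every value from 0 to 65535 survives the round trip unchanged, which an 8-bit, lossy-compressed video stream would not guarantee.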

Why is it gaining traction?

The hook is depth. Most LeRobot datasets and tooling focus on RGB-only robot observations, but depth gives robots direct distance measurements. This toolkit treats depth as a first-class data stream alongside actions and RGB video, which matters for manipulation tasks where a single RGB view cannot reliably convey how far away objects are.

The audit system is also unusually thorough. Rather than binary pass/fail checks, it distinguishes deterministic failures from ambiguous content issues, outputs structured reports, and lets you manually review borderline cases before building a clean dataset.
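The keep/review/drop split can be sketched as a triage step. This is a minimal illustration, not the toolkit's audit code: `EpisodeCheck`, `check_depth`, and `triage` are hypothetical names, the 50% threshold is an arbitrary placeholder, and the sketch assumes, as is common for depth cameras, that a pixel value of 0 means "no depth reading". Deterministic failures such as an undecodable video go straight to drop, while ambiguous signals only queue an episode for review.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EpisodeCheck:
    episode: int
    # Deterministic failures (e.g. a video that fails to decode): no human judgment needed.
    errors: list = field(default_factory=list)
    # Ambiguous content signals: worth a human look, not automatic removal.
    warnings: list = field(default_factory=list)


def check_depth(episode, depth_frames, max_invalid=0.5):
    """Warn when a frame is mostly zeros (assumed to mean 'no depth reading')."""
    check = EpisodeCheck(episode)
    for i, frame in enumerate(depth_frames):
        invalid = sum(1 for v in frame if v == 0) / len(frame)
        if invalid > max_invalid:  # hypothetical threshold
            check.warnings.append(f"frame {i}: {invalid:.0%} of depth pixels are 0")
    return check


def triage(checks):
    """Sort episodes into keep / review / drop lists."""
    report = {"keep": [], "review": [], "drop": []}
    for c in checks:
        if c.errors:
            report["drop"].append(c.episode)
        elif c.warnings:
            report["review"].append(c.episode)
        else:
            report["keep"].append(c.episode)
    return report


checks = [
    EpisodeCheck(0),
    EpisodeCheck(1, errors=["video decode failed"]),
    check_depth(2, [[0, 0, 0, 812], [500, 610, 705, 798]]),
]
print(json.dumps(triage(checks)))  # {"keep": [0], "review": [2], "drop": [1]}
```

Keeping errors and warnings in separate lists is what lets the report stay conservative: nothing ambiguous is removed without a person confirming it.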

Who should use this?

Robotics researchers building VLA training datasets who need depth perception, particularly those working with pick-and-place or spatial manipulation tasks where RGB-only data introduces ambiguity. It is most useful if you are already inside the LeRobot/Hugging Face ecosystem and collecting data from physical robots.

Verdict

This is a niche tool for a specific workflow: RGB-D data collection with LeRobot. Its 25 stars (with a 94% credibility score) point to a very early-stage project with minimal community adoption. The code is functional, the documentation is reasonable, and an example dataset exists on Hugging Face, but there is no test suite, no community, and limited battle-testing.

Recommended if you have Orbbec hardware and need RGB-D LeRobot datasets today. Watch this space if you are planning to build similar tooling, but do not bet a production pipeline on it yet.
