chandar-lab / semantic-wm
Public repository for training action-conditioned latent diffusion world models for robot video generation
This is an academic research project that trains robots to imagine what will happen next as a result of their actions. The toolkit lets researchers compare two approaches: one where the model learns to reconstruct what it sees pixel by pixel, and another where it learns to capture the meaning behind what it sees. The project includes ready-made tools for training robot 'imagination engines' on real robot video data, testing how well they predict future frames, and checking whether they can tell when a robot task will succeed or fail. Multiple vision encoders are supported, from standard image compressors to advanced AI vision models, so researchers can compare which approach helps robots plan better for real-world tasks.
How It Works
You hear about a new study comparing different ways robots learn to imagine the future results of their actions, and you're curious to try it yourself.
You grab the open-source code from the project page and set it up on your computer.
You collect footage of a robot performing tasks—like the Bridge V2 dataset with real robot arm movements and the actions that went with them.
For smarter vision encoders, you first train a small adapter that shrinks the rich visual features down to a compact size the robot brain can work with.
You train the world model—a kind of robot imagination engine—using the video clips and robot actions so it learns what comes next.
You compare generated videos against real ones using frame-level image-quality metrics.
You check whether the world model responds correctly to different robot commands.
You test whether the model can predict if a robot will succeed or fail at a task.
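To make the first of these checks concrete, here is a minimal sketch of one common frame-level metric, peak signal-to-noise ratio (PSNR). This summary does not say which metrics the repository actually computes, so PSNR is an assumption chosen for illustration, and the function names `psnr` and `mean_video_psnr` are hypothetical rather than part of the project's code. Frames are represented as flat lists of pixel values to keep the example dependency-free.

```python
import math


def psnr(pred, target, max_val=255.0):
    """PSNR in dB between two frames given as equal-length flat lists of pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_video_psnr(pred_frames, target_frames, max_val=255.0):
    """Average per-frame PSNR over a generated clip and its ground-truth clip."""
    if len(pred_frames) != len(target_frames) or not pred_frames:
        raise ValueError("clips must be non-empty and have the same number of frames")
    scores = [psnr(p, t, max_val) for p, t in zip(pred_frames, target_frames)]
    return sum(scores) / len(scores)


# Hypothetical two-frame clips with two pixels per frame.
generated = [[10, 20], [30, 40]]
real = [[12, 18], [30, 44]]
score = mean_video_psnr(generated, real)
```

Higher values mean the generated frames are closer to the real ones pixel by pixel. A metric like this rewards exact reconstruction, which is why it can disagree with how useful a model is for planning.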
You find that semantic encoders—ones that understand what objects are—generally help robots plan better than ones that just copy pixels, even when the pictures look less perfect.