amazon-far / deltatok
A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens (CVPR 2026 Highlight)
This repository implements DeltaTok for compressing video frame differences into single tokens and DeltaWorld for autoregressively predicting future frames, with code for training on action videos and evaluating on perception tasks like segmentation and depth estimation.
How It Works
You find a project that helps computers understand videos by capturing the small changes between frames and predicting what happens next.
You install simple tools on your computer to get everything ready for working with videos.
You download sets of real-world videos like action clips and street scenes to teach the system.
You launch training so the compressor (DeltaTok) learns to squeeze each frame-to-frame change into a single token, and you watch progress as it improves.
Using the trained compressor, you train a companion model (DeltaWorld) that autoregressively predicts possible next moments in videos.
You run it on fresh clips to see predictions for perception tasks like segmentation and depth estimation.
Now you have a powerful tool that efficiently models and forecasts video worlds, ready for your experiments.
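
The steps above can be pictured with a toy sketch. This is not the repository's code or API: every name here (`encode_delta`, `apply_delta`, `repeat_last_delta`, `rollout`) is hypothetical, frames are stand-in feature vectors, and a plain element-wise difference takes the place of the learned single-token compressor. The "predictor" simply repeats the last change, where the real project trains a model. The sketch only illustrates the loop of encoding changes, predicting the next change, and applying it to get the next frame.

```python
from typing import Callable, List

Vector = List[float]


def encode_delta(prev: Vector, curr: Vector) -> Vector:
    """Toy 'delta': the element-wise change between two frame feature vectors.

    Assumption: a learned compressor would map this change to one token;
    here the raw difference stands in for that token.
    """
    return [c - p for p, c in zip(prev, curr)]


def apply_delta(prev: Vector, delta: Vector) -> Vector:
    """Reconstruct the next frame's features from the previous frame and a delta."""
    return [p + d for p, d in zip(prev, delta)]


def repeat_last_delta(history: List[Vector]) -> Vector:
    """Hypothetical placeholder predictor: assume the last change repeats."""
    return list(history[-1])


def rollout(
    frames: List[Vector],
    predictor: Callable[[List[Vector]], Vector],
    steps: int,
) -> List[Vector]:
    """Autoregressively predict `steps` future frames from observed frames."""
    if len(frames) < 2:
        raise ValueError("need at least two observed frames to form a delta")
    # Encode each observed frame-to-frame change.
    deltas = [encode_delta(a, b) for a, b in zip(frames, frames[1:])]
    current = frames[-1]
    predicted = []
    for _ in range(steps):
        delta = predictor(deltas)        # predict the next change
        current = apply_delta(current, delta)
        deltas.append(delta)             # feed the prediction back in
        predicted.append(current)
    return predicted


observed = [[0.0, 1.0], [1.0, 3.0]]
future = rollout(observed, repeat_last_delta, steps=2)
print(future)
```

Each predicted delta is appended to the history before the next step, which is what makes the rollout autoregressive: later predictions depend on earlier ones rather than only on the observed frames.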