cvlab-kaist/TrackCraft3R

Official code implementation for TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking

AI Summary

TrackCraft3R repurposes a video diffusion model to predict dense 3D trajectories from monocular videos using predicted depth and camera poses.

How It Works

1. 📹 Pick your video

Start with a short video clip of something moving, like a person dancing or a car driving.
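Most preprocessing pipelines work on individual frames rather than a raw clip. As a minimal sketch, assuming OpenCV is installed (the repo's own loaders may differ), frame extraction could look like this:

```python
# Hypothetical frame extraction with OpenCV; the repo's preprocessing
# scripts may load video differently.
import cv2

def read_frames(video_path, max_frames=64):
    """Return up to max_frames RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < max_frames:
        ok, bgr = cap.read()
        if not ok:
            break  # end of clip
        frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))  # OpenCV reads BGR
    cap.release()
    return frames
```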

2. Estimate depth and camera

⚡ Quick depth tool: Try the fast Depth-Anything option for everyday videos.

🎯 Precise depth tool: Choose ViPE for more accurate results on tricky scenes.
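For a feel of the quick option, here is a hedged sketch of per-frame depth using a Depth-Anything checkpoint through the Hugging Face `transformers` pipeline. The model id is illustrative (a V2 checkpoint), not necessarily the one the repo's scripts call:

```python
# A minimal sketch of per-frame monocular depth, assuming transformers is
# installed. The checkpoint below is an illustrative Depth-Anything-V2 id.
from transformers import pipeline
from PIL import Image
import numpy as np

depth_estimator = pipeline(
    "depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",  # assumed checkpoint
)

def estimate_depths(frame_paths):
    """Return one HxW float32 depth map per frame."""
    depths = []
    for path in frame_paths:
        result = depth_estimator(Image.open(path))
        # "predicted_depth" is a torch tensor with a leading batch dim.
        depths.append(result["predicted_depth"].squeeze().numpy().astype(np.float32))
    return depths
```

Either tool's output then feeds step 3 together with the camera estimates.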

3. 📦 Package it up

Combine your video with the depth and camera guesses into a single ready-to-use NPZ bundle.
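Here is a hedged sketch of what that bundling could look like with NumPy; the key names are hypothetical, and the repo's preprocessing script defines the real schema:

```python
# Hypothetical NPZ packaging; array keys and shapes are assumptions,
# not the repo's actual schema.
import numpy as np

def build_bundle(out_path, frames, depths, intrinsics, extrinsics):
    """frames: (T, H, W, 3) uint8; depths: (T, H, W) float32;
    intrinsics: (3, 3); extrinsics: (T, 4, 4) world-to-camera."""
    np.savez_compressed(
        out_path,
        video=np.asarray(frames, dtype=np.uint8),
        depths=np.asarray(depths, dtype=np.float32),
        intrinsics=np.asarray(intrinsics, dtype=np.float32),
        extrinsics=np.asarray(extrinsics, dtype=np.float32),
    )
```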

4. 🚀 Launch the tracker

Hit go and watch as it recovers thousands of 3D paths in one quick pass.
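To make "a 3D path" concrete: a tracked pixel plus its depth unprojects to a 3D point in that frame's camera, and the review below notes that outputs live in frame-0 camera space. The sketch below is plain pinhole geometry, not the tracker itself; `extrinsics` is an assumed (T, 4, 4) stack of world-to-camera matrices:

```python
# Standard pinhole-camera geometry, not the repo's code: lift one pixel
# track into a 3D trajectory expressed in frame-0 camera coordinates.
import numpy as np

def track_to_frame0(us, vs, ds, K, extrinsics):
    """us, vs, ds: length-T pixel coords and depths for one track.
    K: (3, 3) intrinsics; extrinsics: (T, 4, 4) world-to-camera (assumed)."""
    K_inv = np.linalg.inv(K)
    points = []
    for t, (u, v, d) in enumerate(zip(us, vs, ds)):
        cam_t = d * (K_inv @ np.array([u, v, 1.0]))  # point in camera t
        world = np.linalg.inv(extrinsics[t]) @ np.append(cam_t, 1.0)  # camera t -> world
        cam_0 = extrinsics[0] @ world  # world -> frame-0 camera
        points.append(cam_0[:3])
    return np.stack(points)  # (T, 3) trajectory in frame-0 camera space
```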

✨ See your 3D world

Enjoy colorful trails and spinning point clouds showing every motion in 3D.
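The repo visualizes tracks in a browser (see the review below); for a quick local preview, a few matplotlib lines work too. `tracks` is assumed to be an (N, T, 3) array in frame-0 camera space, e.g. stacked outputs of `track_to_frame0` above:

```python
# A small, assumed-format preview: one colored 3D trail per tracked point.
import matplotlib.pyplot as plt

def plot_tracks(tracks):
    """tracks: (N, T, 3) array of trajectories (assumed layout)."""
    ax = plt.figure().add_subplot(projection="3d")
    for traj in tracks:
        ax.plot(traj[:, 0], traj[:, 1], traj[:, 2], linewidth=0.8)
    ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("z")
    plt.show()
```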

AI-Generated Review

What is TrackCraft3R?

TrackCraft3R turns monocular videos into dense 3D trajectories, conditioning on predicted depth and camera poses and repurposing a pre-trained video diffusion transformer for single-pass tracking. Feed it a video via scripts that preprocess with tools like Depth-Anything-V3 or ViPE, build an NPZ bundle, run inference in Python with PyTorch, and visualize the tracks in a browser; the output is per-pixel paths in frame-0 camera space. As the official code release for the method, it sidesteps the iterative optimization headaches of most 3D video trackers.

Why is it gaining traction?

It skips heavy training from scratch by fine-tuning LoRA adapters on a public 1.3B-parameter video DiT backbone, delivering fast dense tracks without multi-stage refinement, which is rare among diffusion-based trackers. Developers like the end-to-end pipeline: download the released weights via the Hugging Face CLI, evaluate on benchmarks like Kubric, or track custom clips in minutes. The codebase stands out with clear scripts for each training stage and for user inference.
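If you prefer Python over the CLI, the same weights can be fetched with `huggingface_hub`; the repo id below is a placeholder, so check the project README for the real one:

```python
# Hedged example: download released weights from the Hugging Face Hub.
# The repo_id is a placeholder, not the project's confirmed model id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="cvlab-kaist/TrackCraft3R")  # placeholder id
print("weights downloaded to", local_dir)
```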

Who should use this?

CV researchers benchmarking dense trackers on synthetic/real videos, robotics engineers needing quick 3D motion from phone cams (bypassing full SLAM), or AR/VR devs prototyping object flows. Ideal for those handling dynamic scenes where sparse keypoints fail, like human motion or cluttered environments.

Verdict

Promising for diffusion-in-CV experiments, but at 34 stars and a 1.0% credibility score it's early: the docs shine with step-by-step pipelines and Hugging Face models, yet the project lacks broad tests and community forks. Grab the official releases if you work in 3D tracking; otherwise, monitor it as it matures.
