nv-tlabs / dvlt

Public

Official implementation of Déjà View: Looping Transformers for Multi-View 3D Reconstruction

79
3
100% credibility
Found Jun 01, 2026 at 80 stars
AI Analysis
Python
AI Summary

This is a legitimate NVIDIA research project implementing DVLT (Déjà View Looping Transformers), a recurrent transformer model for multi-view 3D reconstruction, developed by researchers from NVIDIA and several universities, including the University of Toronto, ETH Zurich, and the University of Modena.

AI-Generated Review

What is dvlt?

DVLT (Déjà View) is a recurrent transformer that reconstructs 3D scenes from multiple unordered images. You feed it a pile of photos from different angles, and it spits out depth maps, camera poses, and a 3D point cloud. The trick is that a single attention block loops over the frames K times, so K becomes a dial you turn at inference time to trade speed for quality. It comes from NVIDIA research, runs on PyTorch, and has model weights available on HuggingFace.
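
To make the looping idea concrete, here is a toy sketch in plain Python. It is not DVLT's code or architecture: `SharedAttentionBlock` and `looped_forward` are hypothetical names, the block is a bare single-head self-attention with a residual connection, and the weights are random. It only illustrates the general pattern of reusing one set of weights K times, so that the parameter count stays fixed while compute scales with K.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class SharedAttentionBlock:
    """Toy single-head self-attention with a residual connection.

    Hypothetical stand-in for one transformer block; weights are random.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.gauss(0, dim ** -0.5) for _ in range(dim)]
                    for _ in range(dim)]

        self.dim = dim
        self.Wq, self.Wk, self.Wv = init(), init(), init()

    def __call__(self, tokens):
        q = [matvec(self.Wq, t) for t in tokens]
        k = [matvec(self.Wk, t) for t in tokens]
        v = [matvec(self.Wv, t) for t in tokens]
        out = []
        for i, t in enumerate(tokens):
            # Scaled dot-product scores of token i against every token.
            scores = [sum(a * b for a, b in zip(q[i], kj)) / math.sqrt(self.dim)
                      for kj in k]
            w = softmax(scores)
            mixed = [sum(w[j] * v[j][d] for j in range(len(tokens)))
                     for d in range(self.dim)]
            # Residual update: each pass refines the previous state.
            out.append([ti + mi for ti, mi in zip(t, mixed)])
        return out


def looped_forward(block, tokens, num_loops):
    """Apply the same block num_loops times (the K dial)."""
    for _ in range(num_loops):
        tokens = block(tokens)
    return tokens


block = SharedAttentionBlock(dim=4, seed=0)
tokens = [[0.1 * (i + j) for j in range(4)] for i in range(3)]  # 3 toy tokens
fast = looped_forward(block, tokens, num_loops=1)      # cheaper, coarser
refined = looped_forward(block, tokens, num_loops=3)   # 3x the compute, same weights
```

Because `looped_forward` takes `num_loops` as an ordinary argument, the same weights serve every compute budget; a model built this way would not need retraining to run with a different K.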

Why is it gaining traction?

The pitch is efficiency without sacrifice. DVLT claims to match or beat larger feed-forward models with a fraction of the parameters, and you control the compute budget at runtime by adjusting the number of refinement steps. It ships with evaluation wrappers for five competing methods (VGGT, Depth-Anything-3, Pi3, and others), so you can run head-to-head benchmarks on your own data. The interactive Gradio demo lets you upload images or video and visualize the reconstructed scene in a browser, which is handy for presentations and quick sanity checks.

Who should use this?

3D vision researchers comparing multi-view reconstruction approaches will find the benchmark wrappers most valuable. If you're evaluating whether recurrent attention beats feed-forward for your use case, this gives you a ready-made comparison harness. Robotics teams needing camera pose and depth from image sequences might benefit, though the non-commercial model license restricts commercial deployment. It's not ready for production pipelines yet.

Verdict

With 79 stars and a 100% credibility score, this is an early-stage research release, not a polished library. The code is well-structured, with Hydra configs, multi-GPU training, and solid documentation for installation and datasets, but only the ScanNet++ training loader ships at launch. Worth exploring if you're working in this space, but treat it as a research artifact rather than a drop-in tool.
