kaichen-z / PAGE4D

Public

[ICLR 2026] PAGE-4D: Disentangled Pose and Geometry Estimation for VGGT-4D Perception

89
3
100% credibility
Found Mar 23, 2026 at 89 stars
AI Analysis
Python
AI Summary

PAGE4D is a feed-forward neural network that estimates camera poses, depth maps, and dense 3D point clouds from multi-view images of dynamic scenes including moving objects.

How It Works

1. 🔍 Discover PAGE4D

You hear about PAGE4D, a tool that turns photos of moving scenes into 3D models with camera positions.

2. 📥 Get everything ready

Download the code and its brain (the pre-trained model weights) so it's all set up on your computer.

3. 📸 Gather your photos

Collect a handful of pictures from different angles of something moving, like a person dancing.

4. 🚀 Run the analysis

Feed your photos into the tool with a few lines of code and let it work.

5. ✨ Watch the scene come alive

In seconds, you see camera positions, depth layers, and full 3D points of your moving scene appear.

6. 📊 Explore the results

Inspect the depth maps, 3D points, and camera paths to understand your scene.

🎉 Complete 4D reconstruction

You've got a moving 3D model ready for video, AR, or further analysis!
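The steps above can be sketched in code. This is a hypothetical illustration, not PAGE4D's actual API: `Page4DModel` and the output keys (`poses`, `depth`, `points`) are stand-ins chosen to match the description, and the class body is a pure-Python placeholder so the sketch runs without the repo installed.

```python
# Hypothetical sketch of the PAGE4D workflow described above.
# Page4DModel is a stand-in for the real network; the class name
# and output keys are assumptions, not the project's API.

class Page4DModel:
    """Placeholder for the feed-forward PAGE4D network."""

    def __call__(self, images):
        n = len(images)  # number of input views
        return {
            # One 3x4 camera pose (rotation | translation) per view;
            # identity rotation, zero translation as a placeholder.
            "poses": [[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
                      for _ in range(n)],
            # One small HxW depth map per view (constant placeholder depth).
            "depth": [[[1.0] * 4 for _ in range(3)] for _ in range(n)],
            # Dense 3D points: one (x, y, z) tuple per pixel per view.
            "points": [[(0.0, 0.0, 1.0)] * 12 for _ in range(n)],
        }

# Steps 3-6: gather views of a dynamic scene, run one forward pass,
# then read off poses, depth, and points -- no optimization loop.
images = ["frame_0.png", "frame_1.png", "frame_2.png"]  # multi-view input
model = Page4DModel()
predictions = model(images)  # single feed-forward call

for key in ("poses", "depth", "points"):
    print(key, "->", len(predictions[key]), "views")
```

The point of the sketch is the shape of the workflow: one forward call, three disentangled outputs, no COLMAP or per-scene optimization.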


AI-Generated Review

What is PAGE4D?

PAGE4D is a Python toolkit for feed-forward 4D perception in dynamic scenes: it takes multi-view images and outputs disentangled camera poses, depth maps, and dense point clouds, with no optimization or post-processing. It extends VGGT to handle moving humans and deformable objects, and pulls pretrained weights from Hugging Face for single-call batch inference (`predictions = model(images)`). Eval pipelines benchmark monocular/video depth and relative pose on datasets such as Sintel, Bonn, TUM, and DynCheck.
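The relative-pose benchmarks mentioned above are typically scored with the geodesic rotation error between predicted and ground-truth rotations. This is a minimal stdlib sketch of that standard metric for illustration; it is not claimed to be PAGE4D's actual eval code.

```python
import math

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices.

    Uses angle = arccos((trace(R_pred^T @ R_gt) - 1) / 2), the standard
    relative-rotation metric used in pose benchmarks.
    """
    # trace(R_pred^T @ R_gt) = sum over i, j of R_pred[i][j] * R_gt[i][j]
    trace = sum(R_pred[i][j] * R_gt[i][j]
                for i in range(3) for j in range(3))
    # Clamp for numerical safety before arccos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# 90-degree rotation about the z-axis.
rot90_z = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]

print(rotation_error_deg(identity, identity))  # 0.0
print(rotation_error_deg(identity, rot90_z))   # 90.0
```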

Why is it gaining traction?

This ICLR 2026 paper's code stands out by disentangling pose and geometry estimation in a single feed-forward pass, beating iterative methods on dynamic benchmarks while matching VGGT on static scenes. Developers latch onto the quick-start API, the bash-driven training and eval scripts, and the feature-visualization scripts that contrast frame-local vs. cross-view attention. Early ICLR 2026 OpenReview buzz and the arXiv preprint draw computer-vision researchers chasing state-of-the-art results without COLMAP preprocessing.
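The frame-local vs. cross-view distinction those visualization scripts reveal can be illustrated with attention masks. The toy sketch below (my illustration, not the repo's code) builds both mask types for tokens tagged by frame index, in the spirit of the alternating frame-wise/global attention design that VGGT popularized.

```python
def attention_masks(frame_ids):
    """Build frame-local and cross-view boolean attention masks.

    frame_ids[i] is the frame each token belongs to. Under frame-local
    attention a token may only attend to tokens from its own frame;
    under cross-view (global) attention every token attends to every
    other token. Illustrative only -- not PAGE4D's implementation.
    """
    n = len(frame_ids)
    local = [[frame_ids[i] == frame_ids[j] for j in range(n)]
             for i in range(n)]
    cross = [[True] * n for _ in range(n)]
    return local, cross

# Two frames with two tokens each.
local, cross = attention_masks([0, 0, 1, 1])
print(local[0])  # [True, True, False, False]
print(cross[0])  # [True, True, True, True]
```

Alternating these two masks across transformer layers is what lets one network separate per-frame geometry from cross-view camera motion.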

Who should use this?

Computer vision researchers validating disentangled 4D models on dynamic RGB-D data. Robotics engineers prototyping SLAM/VO for cluttered homes with people. AR teams needing fast scene geometry from phone cams.

Verdict

Promising ICLR 2026 research code for disentangled 4D perception, with solid evals and Hugging Face checkpoints, but 89 stars signal early maturity: docs are README-focused and there are no tests. Prototype with it now; watch for workshop follow-ups.
