g3t-paper / g3t

Public

Code for G3T and G3T-Long

31
0
89% credibility
Found May 30, 2026 at 31 stars.
AI Analysis
Python
AI Summary

G3T is an AI research project from Cornell University that transforms collections of photos into gravity-aligned 3D reconstructions. It uses a transformer-based approach to predict upright, properly-oriented 3D point clouds regardless of how the camera was held, and includes a pipeline for processing long video sequences with automatic loop closure detection.

How It Works

1
📷 Gather your photos

You collect a folder of photos from your phone or camera, capturing a scene from different angles.

2
🧠 Let the AI understand your scene

The system automatically studies your photos and figures out how they connect, like a puzzle solver.

3
🗺️ Watch your 3D map appear

A detailed 3D point cloud emerges, showing your scene with accurate depth and structure.

4
Choose your processing mode

Quick mode: process a folder of photos in one go for fast results.

🎥 Long video mode: handle video sequences with automatic loop closure for large scenes.

5
🔄 Gravity alignment happens automatically

Your 3D reconstruction is automatically oriented so that floors are flat and walls are vertical, no matter how the camera was tilted.

🎉 Explore your reconstruction

You open an interactive viewer to rotate, zoom, and examine your 3D scene from any angle.

AI-Generated Review

What is g3t?

G3T is a computer vision project that produces gravity-aligned 3D reconstructions from image collections, meaning the output point clouds are always oriented "upright" regardless of how the camera was tilted during capture. The core model is a transformer fine-tuned from Meta's VGGT-1B checkpoint, and it comes with a companion pipeline called G3T-Long for processing long video sequences with automatic loop closure detection. The project provides Python inference scripts, a viser-based interactive 3D viewer, and pre-trained weights on HuggingFace.
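To make "gravity-aligned" concrete, here is a minimal NumPy sketch of the correction that an upright prediction makes unnecessary: rotating a tilted point cloud so that a known gravity direction points straight down. This is not G3T's code. The function names, the z-up convention, and the assumption that a gravity vector is already available are all illustrative.

```python
import numpy as np

def rotation_aligning(a, b):
    """Rotation matrix that turns direction a onto direction b (Rodrigues' formula)."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v = np.cross(a, b)           # rotation axis, scaled by sin(angle)
    c = float(np.dot(a, b))      # cos(angle)
    s = np.linalg.norm(v)        # sin(angle)
    if s < 1e-12:
        if c > 0:
            return np.eye(3)     # already aligned
        # Antiparallel case: rotate 180 degrees about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    k = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + k + (k @ k) * ((1.0 - c) / s**2)

def gravity_align(points, gravity, down=(0.0, 0.0, -1.0)):
    """Rotate an (N, 3) point cloud so that `gravity` maps onto `down` (assumed z-up world)."""
    rot = rotation_aligning(gravity, down)
    return np.asarray(points, dtype=float) @ rot.T

# Illustrative data: a flat floor captured by a tilted camera.
floor = np.array([[x, y, 0.0] for x in range(3) for y in range(3)])
tilt = rotation_aligning([0.0, 0.0, -1.0], [0.3, -0.2, -1.0])
tilted_floor = floor @ tilt.T
tilted_gravity = tilt @ np.array([0.0, 0.0, -1.0])

upright = gravity_align(tilted_floor, tilted_gravity)
print(np.abs(upright[:, 2]).max())  # floor height after alignment
```

Note that gravity fixes only two of the three rotational degrees of freedom; the heading (yaw) around the vertical axis stays arbitrary.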

Why is it gaining traction?

The gravity-aligned output is the key differentiator. Most 3D reconstruction tools produce point clouds in arbitrary orientations, which creates headaches when stitching multiple scenes or aligning with map data. G3T sidesteps this by predicting upright coordinates directly, simplifying downstream tasks like map registration and scene matching. The long-sequence pipeline is also notable for handling video loops gracefully, which is a common pain point in SLAM and reconstruction systems.
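For readers unfamiliar with loop closure, the sketch below shows one common way to find candidate loops: compare a per-frame descriptor against those of much earlier frames and flag strong matches. It is a generic illustration, not the G3T-Long method. The descriptor source, `min_gap`, and `threshold` are hypothetical choices made for this example.

```python
import numpy as np

def loop_closure_candidates(descriptors, min_gap=30, threshold=0.9):
    """Return (earlier_frame, later_frame, similarity) tuples for likely revisits.

    descriptors: (N, D) array with one global descriptor per frame (assumed given).
    min_gap: ignore matches closer than this many frames, since neighbors always look alike.
    threshold: minimum cosine similarity to count as a candidate.
    """
    d = np.asarray(descriptors, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # unit-normalize rows
    sim = d @ d.T                                     # pairwise cosine similarity
    pairs = []
    for j in range(len(d)):
        last_allowed = j - min_gap
        if last_allowed < 0:
            continue                                  # not enough history yet
        best = int(np.argmax(sim[j, :last_allowed + 1]))
        if sim[j, best] >= threshold:
            pairs.append((best, j, float(sim[j, best])))
    return pairs

# Illustrative data: 100 mutually distinct frames, with frame 80 revisiting frame 5.
frames = np.eye(100)
frames[80] = frames[5]
print(loop_closure_candidates(frames))
```

A real pipeline would typically verify each candidate geometrically before using it to correct drift.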

Who should use this?

This is squarely aimed at researchers and engineers working on 3D mapping, robotics navigation, or photogrammetry pipelines where consistent world orientation matters. If you're building systems that need to merge scans from different angles or align reconstructions with GIS data, G3T could save you the hassle of post-hoc gravity alignment. Academic researchers exploring scene representation and structure-from-motion will find the paper and codebase useful. Production teams should approach cautiously given the project's early stage.
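One way to see why a consistent upright orientation helps when merging scans: if both clouds already share the same vertical axis, the unknown rotation between them shrinks from three degrees of freedom to a single yaw angle. The sketch below estimates that yaw and a translation from matched points. It assumes known point correspondences, equal scale, and a z-up convention, none of which come from the G3T codebase, and the function name is made up for this example.

```python
import numpy as np

def register_upright(src, dst):
    """Estimate yaw about z and translation so that dst is approximately R @ src + t.

    src, dst: (N, 3) arrays of corresponding points from two upright (z-up) clouds.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_center, dst_center = src.mean(axis=0), dst.mean(axis=0)
    p = src[:, :2] - src_center[:2]   # centered horizontal coordinates
    q = dst[:, :2] - dst_center[:2]
    # Least-squares 2D rotation: angle of the summed cross and dot products.
    cross = np.sum(p[:, 0] * q[:, 1] - p[:, 1] * q[:, 0])
    dot = np.sum(p[:, 0] * q[:, 0] + p[:, 1] * q[:, 1])
    yaw = float(np.arctan2(cross, dot))
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    trans = dst_center - rot @ src_center
    return yaw, rot, trans

# Illustrative data: the same scene scanned twice with a different heading and offset.
rng = np.random.default_rng(0)
scan_a = rng.normal(size=(50, 3))
true_yaw = 0.7
true_rot = np.array([[np.cos(true_yaw), -np.sin(true_yaw), 0.0],
                     [np.sin(true_yaw), np.cos(true_yaw), 0.0],
                     [0.0, 0.0, 1.0]])
scan_b = scan_a @ true_rot.T + np.array([1.0, -2.0, 0.5])

yaw, rot, trans = register_upright(scan_a, scan_b)
print(yaw, trans)
```

Without a shared vertical, the same problem would need a full 3D rotation estimate, which is more sensitive to noise and poor correspondences.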

Verdict

G3T solves a real problem in 3D reconstruction with a clever, well-documented approach backed by a Cornell research paper. However, with only 31 stars and an 89% credibility score, this is clearly an early-stage academic release. The training code is explicitly marked incomplete, and the C++ loop closure module remains unreleased. Evaluate it for research purposes or controlled pilots, but wait for a more mature release before betting production workloads on it.
