snuvclab / vlmpose

Public

[ArXiv 2026] Text-Guided 6D Object Pose Rearrangement via Closed-Loop VLM Agents

18
0
100% credibility
Found Apr 21, 2026 at 18 stars
Python
AI Summary

A research tool that uses AI to interpret text prompts and iteratively adjust the 3D poses of objects in mesh-based scenes for tasks like pouring or chess moves.

How It Works

1. 🔍 Discover the tool

You stumble upon a fascinating project that uses AI to rearrange objects in 3D scenes based on simple text descriptions, like pouring tea or moving chess pieces.

2. 💻 Prepare your setup

You get your computer ready by installing a few helper programs and creating a workspace for the magic to happen.

3. 📁 Load your scene

You place 3D model files of objects, like a table with teacups or a chessboard, into a folder to create your starting scene.

4. 🧠 Connect the AI

You link up a smart AI service that can look at pictures and understand your words to guide the rearrangements.

5. 💬 Describe the change

You type a clear instruction, such as 'Pour the tea into the teacup' or 'Move the knight to f6', telling the AI exactly what to do.

6. 👀 Watch it rearrange

The AI selects the right object, compares different views, and step by step moves and rotates it until the scene matches your description.

7. 🎉 Admire the results

You get a folder full of images and updated 3D files showing your scene transformed just as you imagined, ready to explore.
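The steps above amount to a render, query, update loop: show the scene to the AI, apply the adjustment it suggests, and repeat until it says the scene matches. A minimal sketch of that loop in Python; all names here (`Pose`, `rearrange`, the stub VLM) are illustrative, not this repo's actual API:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Pose:
    """A 6D pose: xyz translation plus Euler-angle rotation (degrees)."""
    translation: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    rotation: list = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def nudge(self, d_trans, d_rot):
        """Apply an incremental adjustment proposed by the VLM."""
        self.translation = [a + b for a, b in zip(self.translation, d_trans)]
        self.rotation = [a + b for a, b in zip(self.rotation, d_rot)]

def rearrange(scene, instruction, ask_vlm, max_steps=10):
    """Closed loop: query the VLM, apply its pose delta, repeat
    until it reports the scene matches the instruction."""
    for _ in range(max_steps):
        reply = ask_vlm(scene, instruction)  # stands in for render + API call
        if reply is None:                    # VLM judges the scene faithful
            break
        target, d_trans, d_rot = reply
        scene[target].nudge(d_trans, d_rot)
    return {name: asdict(pose) for name, pose in scene.items()}

# Stub VLM for illustration: raise and tilt the teapot until it is 0.3 up.
def stub_vlm(scene, instruction):
    if scene["teapot"].translation[2] >= 0.3:
        return None
    return "teapot", [0.0, 0.0, 0.1], [0.0, 0.0, 15.0]

scene = {"teapot": Pose(), "teacup": Pose(translation=[0.5, 0.0, 0.0])}
final = rearrange(scene, "Pour the tea into the teacup", stub_vlm)
```

In the real tool the stub is replaced by a vision-language model looking at rendered images, but the control flow is the same shape.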


AI-Generated Review

What is vlmpose?

VLMpose lets you rearrange 6D object poses in 3D scenes using natural language prompts, like "Pour the tea into the teacup using the teapot" or "Move the black knight from g8 to f6." Built in Python, it uses pyrender to render OBJ meshes and OpenAI's API for its vision-language model agents: it renders multiple views, selects the target object, and iteratively refines poses in a closed loop until the scene is faithful to the text. The output is a folder of debug images and JSON pose logs for the demo scenes.
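The multi-view step can be pictured as placing cameras on a ring around the scene and handing their poses to a renderer such as pyrender. A sketch of that camera placement in plain NumPy; `look_at` and `ring_of_views` are illustrative names, not functions from this repo:

```python
import numpy as np

def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Camera-to-world matrix looking from `eye` toward `target`,
    with the camera facing down its local -z axis (the OpenGL
    convention that pyrender cameras use)."""
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = true_up
    pose[:3, 2] = -forward
    pose[:3, 3] = eye
    return pose

def ring_of_views(n_views=4, radius=1.5, height=1.0):
    """Camera poses evenly spaced on a circle around the scene origin."""
    poses = []
    for i in range(n_views):
        angle = 2.0 * np.pi * i / n_views
        eye = np.array([radius * np.cos(angle),
                        radius * np.sin(angle), height])
        poses.append(look_at(eye))
    return poses
```

Each 4x4 pose could then be passed to a renderer (e.g. `pyrender.Scene.add(camera, pose=...)` with an `OffscreenRenderer`) to produce the views the VLM inspects.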

Why is it gaining traction?

This stands out for turning free-form text into precise 6D manipulations via VLM feedback loops, with no manual pose annotation -- a timely angle for agent research ahead of ICRA 2026 or IROS 2026 submissions. Developers like the CLI simplicity: point it at a mesh directory, add a prompt, and watch it iterate with faithfulness checks. It is a fresh Python template for text-guided robotics, bridging VLMs to simulation.
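For the faithfulness check, one plausible shape for the VLM query is a chat message that bundles the instruction with the rendered views as base64 data URLs. This is an assumption about how such a call could look, not the repo's actual prompt; `faithfulness_messages` and the reply schema are hypothetical:

```python
import base64

def encode_image(png_bytes):
    """Inline a rendered view as a data URL for a vision-capable model."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{b64}"

def faithfulness_messages(instruction, view_data_urls):
    """Chat messages asking the VLM for a verdict plus a pose delta.
    (Hypothetical prompt and JSON schema, for illustration only.)"""
    content = [{"type": "text", "text": (
        f"Instruction: {instruction}\n"
        "Do the rendered views satisfy the instruction? "
        'Reply with JSON: {"faithful": bool, '
        '"delta_translation": [x, y, z], "delta_rotation": [rx, ry, rz]}'
    )}]
    content += [{"type": "image_url", "image_url": {"url": u}}
                for u in view_data_urls]
    return [{"role": "user", "content": content}]

# The actual request would go through the OpenAI client, e.g.
#   client.chat.completions.create(model="gpt-4o", messages=msgs)
# and the parsed JSON reply would drive the next pose update.
```

The `image_url`-with-data-URL content format is the standard way to send images to OpenAI's vision-capable chat models; everything else above is a sketch.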

Who should use this?

Robotics simulation engineers prototyping manipulation tasks, such as grasp-to-pour pipelines in Gazebo or Isaac Sim. Researchers targeting venues like AAAI 2026 or ICLR 2026 with work on VLM agents for 6D pose. Anyone building text-to-action systems who wants a working reference to study.

Verdict

Early research code (18 stars, 100% credibility) with a solid README and demos, but open TODOs for RGB-D input and open models signal immaturity -- fork it for your 2026 agent experiments, not for production. Pair it with a reference manager such as Zotero to track the related papers.

