Tencent-Hunyuan

HY-Embodied: Embodied Foundation Models for Real-World Agents

AI Summary

This repository offers inference code for HY-Embodied-0.5, a suite of AI foundation models optimized for real-world agents with strong spatial-temporal perception and embodied reasoning capabilities.

How It Works

1
🔍 Discover HY-Embodied

You find this Tencent model family that acts as a perception-and-planning brain for robots, understanding images, 3D space, and actions.

2
📥 Grab the files

Clone or download the repository so you have the inference scripts on your own machine.

3
⚙️ Get your computer ready

Make sure you have a capable GPU, a recent Python install, and the required packages before running anything.

4
🚀 Wake up the AI brain

Run the example inference script; it downloads the model weights automatically and reports when it is ready.

5
🖼️ Share an image and question

Pick a photo of a scene or object, add a question about describing it or planning an action, and pass both to the model.

6
💭 Watch it think step-by-step

In thinking mode, the model reasons step by step about spatial layout, movement, and interaction before it answers.

Unlock smart robot insights

Get detailed, grounded responses you can use to describe scenes or drive robot behavior (a hedged code sketch of this flow follows below).
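As a rough picture of what steps 3 through 6 look like in code, here is a minimal single-image inference sketch using Hugging Face Transformers, which the review below says the models load through. The model ID, prompt format, and generation settings are assumptions; the repo's own example script is the authoritative reference.

```python
# Hypothetical single-image inference sketch. MODEL_ID, the prompt format, and
# the generation settings are assumptions; check the repo's example script.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "tencent/HY-Embodied-0.5"  # placeholder; use the ID from the repo README

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # step 3: a capable GPU is expected
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("scene.jpg")
question = "Describe the objects on the table and plan how to grasp the mug."

# Step 5: hand the image and question to the model and generate a response.
inputs = processor(images=image, text=question, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```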

AI-Generated Review

What is HY-Embodied?

HY-Embodied delivers a family of Python-based embodied foundation models designed for real-world agents, tackling the gap between general vision-language models and the precise needs of physical robotics. You get pretrained weights—like the efficient HY-Embodied-0.5 MoT-2B for edge devices and a 32B powerhouse for heavy reasoning—that excel at spatial-temporal perception, prediction, interaction, and planning from images or video. Load them via Hugging Face Transformers for quick inference on tasks like robot control in Vision-Language-Action pipelines, outputting structured thoughts and normalized coordinates.
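The "structured thoughts and normalized coordinates" are the output side of that pipeline. Below is a hedged sketch of how such output might be post-processed for a robot controller; the reasoning tags and the coordinate format are assumptions, not the repo's documented schema.

```python
# Hedged post-processing sketch: split an assumed "<think>...</think>" reasoning
# block from the answer and map normalized (x, y) points back to pixel space.
# The tags and coordinate format are assumptions, not the repo's documented schema.
import re


def parse_response(text: str, img_w: int, img_h: int):
    thought_match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    thought = thought_match.group(1).strip() if thought_match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    # Normalized coordinates in [0, 1] are rescaled to the source image size.
    points = [
        (float(x) * img_w, float(y) * img_h)
        for x, y in re.findall(r"\(([\d.]+),\s*([\d.]+)\)", answer)
    ]
    return thought, answer, points


thought, answer, points = parse_response(
    "<think>The mug handle faces left of center.</think> Grasp point: (0.42, 0.63).",
    img_w=640,
    img_h=480,
)
print(points)  # approximately [(268.8, 302.4)] in pixel coordinates
```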

Why is it gaining traction?

It crushes competitors on 16+ embodied benchmarks, like 89.2 on CV-Bench and 82.8 on EmbSpatial-Bench, often with fewer active parameters for faster real-world deployment. Developers dig the plug-and-play inference scripts for single or batch runs, thinking mode for step-by-step reasoning, and seamless VLA integration without custom training. With 497 stars, it's pulling in robotics folks tired of underperforming VLMs in 3D spaces.
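To make the "single or batch runs" and "thinking mode" workflow concrete, here is a hedged batch-inference sketch. The prompt prefix standing in for thinking mode and the model ID are assumptions; the repo's own scripts define the real switches.

```python
# Hedged batch-run sketch. The thinking-mode prompt prefix and MODEL_ID are
# assumptions; the repo's own batch script defines the real interface.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "tencent/HY-Embodied-0.5"  # placeholder; use the ID from the repo README
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Stand-in for thinking mode: ask for explicit step-by-step spatial reasoning.
THINK = "Think step by step about the spatial layout before answering.\n"

batch = [
    ("kitchen.jpg", "Which object should the robot grasp first?"),
    ("hallway.jpg", "Plan a collision-free path to the door."),
]

for path, question in batch:
    inputs = processor(
        images=Image.open(path), text=THINK + question, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512)
    print(path, "->", processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```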

Who should use this?

Robotics engineers wiring up physical agents for manipulation or navigation tasks. AI researchers benchmarking embodied reasoning on spatial datasets. Hardware devs deploying lightweight vision models to edge robots handling real-world interactions like object grasping or trajectory planning.

Verdict

Grab it if you're in embodied AI—impressive benchmarks and ready inference make it a solid starting brain for agents, despite the 1.0% credibility score and modest 497 stars signaling early maturity. Docs are clear, but hold off for production until fine-tuning lands.


