alibaba-damo-academy

RynnBrain: Open Embodied Foundation Models

568 stars · 55 forks · 100% credibility
Found Feb 13, 2026 at 280 stars (2× growth since)
Language: Jupyter Notebook

AI Summary

RynnBrain is an open-source vision-language model from Alibaba DAMO Academy that excels in egocentric video understanding, precise spatio-temporal localization, physical-space reasoning, and robot task planning.

How It Works

1
👀 Discover RynnBrain

You hear about a smart AI helper that understands video from a robot's viewpoint: spotting objects, tracing paths, and planning actions.

2
🖥️ Try the online demo

Upload a short video or photo and ask simple questions like 'Where's the cup?' to see it think and point instantly.

3
✨ Watch it shine

The AI draws boxes around objects, traces movements, and suggests next steps, making robot vision feel magical.

4
📖 Follow fun guides

Open ready-made notebooks that show how to count items, find grasp spots, or plan robot paths step by step.

5
💻 Get your own copy

Download the free model weights and run them on your own machine to test with your own videos (see the sketch after this list).

6
🔧 Customize for your needs

Tweak it with your robot footage to teach special skills like navigating rooms or picking items.

🤖 Your robot gets smarter

Now your robot sees, understands, and acts in the real world just like you imagined.
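
To make step 5 concrete, here is a minimal local-inference sketch, assuming the weights load as a standard Hugging Face vision-language checkpoint. The model ID, image file, and one-shot prompt handling are illustrative assumptions, not the repo's documented API; the cookbooks are the authoritative reference.

```python
# Minimal sketch: ask a RynnBrain checkpoint a pointing question about one frame.
# MODEL_ID is a hypothetical placeholder; see the repo's Hugging Face model cards.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "Alibaba-DAMO-Academy/RynnBrain-2B"  # assumption, not a confirmed ID

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # halve memory on GPUs with bf16 support
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("kitchen_frame.jpg")   # any frame from your own footage
inputs = processor(images=image, text="Where's the cup?", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

If the smallest variant really is 2B, it is the natural starting point on a single GPU; the larger checkpoints will likely need multi-GPU setups or offloading.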

AI-Generated Review

What is RynnBrain?

RynnBrain delivers open embodied foundation models trained on egocentric videos for real-world tasks like robot planning, vision-language navigation, and chain-of-point reasoning. Built on Qwen3-VL bases in sizes from 2B to 30B-A3B MoE, it processes omni-vision inputs to output trajectories, pointing, and actions via a unified encoder-decoder setup. Developers get pretrained weights on Hugging Face, Jupyter Notebook cookbooks for spatial cognition and planning demos, and fine-tuning scripts for custom embodied models.
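
Because the models are trained on egocentric video, a natural next test is frame-sampled clip input. The sketch below is hedged: the model ID is again a placeholder, and whether the processor accepts a plain list of PIL frames (versus a dedicated video path) is an assumption to verify against the cookbooks.

```python
# Hedged sketch: uniformly sample frames from an egocentric clip with OpenCV
# and ask a planning question. Model ID and multi-image input are assumptions.
import cv2
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "Alibaba-DAMO-Academy/RynnBrain-2B"  # hypothetical ID

def sample_frames(path: str, n: int = 8) -> list[Image.Image]:
    """Uniformly sample up to n RGB frames from a video file."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in range(0, total, max(total // n, 1)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames[:n]

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

frames = sample_frames("egocentric_clip.mp4")
inputs = processor(images=frames, text="Plan the next three robot actions.",
                   return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```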

Why is it gaining traction?

It stands out by grounding language reasoning in physical space, alternating text and localization for precise outputs like grasp poses or paths, and it beats baselines on embodied QA, counting, and navigation benchmarks. The plug-and-play HF integration, Gradio demo space, and ready-to-run notebooks let devs test video understanding without setup hassle. Early results show it boosts downstream VLAs for manipulation and navigation.
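
The "alternating text and localization" output is easiest to picture with a toy parser. The `<point>x,y</point>` serialization below is purely illustrative, loosely modeled on Qwen-VL-style grounding tags; the actual output schema lives in the repo's cookbooks.

```python
# Toy parser for a reply that interleaves prose with localization tags.
# The <point>x,y</point> format is a hypothetical stand-in for illustration.
import re

POINT_TAG = re.compile(r"<point>\s*(\d+)\s*,\s*(\d+)\s*</point>")

def split_text_and_points(reply: str):
    """Separate an interleaved reply into plain text and (x, y) pixel points."""
    points = [(int(x), int(y)) for x, y in POINT_TAG.findall(reply)]
    text = " ".join(POINT_TAG.sub(" ", reply).split())  # drop tags, tidy spacing
    return text, points

reply = "The cup is on the counter <point>412,198</point> next to the sink."
text, points = split_text_and_points(reply)
print(text)    # The cup is on the counter next to the sink.
print(points)  # [(412, 198)]
```

Keeping the prose and coordinates in one autoregressive stream is what lets a downstream planner consume both the explanation and the actionable location from a single forward pass.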

Who should use this?

Robotics engineers fine-tuning VLAs for manipulation or household tasks, embodied AI researchers benchmarking spatial grounding, and sim-to-real devs needing navigation models for Habitat or MP3D scenes. Ideal for teams prototyping with egocentric video data who want foundation models over scratch training.

Verdict

Grab it if embodied AI is your jam: the released models and notebooks make experimentation fast, even though the project's youth (found at 280 stars, now 568) signals early maturity. Docs are solid with performance tables, but expect tweaks for production scaling.
