xiaomi-research

UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

AI Analysis
Python
AI Summary

UniDriveVLA is an open-source unified vision-language-action model for autonomous driving, providing training code, pretrained weights, and evaluation on nuScenes and CARLA Bench2Drive benchmarks.

How It Works

1
🚗 Discover smart self-driving AI

You find UniDriveVLA, a free tool from researchers that teaches AI to see, think, and drive like a human.

2
📥 Download the kit

Grab the easy-to-use code, ready AI brains, and driving test worlds from their safe online hub.

3
🛠️ Set up your playground

Connect the virtual city simulator and load the AI driver with a few simple steps.

4
🤖 Wake up the AI driver

Your smart driver comes alive, ready to tackle real roads and challenges with human-like smarts (see the rough loading sketch after these steps).

5
🏁 Run driving tests

Watch it navigate busy streets, intersections, and weather in realistic simulations and benchmarks.

6
🎉 Celebrate top scores

See your AI shine with world-class results on driving tests, proving it's ready for the road.
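
The workflow above, stripped of the metaphors, is: clone the repo, pull the released checkpoints, point them at a benchmark or simulator, and run inference. Purely as orientation, here is a minimal Python sketch of loading a checkpoint and asking for a trajectory; the Hugging Face repo id, the generic transformers Auto* loaders, the camera file names, and the prompt are all assumptions for illustration rather than the project's documented entry points, and the real setup (especially CARLA / Bench2Drive closed-loop runs) is in the repo's own instructions.

from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Hypothetical checkpoint id -- the actual Hugging Face weights may live
# under a different name; check the repo's README.
MODEL_ID = "xiaomi-research/UniDriveVLA"

# Assumes the weights load through the generic transformers interface
# (the project may ship its own loader / config instead).
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, trust_remote_code=True)

# Multi-view camera frames for one timestep (file names are placeholders).
views = [Image.open(p) for p in ["cam_front.jpg", "cam_front_left.jpg", "cam_front_right.jpg"]]
prompt = "Plan the ego vehicle's trajectory for the next 3 seconds."

inputs = processor(images=views, text=prompt, return_tensors="pt")
out_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out_ids, skip_special_tokens=True)[0])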

AI-Generated Review

What is UniDriveVLA?

UniDriveVLA is a Python-based Vision-Language-Action model that unifies scene understanding, perception, and action planning for autonomous driving. It tackles the tradeoff between 2D semantic reasoning and 3D spatial perception using specialized experts coordinated via masked attention, delivering end-to-end trajectory prediction from multi-view images. Developers get pretrained base/large models with HuggingFace weights, plus evaluation setups for nuScenes open-loop planning and CARLA's Bench2Drive closed-loop driving.
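
To make "specialized experts coordinated via masked attention" concrete, here is a toy PyTorch sketch of attention restricted by token group, so perception tokens and reasoning tokens each attend only within their own expert stream. The grouping rule, shapes, and lack of any cross-group routing are illustrative assumptions; UniDriveVLA's actual Mixture-of-Transformers masking and expert weights are more involved than this.

import torch
import torch.nn.functional as F

def expert_masked_attention(q, k, v, group_ids):
    # q, k, v: (batch, seq, dim); group_ids: (batch, seq) integer expert labels.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5              # (batch, seq, seq)
    same_group = group_ids.unsqueeze(-1) == group_ids.unsqueeze(-2)
    scores = scores.masked_fill(~same_group, float("-inf"))  # block cross-expert attention
    return F.softmax(scores, dim=-1) @ v

# Toy usage: 4 "perception" tokens (group 0) and 4 "reasoning" tokens (group 1).
x = torch.randn(1, 8, 16)
groups = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 1]])
out = expert_masked_attention(x, x, x, groups)
print(out.shape)  # torch.Size([1, 8, 16])

In a real Mixture-of-Transformers block each group would also run through its own attention/FFN weights; the mask above only shows how a single attention call can keep the two streams decoupled.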

Why is it gaining traction?

It stands out by decoupling perception and reasoning experts in a Mixture-of-Transformers setup, hitting SOTA on nuScenes L2/collision metrics (e.g., 0.51 avg L2 for large) and Bench2Drive scores (78% driving score, 52% success). The progressive three-stage training preserves VLM smarts while boosting 3D tasks like detection and forecasting. Devs dig the ready-to-run evals, sparse perception for efficiency, and broad applicability across VQA, mapping, and planning.
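
For context on the "0.51 avg L2" number: average L2 is simply the mean Euclidean distance, in meters, between the predicted and ground-truth future waypoints. A minimal sketch of that core formula follows; the repo's actual nuScenes evaluation aggregates per horizon (e.g. 1s/2s/3s) and also computes collision rate, neither of which is shown here.

import numpy as np

def average_l2(pred_traj, gt_traj):
    # pred_traj, gt_traj: (T, 2) arrays of future (x, y) waypoints in meters.
    return float(np.linalg.norm(pred_traj - gt_traj, axis=-1).mean())

pred = np.array([[1.0, 0.1], [2.0, 0.3], [3.1, 0.5]])
gt = np.array([[1.0, 0.0], [2.0, 0.2], [3.0, 0.4]])
print(round(average_l2(pred, gt), 3))  # 0.114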

Who should use this?

Autonomous driving researchers benchmarking end-to-end VLAs on nuScenes or CARLA. Perception/planning engineers unifying pipelines for motion forecasting or trajectory gen. Teams iterating on closed-loop sims via Bench2Drive, especially those extending Qwen-VL backbones.

Verdict

Grab it for cutting-edge VLA baselines if you're in AV research -- SOTA results and HF checkpoints make prototyping fast. At 19 stars and 1.0% credibility, it's early (fresh 2026 paper), so expect some setup tweaks, but solid docs and evals lower the barrier.


