leggedrobotics

Source code and trained models for DeFM (Depth Foundation Model)

103 stars · 100% credibility · Python

Found Feb 04, 2026 at 76 stars

AI Analysis
AI Summary

DeFM provides pre-trained vision models specialized in processing depth images to extract robust features for robotics applications like navigation and manipulation.

How It Works

1
📰 Discover DeFM

DeFM is a set of pretrained vision models that help robots interpret depth images, improving perception for navigation and object manipulation.

2
💻 Set it up

Install DeFM with standard Python tooling; pretrained weights load directly through TorchHub, with no extra setup.

3
🤖 Pick a model size

Choose a compact backbone for fast onboard inference, or a larger one for harder scenes and higher accuracy.

4
🖼️ Feed it a depth image

Preprocess a metric depth image from your robot's camera into the normalized 3-channel tensor format DeFM expects.

5
Unlock robot insights

DeFM extracts features encoding shape, distance, and scene structure directly from the depth input.

6
🚀 Power your projects

Feed those features into downstream policies for navigation, manipulation, or smooth movement in the real world.

🎉 Robot sees the world

Your robot now perceives depth consistently across sensors and setups, succeeding in tasks where reliable depth perception matters.
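The steps above can be sketched end to end. Note the assumptions: the depth range bounds and tensor layout below are illustrative guesses based on this page's description ("metric depth into 3-channel normalized tensors"), and only the `defm_vit_l14` entry point is quoted from the review; this is a sketch, not DeFM's documented API.

```python
import numpy as np

def preprocess_depth(depth_m, d_min=0.1, d_max=10.0):
    """Clip a metric depth map (meters) to [d_min, d_max], normalize to
    [0, 1], and replicate to 3 channels -- the layout this page says
    DeFM expects. The range bounds here are illustrative, not DeFM's
    documented values."""
    d = np.clip(np.asarray(depth_m, dtype=np.float32), d_min, d_max)
    d = (d - d_min) / (d_max - d_min)
    return np.repeat(d[None, :, :], 3, axis=0)  # shape (3, H, W)

# Hypothetical end-to-end use (entry point quoted from the review below):
# import torch
# model = torch.hub.load('leggedrobotics/defm:main', 'defm_vit_l14')
# x = torch.from_numpy(preprocess_depth(depth_image)).unsqueeze(0)
# features = model(x)  # depth features for downstream policies
```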


Star Growth

This repo grew from 76 stars (when found) to 103.
AI-Generated Review

What is DeFM?

DeFM delivers pretrained PyTorch vision backbones trained on 60M depth images via self-distillation, tailored for robotic perception tasks like navigation, manipulation, and locomotion. From depth-only inputs it produces metric-aware features that shine in sim-to-real transfer and cross-sensor setups; usage is a simple TorchHub load from HuggingFace plus a preprocessing step that turns metric depth into 3-channel normalized tensors. Python-based, it skips RGB entirely in favor of purely geometric and semantic depth representations.

Why is it gaining traction?

A model zoo spans ViT-L (307M params, 85% top-5 KNN on depth benchmarks) down to a tiny EfficientNet-B0 (3M params, Jetson-friendly at 21ms), all without task-specific fine-tuning. Devs dig the robotics-proven sim-to-real generalization and compact sizes for onboard policies, plus dead-simple inference like `torch.hub.load('leggedrobotics/defm:main', 'defm_vit_l14')`. It stands out from generic image backbones by nailing depth-specific priors.
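The model-zoo tradeoff above can be illustrated with a small selection helper. The parameter counts are the ones quoted in this review; only `defm_vit_l14` appears verbatim here, so the EfficientNet entry-point name is a hypothetical placeholder:

```python
# Parameter counts quoted from the review; entry-point names other than
# 'defm_vit_l14' are hypothetical placeholders, not DeFM's documented names.
MODEL_ZOO = {
    'defm_vit_l14': 307_000_000,        # highest accuracy (85% top-5 KNN)
    'defm_efficientnet_b0': 3_000_000,  # ~21 ms on Jetson-class hardware
}

def pick_backbone(param_budget):
    """Return the largest zoo entry fitting the parameter budget,
    or None if nothing fits."""
    fitting = {name: p for name, p in MODEL_ZOO.items() if p <= param_budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)
```

On a Jetson-class budget of around 10M parameters this picks the compact EfficientNet entry; with no real constraint, the ViT-L.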

Who should use this?

Robotics engineers deploying depth cams on legged bots or drones for real-time nav/manip. Researchers benchmarking foundation models on depth datasets, fighting sim-to-real gaps in locomotion policies. Teams needing low-latency features on edge hardware like Jetson Orin.

Verdict

Grab it for robotics prototypes: TorchHub integration and published benchmarks make evaluation a breeze, though the modest star count signals early maturity. Strong arXiv paper and Apache license; contribute if depth perception is your focus.


