DCDmllm / InstructSAM

The code for "InstructSAM: Segment Any Instance with Any Instructions"

17
2
85% credibility
Found May 26, 2026 at 17 stars.
AI Analysis
Python
AI Summary

InstructSAM is an AI system that lets you find and highlight specific objects in photos by simply describing what you want in plain language. Instead of clicking on objects manually, you type instructions like 'segment the person on the left' or 'find all the cats' and the AI automatically draws outlines around the matching objects. The system can handle multiple objects at once and works with different types of descriptions—from simple category names to complex referring expressions. It's designed for researchers and developers working on image understanding, visual AI assistants, and image editing tools.

How It Works

1
🔍 You discover a new way to find objects in photos

You come across InstructSAM, an AI that can highlight specific objects in images just by following your written instructions.

2
📥 You download the AI model

You grab the trained model from HuggingFace so your computer can understand and follow instructions about images.

3
🖼️ You show the AI a photo and give it instructions

You point to any photo, type something like 'the red car on the left' or 'all the people wearing hats', and the AI prepares to find those things.

4
The AI finds and highlights matching objects

The model analyzes your image and creates clear outlines around every object that matches your description.

🎯 You see exactly where the objects are

The results show colorful overlays on your original image, letting you see precisely which objects the AI found based on your instructions.
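The overlay step above can be illustrated with a small, self-contained sketch. It assumes the model's output has already been converted to one boolean mask per object; the palette, the function names, and the list-of-lists image representation are all hypothetical choices made for illustration, not InstructSAM's actual API or visualisation code.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]

# Hypothetical palette: one colour per instance, reused cyclically.
PALETTE: List[Pixel] = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]


def blend(base: Pixel, tint: Pixel, alpha: float) -> Pixel:
    """Mix a pixel with a tint colour: alpha=0 keeps base, alpha=1 gives tint."""
    return tuple(round((1 - alpha) * b + alpha * t) for b, t in zip(base, tint))


def overlay_masks(image: Image, masks: List[Mask], alpha: float = 0.5) -> Image:
    """Return a copy of the image with each mask tinted in its own colour."""
    out = [list(row) for row in image]  # copy rows so the input is untouched
    for index, mask in enumerate(masks):
        tint = PALETTE[index % len(PALETTE)]
        for y, row in enumerate(mask):
            for x, inside in enumerate(row):
                if inside:
                    out[y][x] = blend(out[y][x], tint, alpha)
    return out


# A 2x3 grey image and two non-overlapping instance masks.
image = [[(100, 100, 100)] * 3 for _ in range(2)]
masks = [
    [[True, False, False], [False, False, False]],
    [[False, False, True], [False, False, True]],
]
result = overlay_masks(image, masks)
```

Pixels outside every mask keep their original colour, so the underlying photo stays visible around the highlighted objects.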

AI-Generated Review

What is InstructSAM?

InstructSAM is an instruction-driven segmentation framework that lets you mask arbitrary objects in images using plain text commands. Think of it as a "segment anything" tool that actually understands what you want -- you can say "segment the person on the left" or "find all the cars" and it returns precise instance masks. Built in Python on top of Qwen3-VL and SAM3, it handles three instruction styles: simple category prompts, referring expressions, and multi-step reasoning instructions. The project ships with pre-trained weights on HuggingFace, so you can run inference immediately without training.

Why is it gaining traction?

The key differentiator is flexibility -- unlike older segmentation models locked to fixed categories, InstructSAM responds to open-ended instructions. It outputs multiple instance masks in a single pass rather than requiring repeated SAM calls or agentic loops. The reasoning-style instructions are particularly interesting: you can chain logic ("segment the vehicle that is not a car") and it handles the multi-object complexity. Pre-trained weights are available, which means researchers can evaluate immediately while still having access to full training pipelines for fine-tuning.
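When a model returns several instance masks at once, a common post-processing step is to merge them into a single label map. The sketch below shows that idea under stated assumptions: the input is a list of same-sized boolean masks, and overlaps are resolved in favour of the earlier mask. The source does not describe InstructSAM's real output format, so `to_instance_map` is a hypothetical helper, not part of the repo.

```python
from typing import List

Mask = List[List[bool]]


def to_instance_map(masks: List[Mask]) -> List[List[int]]:
    """Merge per-instance binary masks into one label map.

    0 is background; the mask at position i is written as label i + 1.
    Where masks overlap, the earlier mask in the list keeps the pixel.
    """
    if not masks:
        return []
    height, width = len(masks[0]), len(masks[0][0])
    labels = [[0] * width for _ in range(height)]
    for index, mask in enumerate(masks):
        for y in range(height):
            for x in range(width):
                if mask[y][x] and labels[y][x] == 0:
                    labels[y][x] = index + 1
    return labels


# Two instances that overlap at row 0, column 1.
first = [[True, True, False], [False, False, False]]
second = [[False, True, True], [False, False, True]]
labels = to_instance_map([first, second])
instance_count = len({v for row in labels for v in row if v != 0})
```

A label map like this makes it easy to count instances or look up which object a given pixel belongs to.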

Who should use this?

Computer vision researchers working on multimodal instruction following will find the training code useful for experiments. Developers building image annotation tools or content moderation pipelines could integrate it for flexible object extraction. It is less suited to production deployment right now, given the low star count and limited community testing.

Verdict

InstructSAM solves a real problem with a clean approach, but with only 17 stars and an 85% credibility score, it's very early-stage software. The arXiv paper and demo videos add legitimacy, but expect rough edges. Worth exploring for research purposes or prototyping, but wait for more community validation before building critical pipelines around it.
