gammahazard / locate-anything

Public

Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.

bounding-boxes computer-vision cuda docker fastapi

89% credibility

Found May 31, 2026 at 10 stars -- GitGems finds repos before they trend. Get early access to the next one.

AI Analysis

TypeScript

AI Summary

LocateAnything is a web-based tool that wraps NVIDIA's LocateAnything-3B AI model, letting anyone point at an image and describe what they want to find in plain language. You upload a photo, type a description like 'dogs' or 'the stop sign,' and the tool draws glowing boxes around every match it finds. It handles object detection, phrase grounding, text finding, document layout, and UI element spotting from a single prompt box. The interface is mobile-friendly, saves every search so you can replay it later, and can run on your own NVIDIA GPU or connect to a remote one.

How It Works

👀 You hear about it

Someone tells you about a tool that can find anything in a photo just by describing it in plain English.

📤 You download and start it

You grab a single setup file and run one command -- everything launches automatically.

⏳ You wait for the AI to load

On first run the program downloads the AI brain (~6GB), showing a loading screen for a minute or two; later starts are instant.

🔍 You point it at a photo

You drag and drop any image into the browser window -- it works on your phone too, even with the camera.

💬 You describe what to find

You type something like 'people wearing red shirts' or 'the price tag' and pick a task type like object detection or text finding.

You choose how thorough it should be

⚡

Fast mode

Quick parallel scanning, great for simple photos with few objects.

🔄

Hybrid mode

Starts fast, falls back to careful scanning when uncertain -- the default.

🎯

Slow mode

Thorough step-by-step analysis for dense or tricky scenes.

🌟 You see glowing boxes around everything it found

Bright targeting reticles appear over your image showing exactly where each thing you asked about is located, with a count and timing.

Sign up to see the full architecture

5 more

Star Growth

See how this repo grew from 10 to 10 stars Sign Up Free

Repurpose This Repo

Repurpose is a Pro feature

Generate ready-to-use prompts for X threads, LinkedIn posts, blog posts, YouTube scripts, and more -- with full repo context baked in.

Unlock Repurpose

AI-Generated Review

What is locate-anything?

Locate-anything is a web interface wrapping NVIDIA's LocateAnything-3B model for open-vocabulary object detection. Drop in an image, type what you want to find in plain language, and get bounding boxes back. It handles object detection, phrase grounding, text detection, document layout analysis, GUI element localization, and pointing tasks from a single prompt input. The stack is Python on the backend with FastAPI, TypeScript and React on the frontend, and deploys via Docker Compose. A mock mode lets you explore the full interface without GPU hardware.

Why is it gaining traction?

The one-command setup is the hook: `docker compose up` gets you a working UI with no manual model loading or environment configuration. The mobile-friendly design and camera input support make it practical for field use. Speed/quality toggles let users trade accuracy for latency depending on the scene complexity. Search history persists every detection with images and results, so past queries are re-runnable. The GPU preflight check tells you upfront whether your hardware is compatible, avoiding frustrating runtime failures.

Who should use this?

Researchers working with visual grounding datasets who need a quick way to test queries without writing inference code. Developers building demos or prototypes that require object localization by natural language description. Teams evaluating LocateAnything-3B for academic or non-commercial projects who want a shareable interface rather than a CLI. Not suitable for commercial products due to the model's non-commercial license restriction.

Verdict

At 10 stars with minimal test coverage, this is an early-stage project that delivers on its core promise but lacks the maturity for production reliance. The 0.9% credibility score reflects that reality. If you have an NVIDIA Ampere-or-newer GPU and need open-vocabulary detection for research, the friction-free setup makes it worth trying. Just verify the non-commercial license fits your use case before investing time.

Sign up to read the full AI review Sign Up Free

Similar repos coming soon.

Stars

Forks

Followers

Base stars: 10 stars

Penalty: Very new repo (2d): -70%

Bonus: AI verified quality (90%)

Account age: 1,683 days

Repo age: 3 days

License: Apache-2.0

Updated: May 31, 2026