gammahazard / locate-anything
Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B: open-vocabulary object detection & grounding on your own GPU, via one docker compose up.
LocateAnything is a web-based tool that wraps NVIDIA's LocateAnything-3B AI model, letting anyone point at an image and describe what they want to find in plain language. You upload a photo, type a description like 'dogs' or 'the stop sign,' and the tool draws glowing boxes around every match it finds. It handles object detection, phrase grounding, text finding, document layout, and UI element spotting from a single prompt box. The interface is mobile-friendly, saves every search so you can replay it later, and can run on your own NVIDIA GPU or connect to a remote one.
How It Works
Someone tells you about a tool that can find anything in a photo just by describing it in plain English.
You grab a single setup file and run one command, docker compose up -- everything launches automatically.
On first run the program downloads the AI brain (~6GB), showing a loading screen for a minute or two; later starts are instant.
You drag and drop any image into the browser window -- it works on your phone too, even with the camera.
You type something like 'people wearing red shirts' or 'the price tag' and pick a task type like object detection or text finding.
You also choose how thoroughly the image is scanned, from three modes:
Quick parallel scanning, great for simple photos with few objects.
Starts fast, falls back to careful scanning when uncertain -- the default.
Thorough step-by-step analysis for dense or tricky scenes.
Bright targeting reticles appear over your image showing exactly where each thing you asked about is located, with a count and timing.
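The default scanning behavior described above (start fast, fall back to careful scanning when uncertain) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Box` type, the `Detector` signature, the `detect_hybrid` function, the stub detectors, and the idea that "uncertain" means "no boxes, or any box below a score threshold" are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Box:
    """One detection: a label, pixel corners, and a confidence score."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


# Hypothetical detector signature: (image_path, prompt) -> list of boxes.
Detector = Callable[[str, str], List[Box]]


def detect_hybrid(
    image_path: str,
    prompt: str,
    fast: Detector,
    careful: Detector,
    min_score: float = 0.5,
) -> Tuple[List[Box], str, float]:
    """Try the fast pass first; rerun with the careful pass when uncertain.

    "Uncertain" is an assumption here: no boxes, or any box below min_score.
    Returns (boxes, mode_used, elapsed_seconds).
    """
    start = time.perf_counter()
    boxes = fast(image_path, prompt)
    mode = "fast"
    if not boxes or any(b.score < min_score for b in boxes):
        boxes = careful(image_path, prompt)
        mode = "careful"
    return boxes, mode, time.perf_counter() - start


# Stub detectors standing in for the real model, for illustration only.
def stub_fast(image_path: str, prompt: str) -> List[Box]:
    return [Box(prompt, 10, 10, 50, 50, 0.3)]


def stub_careful(image_path: str, prompt: str) -> List[Box]:
    return [Box(prompt, 12, 11, 48, 52, 0.9), Box(prompt, 60, 20, 90, 70, 0.8)]


if __name__ == "__main__":
    boxes, mode, elapsed = detect_hybrid("photo.jpg", "dogs", stub_fast, stub_careful)
    print(f"{len(boxes)} match(es) via {mode} pass in {elapsed * 1000:.2f} ms")
```

The returned count and elapsed time correspond to the match count and timing the interface shows alongside the reticles; how the real tool decides it is "uncertain" is not stated in this summary.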