manugaurdl / SteerViT
SteerViT is a framework that equips any ViT with the ability to steer both its global and local visual representations with natural language.
SteerViT extends image recognition models so that they produce text-guided features, global summaries, and visual heatmaps for any input image.
How It Works
You stumble upon SteerViT, a clever tool that lets image AI focus exactly where your words tell it to look.
Head to the website or GitHub to see examples of images highlighting specific objects based on simple descriptions.
Click the ready-to-use notebook link to try it right away in your web browser, with no setup needed.
Choose any photo from your computer, like a street scene or family snapshot.
Describe what interests you, such as "the red car" or "the person's face", in everyday words.
See heatmaps highlighting the regions that match your description, plus text-guided summaries of the whole image and detailed views of just those areas.
You've unlocked a way to make AI vision follow your instructions, perfect for exploring photos deeply or building fun projects.
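To get a feel for what a text-guided heatmap is, here is a minimal sketch in plain Python. It is not SteerViT's API or its method: SteerViT steers the ViT's own representations with language, whereas this toy only scores a grid of made-up patch vectors against a made-up text vector using cosine similarity. The names `cosine` and `similarity_heatmap`, and all the numbers, are hypothetical and exist purely for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_heatmap(patch_features, text_feature, grid_h, grid_w):
    """Score each patch against the text vector and rescale scores to [0, 1].

    Hypothetical helper: patch_features is a flat, row-major list of
    per-patch vectors; the result is a grid_h x grid_w list of lists.
    """
    if len(patch_features) != grid_h * grid_w:
        raise ValueError("number of patches must equal grid_h * grid_w")
    scores = [cosine(p, text_feature) for p in patch_features]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # Min-max normalise so the best-matching patch is 1.0 and the worst is 0.0.
    norm = [(s - lo) / span if span else 0.0 for s in scores]
    return [norm[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


# Toy 2x2 "image": four invented patch vectors and one invented text vector.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
text = [1.0, 0.0]
heatmap = similarity_heatmap(patches, text, 2, 2)
for row in heatmap:
    print([round(v, 3) for v in row])
```

The top-left patch points the same way as the text vector, so it gets the brightest value, while the bottom-right patch points the opposite way and gets the darkest. A real pipeline would take its patch and text features from a trained model and would upsample the grid to the image size before overlaying it.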