zlab-princeton / VisionFoundry
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images
VisionFoundry is an open-source toolkit for generating synthetic image-question-answer datasets to improve visual perception in AI models, with scripts for fine-tuning popular vision-language models.
How It Works
You discover this toolkit from Princeton researchers while looking for a way to create custom image datasets for training vision models.
You install the toolkit and connect it to your preferred image-generation models so it can render scenes.
You write a short description of the visual task, such as identifying colors or spatial positions in a scene, and choose how many examples to generate.
You run the generator: it samples detailed scene descriptions, renders realistic images from them, and derives question-answer pairs whose answers are verified against the same scene specification, so question and answer always match.
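The step above can be sketched in plain Python. This is a hypothetical illustration, not VisionFoundry's actual API (the function and field names here are invented); the key idea it shows is that the question and answer are both derived from the same scene specification used for rendering, so the answer is correct by construction.

```python
import random

def sample_scene(rng):
    # Hypothetical scene specification: two colored shapes at grid positions.
    shapes = ["circle", "square", "triangle"]
    colors = ["red", "blue", "green", "yellow"]
    objects = []
    for shape in rng.sample(shapes, 2):
        objects.append({"shape": shape,
                        "color": rng.choice(colors),
                        "position": (rng.randint(0, 3), rng.randint(0, 3))})
    return {"objects": objects}

def make_qa(scene):
    # Derive the QA pair from the spec itself, so the answer cannot be wrong.
    obj = scene["objects"][0]
    question = f"What color is the {obj['shape']}?"
    answer = obj["color"]
    return question, answer

def generate_dataset(n, seed=0):
    rng = random.Random(seed)
    records = []
    for i in range(n):
        scene = sample_scene(rng)
        question, answer = make_qa(scene)
        # A real pipeline would render `scene` to an image here (e.g. with a
        # diffusion model); this sketch keeps only the specification.
        records.append({"id": i, "scene": scene,
                        "question": question, "answer": answer})
    return records
```

Because the answer is read off the specification rather than inferred from the rendered image, no manual labeling or post-hoc verification of the text is needed.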
You get a ready-to-use dataset folder of images paired with questions and answers, plus metadata describing the scenes and styles used.
You follow the included fine-tuning scripts to train a vision-language model such as Llama or Qwen on your new dataset.
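Fine-tuning details depend on the chosen model and trainer, but one common preprocessing step is converting each image-QA record into chat-format messages, the input shape most instruction-tuned VLM trainers expect. The message schema below is a generic sketch modeled on widely used chat templates, not VisionFoundry's actual format:

```python
def to_chat_example(record):
    # Convert one image-QA record into a chat-style training example.
    # The user turn carries the image plus the question; the assistant
    # turn carries the verified answer as the training target.
    return {
        "messages": [
            {"role": "user",
             "content": [{"type": "image", "image": record["image"]},
                         {"type": "text", "text": record["question"]}]},
            {"role": "assistant",
             "content": [{"type": "text", "text": record["answer"]}]},
        ]
    }
```

Mapping this function over the dataset manifest yields examples ready for a supervised fine-tuning loop.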
Your fine-tuned model now performs better at the targeted perception task, ready for your projects or research, and you have a publishable dataset to share alongside it.