Osilly / Vision-DeepResearch
Multimodal deep-research MLLM and benchmark. The first long-horizon multimodal deep-research MLLM, extending the number of reasoning turns to dozens and the number of search-engine interactions to hundreds.
A research project offering datasets, pre-trained models, and code for training multimodal LLMs specialized in deep visual and textual research tasks, plus a new benchmark, VDR-Bench.
How It Works
You stumble upon this project while looking for tools to help AI handle tough image and text research tasks.
Demo videos and benchmark charts show the model cracking complex visual searches that stump other systems.
Grab the free training data and benchmarks from Hugging Face to fuel your experiments (a download sketch follows this list).
Follow the step-by-step guides to prepare your environment and data for training your own model.
Run the provided commands to teach the model the basics with supervised fine-tuning (see the SFT sketch below).
Add a reinforcement-learning stage to turn the model into a deep-research expert (see the RL sketch below).
Run the evaluations to measure how well your model performs on real challenges like VDR-Bench (see the evaluation sketch below).
Celebrate as your multimodal model excels at visual and textual deep dives.
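For the download step, here is a minimal sketch using `huggingface_hub`. The repo ids below are placeholders, not the project's real ones; substitute the dataset and benchmark ids listed in the repo's README.

```python
# Sketch: fetching released data from Hugging Face.
from huggingface_hub import snapshot_download

# Training data (hypothetical repo id -- check the README for the real one).
snapshot_download(
    repo_id="Osilly/Vision-DeepResearch-data",  # placeholder id
    repo_type="dataset",
    local_dir="data/train",
)

# VDR-Bench benchmark (hypothetical repo id).
snapshot_download(
    repo_id="Osilly/VDR-Bench",  # placeholder id
    repo_type="dataset",
    local_dir="data/vdr_bench",
)
```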
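For the SFT step, a toy sketch of standard supervised fine-tuning: next-token cross-entropy with the prompt tokens masked out of the loss. The base model and the (prompt, answer) pair are stand-ins for illustration, not the project's actual script or data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder: swap in the released checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy (prompt, answer) pairs standing in for the real SFT traces.
pairs = [("Q: What is the capital of France?\nA:", " Paris")]

for prompt, answer in pairs:
    batch = tok(prompt + answer, return_tensors="pt")
    labels = batch["input_ids"].clone()
    # Mask the prompt tokens so the loss covers only the answer.
    prompt_len = tok(prompt, return_tensors="pt")["input_ids"].shape[1]
    labels[:, :prompt_len] = -100
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```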
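For the RL step, a bare-bones REINFORCE-style sketch: sample a completion, score it with a reward, and push up the log-probabilities of the sampled tokens. The source does not specify the project's RL algorithm, so the reward and model here are toy assumptions, not the repo's recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder base model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

prompt = "Q: What is the capital of France?\nA:"
inputs = tok(prompt, return_tensors="pt")
prompt_len = inputs["input_ids"].shape[1]

# Sample a completion from the current policy.
out = model.generate(**inputs, max_new_tokens=8, do_sample=True)
completion = tok.decode(out[0, prompt_len:])

# Toy reward: 1 if the answer mentions "Paris", else 0.
reward = float("Paris" in completion)

# Log-probs of each sampled token under the current policy.
logits = model(out).logits[:, :-1]  # logits at step t predict token t+1
logprobs = torch.log_softmax(logits, dim=-1)
token_lp = logprobs.gather(-1, out[:, 1:].unsqueeze(-1)).squeeze(-1)
gen_lp = token_lp[:, prompt_len - 1:]  # keep only generated tokens

loss = -(reward * gen_lp.sum())  # REINFORCE objective
loss.backward()
optimizer.step()
```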
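For the evaluation step, a sketch of exact-match scoring over a JSONL benchmark file. `run_agent` and the `question`/`answer` field names are hypothetical placeholders for the repo's real inference entry point and data format.

```python
import json

def run_agent(question: str) -> str:
    raise NotImplementedError  # plug in the model's inference call here

def evaluate(path: str) -> float:
    """Exact-match accuracy over one-JSON-object-per-line benchmark items."""
    correct = total = 0
    with open(path) as f:
        for line in f:
            item = json.loads(line)
            pred = run_agent(item["question"])
            correct += int(pred.strip() == item["answer"].strip())
            total += 1
    return correct / max(total, 1)

print(evaluate("data/vdr_bench/test.jsonl"))  # placeholder path
```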