vra/Thinking-with-Visual-Primitives-pytorch

Unofficial PyTorch reproduction of DeepSeek's Thinking with Visual Primitives.

Found May 13, 2026 at 48 stars.
AI Summary

A PyTorch reproduction of a research method that trains AI models to reason visually with bounding boxes and points, for tasks like object localization, counting, and path tracing.
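For a sense of what "reasoning with primitives" looks like in practice, here is an illustrative sketch of samples where the chain of thought carries coordinates instead of pure text. The `<box>`/`<point>` tags, field names, and pixel values are assumptions made for illustration; the actual primitive format is defined by the paper and the repo.

```python
# Illustrative only: the real primitive token format is defined by the paper/repo.
# Here we assume simple <box>[x1, y1, x2, y2]</box> and <point>[x, y]</point> tags.

# Counting: the model "thinks" by emitting one box per object before answering.
counting_sample = {
    "image": "kitchen.jpg",  # hypothetical image path
    "question": "How many mugs are on the table?",
    "reasoning": (
        "Locate each mug first. "
        "<box>[112, 340, 198, 420]</box> "
        "<box>[250, 335, 330, 418]</box> "
        "<box>[401, 350, 470, 430]</box>"
    ),
    "answer": "3",
}

# Path tracing: the reasoning is a sequence of point primitives along the route.
maze_sample = {
    "image": "maze.png",
    "question": "Trace a path from the entrance to the exit.",
    "reasoning": "<point>[12, 5]</point> <point>[12, 80]</point> <point>[140, 80]</point>",
    "answer": "The path follows the three waypoints above.",
}
```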

How It Works

1. 🔍 Discover smart visual AI
You find a project that teaches AI to "think" by drawing boxes around objects and tracing paths with points in pictures, making it great for spotting cats or solving mazes.

2. 💻 Get it ready on your computer
Download the simple tools and set everything up quickly so your AI can start learning.

3. 📸 Gather everyday pictures
Collect photos of objects, people, or fun puzzles like mazes to feed into the lessons.

4. 🎓 Teach it to spot and box
First big lesson: show the AI how to outline objects precisely in your pictures.

5. Pick special skills
📦 Box master: perfect for finding and counting items like balls or people.
📍 Path tracer: great for following lines, mazes, or paths.

6. 🔗 Blend into one super AI
Combine both skills into a single assistant that handles everything smoothly (a rough sketch of the expert training step follows this section).

AI now sees like magic! Your assistant draws perfect boxes, counts objects, and solves puzzles, ready to wow with any picture.
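The "box master" and "path tracer" expert stages above boil down to ordinary supervised fine-tuning where the target text happens to contain coordinates. The sketch below shows that idea in generic PyTorch/Hugging Face terms; it is not the repo's actual training code, image conditioning is elided for brevity, and the primitive format reuses the assumed `<box>`/`<point>` tags from the earlier example. The final merge/distillation stage (step 6) is outside this sketch.

```python
# A minimal, generic SFT sketch (not the repo's code): the loss is standard
# next-token prediction over an answer that contains primitive coordinates,
# so the model learns to emit boxes/points as text. A real VLM step would
# also pass the pixel inputs for the image.
import torch

def expert_sft_step(model, tokenizer, prompt, target_with_primitives, optimizer):
    """One supervised step for a 'box master' or 'path tracer' expert."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_ids = tokenizer(target_with_primitives, return_tensors="pt").input_ids

    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # supervise only the primitive-bearing answer

    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Box expert target (assumed format):   "<box>[112, 340, 198, 420]</box>"
# Point expert target (assumed format): "<point>[12, 5]</point> <point>[12, 80]</point>"
```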

AI-Generated Review

What is Thinking-with-Visual-Primitives-pytorch?

This GitHub repo is an unofficial PyTorch reproduction of DeepSeek's Thinking with Visual Primitives, a pipeline that trains vision-language models to embed bounding boxes and points directly into chain-of-thought outputs for precise visual reasoning. It closes the "reference gap" in tasks like object grounding, counting, maze navigation, and path tracing, turning vague descriptions into structured coordinates. Built in Python on PyTorch, it offers one-command data generation, multi-stage training (pretrain, SFT experts, distillation), and CLI inference on single images or JSONL batches.
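As one concrete illustration of the batch path, a JSONL input might pair an image path with a prompt on each line. The field names below are assumptions; the actual schema and the CLI invocation are defined by the repo.

```python
# Assumed JSONL batch format for illustration; the repo defines the real schema.
import json

requests = [
    {"image": "samples/cats.jpg", "prompt": "Count the cats and return one box per cat."},
    {"image": "samples/maze.png", "prompt": "Trace the path from start to exit with points."},
]

with open("batch.jsonl", "w") as f:
    for r in requests:
        f.write(json.dumps(r) + "\n")

# batch.jsonl would then be handed to the repo's inference CLI; the exact
# command and flags are repo-specific and not reproduced here.
```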

Why is it gaining traction?

It democratizes a cutting-edge paper with 12GB GPU configs, bilingual READMEs, and scripts for visual comparisons across epochs—far more accessible than opaque official repos. Devs dig the end-to-end reproducibility, from COCO-derived data to eval metrics, plus quantization for low-VRAM runs. As an unofficial PyTorch take on DeepSeek's primitives, it hooks experimenters tired of black-box VLMs.
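On the low-VRAM angle, 4-bit loading of a Qwen-VL style base is the usual route to fitting runs in roughly 12 GB. The snippet below is a generic Hugging Face/bitsandbytes example, not the repo's own config, and the checkpoint name is only a placeholder.

```python
# Generic 4-bit loading example (not the repo's config); needs a recent
# transformers release plus bitsandbytes. The checkpoint is a placeholder:
# the repo may target a different Qwen-VL base.
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2VLForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
```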

Who should use this?

Vision-language researchers reproducing multimodal papers, agent builders who need spatial reasoning (e.g., robotics pathfinding or AR overlays), and PyTorch tinkerers fine-tuning Qwen-VL bases for grounding tasks. It also suits teams evaluating custom PyTorch training loops over visual primitives.

Verdict

Grab it for a complete reproduction pipeline with strong docs and an MIT license, despite the 48 stars signaling early maturity: there are no tests yet, but the visualization and eval scripts fill the gap. Fork it if DeepSeek's thinking primitives intrigue you.

