vra/Thinking-with-Visual-Primitives-pytorch

Unofficial PyTorch reproduction of DeepSeek's Thinking with Visual Primitives.

Found May 13, 2026 at 48 stars.
AI Summary

A PyTorch reproduction of a research method that trains AI models to reason visually with bounding boxes and points, for tasks like object localization, counting, and path tracing.
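For a sense of what "reasoning with primitives" looks like in practice, here is an illustrative sketch of samples where the chain of thought carries coordinates instead of pure text. The `<box>`/`<point>` tags, field names, and pixel values are assumptions made for illustration; the actual primitive format is defined by the paper and the repo.

```python
# Illustrative only: the real primitive token format is defined by the paper/repo.
# Here we assume simple <box>[x1, y1, x2, y2]</box> and <point>[x, y]</point> tags.

# Counting: the model "thinks" by emitting one box per object before answering.
counting_sample = {
    "image": "kitchen.jpg",  # hypothetical image path
    "question": "How many mugs are on the table?",
    "reasoning": (
        "Locate each mug first. "
        "<box>[112, 340, 198, 420]</box> "
        "<box>[250, 335, 330, 418]</box> "
        "<box>[401, 350, 470, 430]</box>"
    ),
    "answer": "3",
}

# Path tracing: the reasoning is a sequence of point primitives along the route.
maze_sample = {
    "image": "maze.png",
    "question": "Trace a path from the entrance to the exit.",
    "reasoning": "<point>[12, 5]</point> <point>[12, 80]</point> <point>[140, 80]</point>",
    "answer": "The path follows the three waypoints above.",
}
```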

How It Works

1. 🔍 Discover smart visual AI
You find a project that teaches AI to "think" by drawing boxes around objects and tracing paths with points in pictures, making it great for spotting cats or solving mazes.

2. 💻 Get it ready on your computer
Download the simple tools and set everything up quickly so your AI can start learning.

3. 📸 Gather everyday pictures
Collect photos of objects, people, or fun puzzles like mazes to feed into the lessons.

4. 🎓 Teach it to spot and box
First big lesson: show the AI how to outline objects precisely in your pictures.

5. Pick special skills
📦 Box master: perfect for finding and counting items like balls or people.
📍 Path tracer: great for following lines, mazes, or paths.

6. 🔗 Blend into one super AI
Combine both skills into a single assistant that handles everything smoothly (a rough sketch of the expert training step follows this section).

AI now sees like magic! Your assistant draws perfect boxes, counts objects, and solves puzzles, ready to wow with any picture.
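The "box master" and "path tracer" expert stages above boil down to ordinary supervised fine-tuning where the target text happens to contain coordinates. The sketch below shows that idea in generic PyTorch/Hugging Face terms; it is not the repo's actual training code, image conditioning is elided for brevity, and the primitive format reuses the assumed `<box>`/`<point>` tags from the earlier example. The final merge/distillation stage (step 6) is outside this sketch.

```python
# A minimal, generic SFT sketch (not the repo's code): the loss is standard
# next-token prediction over an answer that contains primitive coordinates,
# so the model learns to emit boxes/points as text. A real VLM step would
# also pass the pixel inputs for the image.
import torch

def expert_sft_step(model, tokenizer, prompt, target_with_primitives, optimizer):
    """One supervised step for a 'box master' or 'path tracer' expert."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_ids = tokenizer(target_with_primitives, return_tensors="pt").input_ids

    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # supervise only the primitive-bearing answer

    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Box expert target (assumed format):   "<box>[112, 340, 198, 420]</box>"
# Point expert target (assumed format): "<point>[12, 5]</point> <point>[12, 80]</point>"
```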

AI-Generated Review

What is Thinking-with-Visual-Primitives-pytorch?

This GitHub repo is an unofficial PyTorch reproduction of DeepSeek's Thinking with Visual Primitives, a pipeline that trains vision-language models to embed bounding boxes and points directly into chain-of-thought outputs for precise visual reasoning. It closes the "reference gap" in tasks like object grounding, counting, maze navigation, and path tracing, turning vague descriptions into structured coordinates. Built in Python on PyTorch, it offers one-command data generation, multi-stage training (pretrain, SFT experts, distillation), and CLI inference on single images or JSONL batches.
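As one concrete illustration of the batch path, a JSONL input might pair an image path with a prompt on each line. The field names below are assumptions; the actual schema and the CLI invocation are defined by the repo.

```python
# Assumed JSONL batch format for illustration; the repo defines the real schema.
import json

requests = [
    {"image": "samples/cats.jpg", "prompt": "Count the cats and return one box per cat."},
    {"image": "samples/maze.png", "prompt": "Trace the path from start to exit with points."},
]

with open("batch.jsonl", "w") as f:
    for r in requests:
        f.write(json.dumps(r) + "\n")

# batch.jsonl would then be handed to the repo's inference CLI; the exact
# command and flags are repo-specific and not reproduced here.
```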

Why is it gaining traction?

It democratizes a cutting-edge paper with 12GB GPU configs, bilingual READMEs, and scripts for visual comparisons across epochs—far more accessible than opaque official repos. Devs dig the end-to-end reproducibility, from COCO-derived data to eval metrics, plus quantization for low-VRAM runs. As an unofficial PyTorch take on DeepSeek's primitives, it hooks experimenters tired of black-box VLMs.
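On the low-VRAM angle, 4-bit loading of a Qwen-VL style base is the usual route to fitting runs in roughly 12 GB. The snippet below is a generic Hugging Face/bitsandbytes example, not the repo's own config, and the checkpoint name is only a placeholder.

```python
# Generic 4-bit loading example (not the repo's config); needs a recent
# transformers release plus bitsandbytes. The checkpoint is a placeholder:
# the repo may target a different Qwen-VL base.
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2VLForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
```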

Who should use this?

Vision-language researchers reproducing multimodal papers, agent builders who need spatial reasoning (e.g., robotics pathfinding or AR overlays), and PyTorch tinkerers fine-tuning Qwen-VL bases for grounding tasks. It also suits teams evaluating custom PyTorch training loops over visual primitives.

Verdict

Grab it for a complete reproduction pipeline with strong docs and an MIT license, despite the 48 stars signaling early maturity: there are no tests yet, but the visualization and eval scripts fill the gap. Fork it if DeepSeek's thinking primitives intrigue you.

