Smyan1909 / SoMatic

Agent-first CLI for native UI automation with Set-of-Marks screenshots. MCP server + headless Xvfb support included.

12
1
85% credibility
Found May 22, 2026 at 13 stars.
AI Analysis
Python
AI Summary

SoMatic is a desktop automation tool that lets AI assistants control your computer by seeing and interacting with the screen. It takes screenshots, uses a local AI model to detect and number every clickable element (buttons, text fields, menus), and draws visual boxes around them. AI assistants can then tell SoMatic to "click button 5" or "type in the search box," and it executes those actions. The tool works across native desktop apps, browsers, PDFs, and other applications. It includes a vision system for AI detection and a headless mode for running automation on virtual displays, and it integrates with AI coding assistants through MCP (Model Context Protocol).

How It Works

1
📦 Install SoMatic

You download and install SoMatic on your computer with a simple command.

2
🔍 Check your setup

You run a quick readiness check to make sure everything is configured correctly for your operating system.

3
👁️ Start the vision assistant

You launch the built-in vision system that can see and understand your screen.

4
📸 Capture your screen

You take a screenshot, and SoMatic automatically draws red numbered boxes around every clickable element it finds.

5
🤖 Your AI assistant sees the marks

The AI assistant looks at your annotated screenshot and knows exactly which elements are buttons, text fields, and other controls.

6
Tell your assistant what to do
🖱️
Click by number

Assistant clicks on element #7 because that's the submit button you wanted

⌨️
Type or press keys

Assistant types your message or presses keyboard shortcuts

📍
Click near something

Assistant clicks next to a detected element when the target is just beside it

Task complete

Your AI assistant successfully completed the task by controlling your desktop applications just like a human would.
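
The steps above come down to mapping a mark number back to a point on the screen. Here is a minimal sketch of that idea in Python, assuming a hypothetical Mark record with pixel bounding boxes and a simple reading-order numbering. None of these names come from SoMatic itself; they only illustrate "click by number" and "click near something".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """Hypothetical record for one detected element (not SoMatic's real schema)."""
    number: int  # label drawn on the annotated screenshot
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int

    def center(self) -> tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)


def number_detections(boxes: list[tuple[int, int, int, int]]) -> dict[int, Mark]:
    """Assign 1-based numbers in reading order: top to bottom, then left to right."""
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return {i: Mark(i, *box) for i, box in enumerate(ordered, start=1)}


def click_point(marks: dict[int, Mark], number: int) -> tuple[int, int]:
    """'Click by number': resolve a mark to the center of its box."""
    if number not in marks:
        raise KeyError(f"no mark numbered {number}")
    return marks[number].center()


def click_point_near(marks: dict[int, Mark], number: int,
                     dx: int = 0, dy: int = 0) -> tuple[int, int]:
    """'Click near something': offset from the center of a detected mark."""
    cx, cy = click_point(marks, number)
    return (cx + dx, cy + dy)


# Boxes are (x, y, width, height); values here are made up for illustration.
marks = number_detections([(200, 50, 100, 40), (20, 50, 80, 40), (20, 300, 120, 30)])
print(click_point(marks, 2))                # (250, 70)
print(click_point_near(marks, 2, dx=80))    # (330, 70)
```

The offset variant is useful when the real target (say, an unlabeled icon) was not detected but sits beside an element that was.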

AI-Generated Review

What is SoMatic?

SoMatic is a Python CLI that automates desktop UI by turning screenshots into numbered action targets. It runs a local YOLO model to detect every interactive element and overlay numbers on them, so agents can click "button 5" instead of guessing coordinates. Every command returns JSON, making it pipeline-friendly. It includes an MCP server for integration with AI coding tools, and a Linux headless mode using Xvfb for sandboxed automation.
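
Because every command returns JSON, a script can consume the output directly. The sketch below shows the general pattern with Python's standard json module. The payload shape (the "ok", "marks", "number", "label", and "box" fields) is an assumption made up for illustration; the review does not document SoMatic's actual output schema.

```python
import json

# Hypothetical command output; the real field names may differ.
SAMPLE_OUTPUT = """
{"ok": true, "marks": [
  {"number": 5, "label": "button", "box": [400, 220, 90, 32]},
  {"number": 7, "label": "text_field", "box": [120, 220, 260, 32]}
]}
"""


def marks_by_label(raw: str, label: str) -> list[int]:
    """Return the mark numbers whose (assumed) label field matches."""
    payload = json.loads(raw)
    if not payload.get("ok", False):
        raise RuntimeError("command reported failure")
    return [m["number"] for m in payload["marks"] if m["label"] == label]


print(marks_by_label(SAMPLE_OUTPUT, "button"))  # [5]
```

In a real pipeline the raw string would come from the CLI's standard output rather than a constant.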

Why is it gaining traction?

The Set-of-Marks approach addresses the grounding problem that plagues generalist AI agents: instead of asking a VLM to navigate a raw screenshot and hope for the best, you give it numbered targets. The project's reported benchmarks suggest this matters: SoMatic detection + GPT-5.5 hit 68-78% accuracy versus 52-59% for raw GPT on established benchmarks. The MIT/AGPL dual-license strategy is also clever, since the core stays permissively licensed while the YOLO weights respect upstream licensing. For developers building agentic workflows, this fills a real gap between "raw computer vision" and fragile coordinate-based automation.
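
One way to see why numbered targets help with grounding: the model's answer can be checked against a closed set of valid marks, which is not possible with free-form coordinates. The sketch below is a generic illustration of that idea, not SoMatic's prompt format; build_legend and parse_choice are hypothetical helper names.

```python
import re


def build_legend(targets: dict[int, str]) -> str:
    """Render a text legend of numbered targets to send alongside the screenshot."""
    lines = [f"[{n}] {desc}" for n, desc in sorted(targets.items())]
    return "Reply with the number of the element to click.\n" + "\n".join(lines)


def parse_choice(reply: str, targets: dict[int, str]) -> int:
    """Extract the first integer in the reply and reject marks that do not exist."""
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError("reply contains no mark number")
    number = int(match.group())
    if number not in targets:
        raise ValueError(f"mark {number} is not on screen")
    return number


targets = {7: "Submit button", 3: "Search field"}
print(build_legend(targets))
print(parse_choice("Click 7", targets))  # 7
```

A reply naming a mark that was never drawn raises an error immediately, instead of producing a click at a wrong location.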

Who should use this?

Frontend and QA engineers building automated UI test suites will get the most value — especially those working on dense applications like spreadsheets or CAD tools where simple image-matching falls apart. Developers integrating AI coding assistants (Claude Code, Cursor, Continue) into desktop workflows should evaluate the MCP server. Teams needing CI-scale headless UI automation on Linux will appreciate the Xvfb integration. It's less suited for simple browser-only tasks where established tools like Playwright already excel.

Verdict

SoMatic addresses a legitimate problem with a clean solution, and the benchmark numbers lend credibility to the approach. The 85% credibility score reflects solid engineering discipline. However, at 12 stars with alpha status and no visible test coverage, treat it as promising rather than production-ready. Worth watching closely, especially if the MCP integration improves agent reliability on complex desktop tasks.
