Agent-first CLI for native UI automation with Set-of-Marks screenshots. MCP server + headless Xvfb support included.
SoMatic is a desktop automation tool that lets AI assistants control your computer by seeing and interacting with the screen. It takes screenshots, uses a local AI model to detect and number every clickable element (buttons, text fields, menus), and draws visual boxes around them. AI assistants can then tell SoMatic to 'click button 5' or 'type in the search box', and SoMatic executes those actions. The tool works across native desktop apps, browsers, PDFs, and other applications. It includes a vision system for AI detection and a headless mode for running automation on virtual displays, and it integrates with AI coding assistants through the Model Context Protocol (MCP).
How It Works
You download and install SoMatic on your computer with a simple command.
You run a quick readiness check to make sure everything is configured correctly for your operating system.
You launch the built-in vision system that can see and understand your screen.
You take a screenshot, and SoMatic automatically draws red numbered boxes around every clickable element it finds.
The AI assistant looks at your annotated screenshot and knows exactly which elements are buttons, text fields, and other controls.
The assistant clicks element #7 because that is the submit button you wanted.
The assistant types your message or presses keyboard shortcuts.
The assistant clicks next to a detected element when the target sits just beside it.
Your AI assistant completes the task by controlling your desktop applications just as a human would.
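The numbering and click steps above can be sketched in a few lines of Python. This is only an illustration of the Set-of-Marks idea, not SoMatic's actual code or command set: the `Mark` type, the `assign_marks` and `click_point` helpers, the reading-order numbering, and the sample boxes are all hypothetical and assumed for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """One numbered element on an annotated screenshot (hypothetical type)."""
    number: int
    label: str
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int

    def center(self) -> tuple[int, int]:
        """Return the pixel at the middle of the box."""
        return (self.x + self.width // 2, self.y + self.height // 2)


def assign_marks(boxes: list[tuple[str, int, int, int, int]]) -> dict[int, Mark]:
    """Number detected boxes in reading order: top to bottom, then left to right.

    Each box is (label, x, y, width, height). The ordering rule is an
    assumption made for this sketch.
    """
    ordered = sorted(boxes, key=lambda b: (b[2], b[1]))
    return {
        n: Mark(n, label, x, y, w, h)
        for n, (label, x, y, w, h) in enumerate(ordered, start=1)
    }


def click_point(marks: dict[int, Mark], number: int,
                dx: int = 0, dy: int = 0) -> tuple[int, int]:
    """Resolve 'click element N' to screen coordinates.

    A non-zero dx/dy models clicking next to a detected element
    instead of on its center.
    """
    if number not in marks:
        raise KeyError(f"no element numbered {number}")
    cx, cy = marks[number].center()
    return (cx + dx, cy + dy)


# Made-up detections standing in for the output of a vision model.
detected = [
    ("Submit button", 400, 300, 100, 40),
    ("Search field", 100, 50, 300, 30),
    ("Menu", 10, 50, 60, 30),
]
marks = assign_marks(detected)

submit = click_point(marks, 3)                 # click the element itself
beside_search = click_point(marks, 2, dx=170)  # click just right of the field
```

Because the assistant refers to elements only by number, it never has to estimate pixel coordinates from the image; the lookup from number to box does that work, and the offset covers targets that the detector missed but that sit beside a detected element.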