agent-sh

Linux desktop control over MCP — AT-SPI, GNOME Shell, Wayland portals, ydotool

14
0
80% credibility
Found May 19, 2026 at 16 stars.
AI Analysis
Rust
AI Summary

computer-use-linux is a bridge that lets AI assistants control Linux desktop computers. It works by connecting to the operating system's accessibility features to read what's on screen, and to input controls to click buttons, scroll pages, and type text. The tool supports multiple Linux desktop environments (GNOME, KDE, Hyprland, i3, COSMIC) and both Wayland and X11 display systems. It includes safety features like a system health check, clear labels on which actions can change your computer, and no network connectivity. Users install it once, run a setup check, and then their AI assistant can see and interact with their desktop just like a human user would.

How It Works

1
🤖 You want an AI that can use your desktop

You've been using AI assistants that can help with text, but you want one that can actually click buttons, read screens, and automate your Linux desktop just like you would.

2
📦 You install the desktop control tool

You download and install computer-use-linux, which sets up a bridge between your AI assistant and your Linux desktop. The installer checks that everything your computer needs is ready.

3
🔍 You run a health check

You run the 'doctor' command which checks your system like a medical checkup. It verifies your screen sharing permissions, accessibility features, and input controls are all working properly.

4
Your desktop gets configured
🦆
GNOME desktop

A small GNOME Shell extension gets installed so the AI can see and control your windows precisely.

🐉
KDE, COSMIC, Hyprland, or i3

Your desktop's built-in window controls get connected so the AI can read and manage your windows.

5
🔌 You connect it to your AI assistant

You tell your AI assistant (like Claude Desktop or Hermes Agent) to use this new tool. Now your AI can see your desktop, click buttons, read text, and type into applications.

6
Your AI starts working on your desktop

You ask your AI to help you with a task, and it opens applications, reads the screen to understand what's there, clicks the right buttons, and types text exactly where you need it.

🎉 Your AI works alongside you

Your AI assistant can now help you automate repetitive clicks, fill out forms, test applications, or handle any desktop task you show it how to do.

AI-Generated Review

What is computer-use-linux?

This is a Rust-based MCP server that gives AI agents full control over Linux desktops. Think of it as the missing piece that lets Claude, Codex, or any MCP-compatible agent actually see and interact with your Linux UI instead of just reading files. It reads accessibility trees via AT-SPI, takes screenshots through Wayland portals, and synthesizes input through ydotool or the XDG remote desktop portal. The toolset includes 15 MCP tools: window listing, element-aware clicking, text input, keyboard shortcuts, scroll, and semantic actions like "click the OK button" rather than guessing pixel coordinates. A built-in `doctor` command outputs a structured JSON readiness report showing exactly what's working on your system.
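To make the idea of element-aware clicking concrete, here is a minimal sketch of how a semantic selector could be resolved against an accessibility tree. It is not the project's code: the `Node` struct, the `find` and `click_point` functions, and the sample tree are all hypothetical stand-ins for what AT-SPI would report, and the real tool's selector syntax may differ.

```rust
// Hypothetical, simplified stand-in for a node in an AT-SPI accessibility tree.
#[derive(Debug)]
struct Node {
    role: &'static str,
    name: &'static str,
    // Screen-space bounds: x, y, width, height.
    bounds: (i32, i32, i32, i32),
    children: Vec<Node>,
}

// Depth-first search for the first node whose role matches exactly and
// whose name matches case-insensitively.
fn find<'a>(node: &'a Node, role: &str, name: &str) -> Option<&'a Node> {
    if node.role == role && node.name.eq_ignore_ascii_case(name) {
        return Some(node);
    }
    node.children.iter().find_map(|child| find(child, role, name))
}

// Center of a node's bounds: the point a click would be sent to.
fn click_point(node: &Node) -> (i32, i32) {
    let (x, y, w, h) = node.bounds;
    (x + w / 2, y + h / 2)
}

// Made-up dialog used only for illustration.
fn sample_tree() -> Node {
    Node {
        role: "frame",
        name: "Save changes?",
        bounds: (100, 100, 400, 200),
        children: vec![
            Node { role: "label", name: "Unsaved changes", bounds: (120, 140, 360, 20), children: vec![] },
            Node { role: "push button", name: "Cancel", bounds: (300, 250, 80, 30), children: vec![] },
            Node { role: "push button", name: "OK", bounds: (400, 250, 80, 30), children: vec![] },
        ],
    }
}

fn main() {
    let tree = sample_tree();
    match find(&tree, "push button", "ok") {
        Some(button) => println!("would click {} at {:?}", button.name, click_point(button)),
        None => println!("no matching element"),
    }
}
```

Because the click point is derived from the element's current bounds, the same selector keeps working after the window is moved or resized, which is the advantage over hardcoded pixel coordinates.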

Why is it gaining traction?

Most computer-use tooling is macOS-only. The few Linux alternatives either rely on fragile X11 root window hacks or skip input entirely. This project actually works on Wayland, which is where modern Linux desktops live. The multi-compositor backend registry is the key differentiator: it tries GNOME Shell extension, GNOME Introspect, COSMIC helper, KWin scripting, Hyprland hyprctl, and i3 IPC in sequence, reporting which one succeeded and why others failed. Semantic selectors backed by AT-SPI mean agents can target "the submit button in the login form" instead of hardcoding coordinates that break on resize. The safety annotations on every tool distinguish read-only observation from destructive desktop actions.
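The try-in-sequence behavior described above can be sketched as an ordered list of probes where the first success wins and every earlier failure is recorded with its reason. This is an illustration under assumptions, not the project's implementation: the backend names, the `Backend` and `Selection` types, and the environment-variable checks are hypothetical, only three of the six backends are shown, and the real server may detect compositors differently.

```rust
use std::collections::HashMap;

// Environment passed in explicitly so the sketch is deterministic.
type Env = HashMap<&'static str, &'static str>;
// A probe returns Ok(()) when its backend is usable, or Err(reason) otherwise.
type Probe = fn(&Env) -> Result<(), String>;

struct Backend {
    name: &'static str,
    probe: Probe,
}

#[derive(Debug)]
struct Selection {
    chosen: Option<&'static str>,
    // (backend name, reason) for every backend tried before the winner.
    failures: Vec<(&'static str, String)>,
}

// Hypothetical probes: assumed detection heuristics, for illustration only.
fn probe_gnome(env: &Env) -> Result<(), String> {
    match env.get("XDG_CURRENT_DESKTOP") {
        Some(desktop) if desktop.contains("GNOME") => Ok(()),
        Some(desktop) => Err(format!("desktop is {}, not GNOME", desktop)),
        None => Err("XDG_CURRENT_DESKTOP is unset".to_string()),
    }
}

fn probe_hyprland(env: &Env) -> Result<(), String> {
    if env.contains_key("HYPRLAND_INSTANCE_SIGNATURE") {
        Ok(())
    } else {
        Err("HYPRLAND_INSTANCE_SIGNATURE is unset".to_string())
    }
}

fn probe_i3(env: &Env) -> Result<(), String> {
    if env.contains_key("I3SOCK") {
        Ok(())
    } else {
        Err("I3SOCK is unset".to_string())
    }
}

// Ordered registry: earlier entries are preferred.
fn registry() -> Vec<Backend> {
    vec![
        Backend { name: "gnome-shell-extension", probe: probe_gnome },
        Backend { name: "hyprland-hyprctl", probe: probe_hyprland },
        Backend { name: "i3-ipc", probe: probe_i3 },
    ]
}

// Try each backend in order; stop at the first success and keep the
// reasons for every backend rejected along the way.
fn select_backend(backends: &[Backend], env: &Env) -> Selection {
    let mut failures = Vec::new();
    for backend in backends {
        match (backend.probe)(env) {
            Ok(()) => return Selection { chosen: Some(backend.name), failures },
            Err(reason) => failures.push((backend.name, reason)),
        }
    }
    Selection { chosen: None, failures }
}

fn main() {
    let mut env = Env::new();
    env.insert("XDG_CURRENT_DESKTOP", "Hyprland");
    env.insert("HYPRLAND_INSTANCE_SIGNATURE", "demo");
    println!("{:#?}", select_backend(&registry(), &env));
}
```

Returning the rejected backends alongside the chosen one is what lets a readiness report explain why a given compositor integration was skipped rather than only naming the one that worked.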

Who should use this?

Linux developers building AI agents that need to interact with GUI applications. DevOps engineers automating desktop workflows. Researchers testing how AI models handle real Linux environments. It's specifically for people running GNOME, KDE, Hyprland, i3, or COSMIC on a modern Linux distribution. If you're on macOS or Windows, look elsewhere. If you're running XFCE or Sway without a supported backend, you'll get degraded functionality.

Verdict

This fills a real gap in the Linux AI tooling landscape, and the Wayland-first approach with compositor-aware window targeting is genuinely useful. However, with 14 stars and a credibility score of 80%, this is early-stage software from a solo maintainer. The documentation is thorough and the architecture is sound, but real-world stability across distros and desktop environments remains unproven. Install it, run `computer-use-linux doctor`, and evaluate whether it works on your specific setup before relying on it in production.
