WeiminXiong / Video2GUI
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining (ICML 2026)
This is a research project from an academic author on training AI assistants to use software by watching how humans interact with it. The accompanying paper describes Video2GUI, a system that learns from videos of people clicking and navigating through applications and uses them to build a large collection of interaction examples called the WildGUI dataset. The repository currently contains very little implementation code. It appears to be a placeholder or early release that includes only basic interface components. The authors state that the full dataset and implementation will be released later.
How It Works
A researcher interested in teaching AI to interact with computer interfaces finds this project through the paper or a search.
The researcher learns that the project aims to train AI assistants to use software programs by watching how people interact with them.
The researcher downloads the code to see how the system works under the hood.
The researcher decides to wait until the WildGUI dataset is released to use it for training their own AI models.
The model studies thousands of recorded interactions showing how humans click buttons, type text, and navigate through different applications.
The researcher uses the trained model to build an AI assistant that can automatically navigate software programs and websites.
The AI assistant can then perform tasks on its own, having learned how to interact with different graphical interfaces.