ant-research

[ICLR 2026] M2-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining

53 stars · Found Apr 27, 2026
Language: Python
AI Summary

M2-Miner is a research tool that employs collaborative AI agents with smart search strategies to automatically generate diverse, high-quality interaction data from mobile app graphical interfaces for training navigation agents.

How It Works

1. 🔍 Discover M2-Miner

You find this research tool on GitHub; it generates example interactions for smartphone assistants by automatically exploring apps.

2. 📱 Connect Your Phone

Connect an Android phone or test device over ADB so the tool can observe and interact with its screen like a real user.
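Driving the device happens over ADB. A minimal sketch of the standard Android shell commands involved (plain `adb` tooling on PATH; these helpers are illustrative, not M2-Miner's actual interface):

```python
import subprocess

def adb(*args):
    """Run an adb command and return its stdout as bytes."""
    return subprocess.run(["adb", *args], capture_output=True, check=True).stdout

def screenshot(path="screen.png"):
    # screencap -p writes a PNG to stdout; exec-out avoids CR/LF mangling
    with open(path, "wb") as f:
        f.write(adb("exec-out", "screencap", "-p"))

def tap(x, y):
    adb("shell", "input", "tap", str(x), str(y))

def swipe(x1, y1, x2, y2, ms=300):
    adb("shell", "input", "swipe", str(x1), str(y1), str(x2), str(y2), str(ms))

def escape_for_input(text):
    # `input text` cannot pass literal spaces; Android accepts %s instead
    return text.replace(" ", "%s")

def type_text(text):
    adb("shell", "input", "text", escape_for_input(text))
```

These map one-to-one onto the tap, swipe, and text-input actions the tool records.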

3. 🚀 Start the Explorer

Launch the multi-agent explorer: teams of agents work together to navigate apps, tapping buttons and swiping through screens.

4. 🤖 Watch Smart Exploration

The agents branch into unexplored states via tree search and recycle successful paths into new intents, gathering a wide variety of app interactions while keeping exploration efficient and the data diverse.
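The "smart search strategy" here is Monte Carlo Tree Search. The repo's exact variant isn't shown on this page; a generic UCT selection step, with node fields assumed for illustration rather than taken from the codebase, looks like:

```python
import math

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # cumulative reward (e.g., a novelty/diversity score)

def uct_score(node, c=1.4):
    if node.visits == 0:
        return float("inf")  # always try unvisited actions first
    exploit = node.value / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def select(root):
    """Descend from root, picking the highest-UCT child at each level."""
    node = root
    while node.children:
        node = max(node.children, key=uct_score)
    return node
```

The exploration term pulls agents toward rarely visited GUI states, which is what drives the diversity described above.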

5. 📈 Build Your Dataset

The tool saves high-quality screenshots and action sequences from all the explorations.
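A minimal sketch of how such a trajectory store could be laid out (the file names and record schema are assumptions for illustration, not the repo's actual format):

```python
import json
import os
import time

def save_step(traj_dir, step_idx, action, screenshot_bytes):
    """Append one (screenshot, action) pair to a trajectory directory."""
    os.makedirs(traj_dir, exist_ok=True)
    img_path = os.path.join(traj_dir, f"step_{step_idx:03d}.png")
    with open(img_path, "wb") as f:
        f.write(screenshot_bytes)
    record = {
        "step": step_idx,
        "action": action,          # e.g. {"type": "tap", "x": 100, "y": 200}
        "screenshot": img_path,
        "timestamp": time.time(),
    }
    # One JSON record per line keeps the trajectory easy to stream and append
    with open(os.path.join(traj_dir, "actions.jsonl"), "a") as f:
        f.write(json.dumps(record) + "\n")
```

Pairing every action with the screenshot taken just before it is what makes such traces usable as vision-language training data.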

6. 🏆 Train Better Assistants

Use the resulting collection of interactions to train AI agents that excel at mobile app tasks on benchmarks.

AI-Generated Review

What is M2-Miner?

M2-Miner automates data mining for mobile GUI agents by using multi-agent Monte Carlo Tree Search to explore Android apps, generating diverse interaction trajectories (taps, swipes, and text inputs) via ADB. It addresses the bottleneck of manually creating high-quality training data for vision-language models in mobile automation, producing richer intents from simple starting points. Built in Python, it integrates with models like Qwen2.5-VL and deploys via vLLM for fast inference on real devices.
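The review mentions Qwen2.5-VL served via vLLM. One way to pair a screenshot with an instruction for a vLLM OpenAI-compatible chat endpoint is a data-URL image message; the model name and request shape below are illustrative assumptions, not copied from the repo:

```python
import base64

def build_vlm_request(instruction, png_bytes,
                      model="Qwen/Qwen2.5-VL-7B-Instruct"):
    """Build an OpenAI-style chat payload pairing a screenshot with an instruction."""
    image_b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": instruction},
            ],
        }],
        "max_tokens": 256,
    }

# POST this dict as JSON to the server's /v1/chat/completions endpoint.
```

The agent would then parse the model's reply into one of the ADB actions described earlier.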

Why is it gaining traction?

This repo, accompanying an ICLR 2026 accepted paper, stands out for its intent-recycling strategy, which boosts data diversity without extra human effort and yields SOTA agent performance on GUI benchmarks. Developers like the progressive training loop for better generalization to unseen apps, plus simple ADB scripting for screenshots and actions in place of brittle rule-based explorers. Early buzz from the paper's appearance on OpenReview ahead of ICLR 2026 has researchers eyeing it for scalable data pipelines.

Who should use this?

AI researchers training mobile GUI agents who need automated trajectory collection for benchmarks like AndroidControl. Mobile devs prototyping app explorers or testers generating edge-case interactions. Agent builders short on diverse datasets for VLMs in Android environments.

Verdict

Grab it if you're in GUI agent research: a solid paper-backed approach with working inference code, but at 53 stars it is early-stage, so expect rough edges in docs and device setup. Worth starring for ICLR 2026 watchers.
