ZJU-REAL

Official code for "KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation"

AI Summary

KnowU-Bench is an interactive benchmark for evaluating AI agents that perform personalized and proactive tasks on simulated Android phones.

How It Works

1. 🔍 Discover KnowU-Bench

You find this benchmark on GitHub or in its research paper and want to test how well AI assistants understand personal phone habits.

2. 🛠️ Set up your computer

You install the required tooling (the benchmark runs its emulators in Docker) so your computer can run virtual Android phones.

3. 📱 Start virtual phones

With one command, you launch several realistic Android phones ready for testing.

4. 🤖 Choose your AI helper

You pick an AI assistant and select tasks like daily routines or personal preferences to evaluate.

5. ⚡ Watch the magic happen

Your AI takes over the phones, making decisions based on user habits, asking questions when needed, and handling real interactions.

6. 📊 Review the results

You open a web viewer to see step-by-step actions, screenshots, scores, and detailed reports.

🎉 Understand your AI

You discover exactly how well your assistant knows you, spot its weaknesses, and get ideas for improvements.
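The steps above can be sketched as a shell session. Only the bare subcommands (`mw env run`, `mw eval`) come from the repo's description; every flag and placeholder below is a hypothetical assumption, not the benchmark's documented interface.

```shell
# Hedged sketch of the KnowU-Bench workflow -- consult the repo's README
# for the real interface; flags below are illustrative placeholders.

# Step 2: the benchmark runs its Android phones in Docker, so verify it first.
docker --version

# Step 3: launch the virtual Android phone environments.
mw env run

# Steps 4-5: run an agent against a chosen task set
# (the --agent/--tasks flags are assumptions).
mw eval --agent <your-agent> --tasks <task-set>
```

Angle-bracketed values are placeholders; step 6 (reviewing results) is handled by the repo's log viewer.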


AI-Generated Review

What is KnowU-Bench?

KnowU-Bench is the official GitHub repository for a Python benchmark evaluating interactive, proactive, and personalized mobile agents in Dockerized Android emulators. It tests agents on inferring user preferences from behavioral logs, multi-turn clarifications via LLM simulators, and proactive decisions like intervening or seeking consent across 192 tasks in 23 apps. Developers get reproducible evals with CLI commands like `mw env run` for environments and `mw eval` for agent runs.

Why is it gaining traction?

Unlike standard GUI benchmarks focused on explicit tasks, KnowU-Bench measures gaps in user understanding and trustworthy proactivity, using hidden user profiles and online simulators. The repository's CLI streamlines setup, log viewing (`mw logs view`), and metrics calculation, while built-in agents and RAG-based user-log injection let you benchmark custom models fast. The project's GitHub page and releases track progress on real mobile-agent pain points.
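Reviewing a run is a one-liner; `mw logs view` is the only invocation form named in the repo's description, so this sketch deliberately adds nothing beyond it.

```shell
# Browse per-step actions, screenshots, and scores from a completed run.
mw logs view
```

From there, the web viewer shows the step-by-step actions, screenshots, scores, and reports described in step 6 of the walkthrough above.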

Who should use this?

AI researchers tuning LLMs for mobile assistants, such as proactive habit trackers or personalized schedulers. Suited for teams evaluating agents on Android apps with personas (developer, student, grandma), especially teams integrating MCP servers for routine tasks.

Verdict

Promising niche benchmark for mobile agent evaluation, but at 46 stars it is still early: docs are strong, yet expect tweaks. Grab it from the GitHub releases page if you're building proactive, personalized mobile AI.
