AMAP-ML / Code2World

Code2World: A GUI World Model via Renderable Code Generation

195 stars · 7 forks · 100% credibility · Python
Found Feb 11, 2026 at 133 stars
AI Summary

Code2World is a vision-language model that simulates future Android GUI states by generating renderable HTML code to enhance autonomous agents.

How It Works

1. 🔍 Discover Code2World

You find this clever tool while exploring ways to make phone-controlling assistants smarter and more reliable.

2. 💡 Understand the magic

It learns to guess what happens next on a phone screen after any tap or swipe, like having a crystal ball for apps.

3. 🧠 Get inspired by results

See how it boosts assistant success rates by up to 9.5% on real phone tasks, rivaling top AI models.

4. 🛠️ Set up your playground

Create a cozy space on your computer with a virtual phone ready for experiments.

5. 📥 Bring in the smarts

Download the pre-trained brains that power the predictions.

6. 🚀 Launch a test run

Pick a sample phone challenge and watch your upgraded assistant tackle it step by step.

🎉 Smarter agents unlocked

Your assistant now navigates apps with foresight, completing tasks faster and more accurately than before.
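The foresight loop described above can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: the stub `predict_next_html` stands in for Code2World's (screenshot, action) → next-screen-HTML prediction, and the toy transitions are invented for illustration. The idea is that an agent scores each candidate action by simulating its outcome before acting.

```python
# Hypothetical sketch of a world-model-guided GUI agent step.
# predict_next_html is a stub standing in for Code2World's
# (current screen, action) -> next-screen-HTML prediction.

def predict_next_html(screen_html: str, action: str) -> str:
    """Stub world model: a lookup table of toy UI transitions."""
    transitions = {
        ("<button>Login</button>", "tap_login"): "<h1>Home</h1>",
        ("<button>Login</button>", "tap_help"): "<h1>Help</h1>",
    }
    # Unknown (screen, action) pairs leave the screen unchanged.
    return transitions.get((screen_html, action), screen_html)

def choose_action(screen_html: str, candidates: list[str], goal: str) -> str:
    """Pick the action whose simulated next screen contains the goal text."""
    for action in candidates:
        if goal in predict_next_html(screen_html, action):
            return action
    return candidates[0]  # no simulated match: fall back to first candidate

best = choose_action("<button>Login</button>", ["tap_help", "tap_login"], "Home")
print(best)  # tap_login
```

In the real system the stub would be the 8B model plus an HTML renderer, but the control flow, simulate candidates and pick the most promising one, is the same.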

AI-Generated Review

What is Code2World?

Code2World is a Python-based vision-language model that predicts the next GUI state in Android apps by generating renderable code, like HTML, from a current screenshot and action. It solves the foresight gap for autonomous agents by simulating dynamic UI transitions with high visual fidelity and precise control, sidestepping blurry pixel predictions or rigid text outputs. Developers get a world model that renders realistic next screens, downloadable from Hugging Face as an 8B checkpoint.
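One reason code output beats pixel prediction: a predicted next screen is machine-readable, so an agent can inspect it programmatically rather than run a vision model over rendered pixels. A minimal sketch using only the standard library (the predicted HTML snippet here is made up, not real model output):

```python
# Extract interactive widgets from a predicted next-screen HTML string.
# The predicted_html below is an invented example, not Code2World output.
from html.parser import HTMLParser

class WidgetCollector(HTMLParser):
    """Collect tag names of interactive elements in predicted HTML."""
    def __init__(self):
        super().__init__()
        self.widgets = []

    def handle_starttag(self, tag, attrs):
        if tag in ("button", "input", "a"):
            self.widgets.append(tag)

predicted_html = "<div><button>OK</button><input type='text'><a href='#'>More</a></div>"
collector = WidgetCollector()
collector.feed(predicted_html)
print(collector.widgets)  # ['button', 'input', 'a']
```

A pixel-diffusion world model offers no equivalent hook; you would have to re-detect widgets from the generated image.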

Why is it gaining traction?

Unlike pixel diffusion models that lack structure or text predictors missing visuals, Code2World delivers both via code generation, topping benchmarks on Android Control and GUI Odyssey while rivaling GPT-5-level performance. The hook is its plug-and-play boost to agents: pair it with Gemini-2.5-Flash via simple config tweaks, and navigation success jumps 9.5% on AndroidWorld suites. Early adopters praise the AndroidCode dataset for scalable training data.
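The "simple config tweaks" pairing a planner with the world model might look something like the following. This is an illustrative sketch only; the keys and values are invented, not Code2World's actual configuration schema.

```python
# Hypothetical agent configuration pairing a planner LLM with a
# world model for lookahead. All keys/values are illustrative.
agent_config = {
    "planner_model": "gemini-2.5-flash",  # proposes candidate actions
    "world_model": "Code2World-8B",       # simulates each action's outcome
    "rollout_depth": 1,                   # one-step lookahead per decision
    "num_candidates": 4,                  # actions scored per step
}
```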

Who should use this?

AI researchers building GUI agents for mobile automation, like app testing or accessibility tools. Android devs simulating user flows without emulators, or agent trainers needing cheap transition rollouts. Skip if you're not in embodied AI; it's tailored for Python VLM pipelines with Android envs.

Verdict

Worth forking for GUI agent prototypes if you're in the space; the pretrained weights and arXiv paper make onboarding fast despite the project's early maturity. Test on your AndroidWorld setup first; docs are solid, but expect tweaks for production.

