
THUSI-Lab / GameVerse


GameVerse: Can Vision-Language Models Learn from Video-based Reflection?

Found Mar 12, 2026 at 19 stars; 21 stars at the time of this analysis.

Python
AI Summary

GameVerse is a benchmark framework for testing AI agents' ability to play diverse video games using vision-language models, with features for video analysis and performance evaluation.

How It Works

1
🔍 Discover GameVerse

You hear about GameVerse, a fun way to see if smart AI can play your favorite video games like a real player.

2
🎮 Get your games ready

Pick some classic games like Angry Birds or Snake that you enjoy, and make sure they're installed on your computer.

3
🧠 Connect a smart AI helper

Connect a vision-language model, such as GPT-4o from OpenAI, so it can see and reason about the game screens.

4
🚀 Watch AI play!

Hit play and see the AI take control, moving pieces, shooting birds, or navigating mazes just like you would.

5
📹 Review the gameplay video

Watch a smooth video replay of what happened, spotting wins, mistakes, and clever moves.

6
💡 Get helpful insights

The system compares your AI's play to expert videos and creates tips to play smarter next time.

7
🏆 AI gets better at games

Your AI improves over time, climbing leaderboards and mastering tough levels with growing smarts.
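The steps above boil down to an observe→decide→act loop: capture the screen, ask a vision-language model for the next move, and send that move to the game. Here is a minimal sketch of that loop; every name in it (`Action`, `decide`, `play_episode`, the fake frame bytes) is a hypothetical placeholder, not GameVerse's actual API.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str       # e.g. "click", or a semantic command like "shoot_bird"
    target: tuple   # screen coordinates for GUI-style actions


def decide(screenshot: bytes, goal: str) -> Action:
    # Hypothetical: a real agent would send the screenshot and the goal
    # to a vision-language model and parse its reply into an Action.
    return Action(kind="click", target=(100, 200))


def play_episode(max_steps: int = 3) -> list:
    trace = []
    for step in range(max_steps):
        screenshot = b"fake-frame"  # stand-in for a real screen capture
        action = decide(screenshot, goal="clear the level")
        trace.append((step, action.kind, action.target))
        # a real loop would dispatch the action to the game here
    return trace
```

The trace collected here is what the later video-review step would analyze: a record of what the agent saw and did at each step.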

AI-Generated Review

What is GameVerse?

GameVerse is a Python benchmark that lets you evaluate vision-language models on 15+ complex games such as Angry Birds, Baba Is You, Civilization, and Slay the Spire. It tests whether models can play via GUI clicks or semantic commands while learning from video-based reflection: comparing agent failures against expert playthroughs to generate insights. Users get reproducible evals via simple CLI scripts like `play_game.py` or leaderboard runners, plus tools to extract milestones and reflections from videos.
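A run via the `play_game.py` entry point mentioned above might be driven from Python like this; note that the flag names (`--game`, `--model`) are assumptions for illustration, not GameVerse's documented CLI.

```python
import subprocess
import sys

# Hypothetical invocation of the play_game.py script the review mentions.
# The --game and --model flags are illustrative guesses, not documented options.
cmd = [sys.executable, "play_game.py", "--game", "snake", "--model", "gpt-4o"]

# Uncomment inside a GameVerse checkout; check=True raises on a non-zero exit.
# subprocess.run(cmd, check=True)
print(" ".join(cmd[1:]))
```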

Why is it gaining traction?

It stands out with plug-and-play support for top VLMs (GPT-4o, Gemini, Qwen-VL) across real-time and turn-based games, including video reflection that boosts agent performance without retraining. Devs love the one-command leaderboards and easy extension to new titles, skipping the hassle of custom wrappers. Early GameVerse 2025 experiments show reflection helping models like Qwen-VL-32B navigate mazes or merge tiles in 2048.
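The reflection idea described above can be pictured as a diff between milestones an expert reaches and milestones the agent reached, turned into plain-language tips. This is a minimal sketch under that assumption; the milestone names and tip wording are illustrative, not GameVerse's actual output format.

```python
def reflect(agent_milestones: list, expert_milestones: list) -> list:
    """Return one tip per expert milestone the agent never reached."""
    missed = [m for m in expert_milestones if m not in agent_milestones]
    return [
        f"Expert runs reach '{m}'; review that segment before retrying."
        for m in missed
    ]
```

Because the tips are text, they can be fed back into the model's prompt on the next attempt, which is how reflection can improve play without any retraining.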

Who should use this?

AI researchers benchmarking VLMs on games, especially those exploring reflection for better decision-making in dynamic environments. Game AI devs tuning agents for titles like Genshin or Forza Horizon 5. Python scripters prototyping video-based learning without building envs from scratch.

Verdict

Worth forking for VLM game research: solid docs, an MIT license, and an arXiv paper make it accessible, even though 19 stars and a 1.0% credibility score signal an early-stage project. Test a game today; extend it if reflection hooks you.

