confluence-labs

State-of-the-art ARC-AGI-2 solver by Confluence Labs

88 stars
9
100% credibility
Found Feb 25, 2026 at 80 stars.
AI Analysis
Python
AI Summary

Open-source project reproducing a state-of-the-art 97.92% score on the public evaluation set of the ARC-AGI-2 reasoning benchmark, using teams of AI agents that run code in secure sandboxes.

How It Works

1
🔍 Discover the puzzle solver

You come across this project, which uses teams of AI agents to solve the tough grid puzzles of the ARC-AGI-2 benchmark.

2
📝 Sign up for the services

You create accounts with Google's Gemini API (the AI model) and E2B (the secure cloud sandbox provider) to power your solver.

3
🔗 Link your services

You add your Gemini and E2B API keys so the agents can call the model and run code inside protected sandboxes.

4
⚙️ Prepare your setup

You clone the repository and install its dependencies with a quick setup step.

5
▶️ Launch the solver

With one command (a bash script), you launch teams of AI agents that work on all the puzzles in parallel.

6
👀 Watch progress unfold

You follow live logs as the agents refine their solutions over repeated iterations, getting closer to a correct answer with each pass.

7
📊 Check results and costs

When the run finishes, the tool reports your puzzle-solving score and total spending and writes submission.json, with all agent code having run inside isolated sandboxes.

🎉 Achieve top scores

You celebrate scoring 97.92% on the public evaluation puzzles, with a submission ready for the ARC Prize!
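The score reported at the end of a run can be recomputed from submission.json. This is a minimal sketch that assumes the ARC Prize submission layout (each task ID maps to a list with one `{"attempt_1": grid, "attempt_2": grid}` entry per test input) and the standard two-attempt rule, with per-task accuracy averaged across tasks. `score_submission` and the shape of the ground-truth dict are hypothetical, not the repository's own scorer.

```python
import json

Grid = list[list[int]]


def score_submission(submission: dict[str, list[dict[str, Grid]]],
                     solutions: dict[str, list[Grid]]) -> float:
    """Average per-task accuracy; a test output counts if either attempt matches exactly."""
    task_scores = []
    for task_id, truths in solutions.items():
        preds = submission.get(task_id, [])  # a missing task scores zero
        solved = 0
        for i, truth in enumerate(truths):
            attempts = preds[i] if i < len(preds) else {}
            if truth in (attempts.get("attempt_1"), attempts.get("attempt_2")):
                solved += 1
        task_scores.append(solved / len(truths))
    return sum(task_scores) / len(task_scores) if task_scores else 0.0


if __name__ == "__main__":
    # Invented example: "t1" is solved on the second attempt, "t2" has no answers at all.
    submission = json.loads('{"t1": [{"attempt_1": [[0]], "attempt_2": [[1]]}]}')
    solutions = {"t1": [[[1]]], "t2": [[[2]], [[3]]]}
    print(score_submission(submission, solutions))  # 0.5
```

Averaging per task, rather than per test output, keeps tasks with several test inputs from dominating the total.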

AI-Generated Review

What is arc-agi-2?

This Python solver from Confluence Labs tackles ARC-AGI-2, a benchmark of novel grid puzzles designed to test abstraction and reasoning as a measure of progress toward AGI. It spins up parallel Gemini agents in secure E2B sandboxes that generate and refine Python code to transform input grids into output grids, reaching 97.92% on the public ARC-AGI-2 evaluation set. Users get a one-command run via a bash script: set the Gemini and E2B API keys, launch it, and get submission.json, scores, and cost breakdowns.
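The generate-and-refine cycle can be pictured as a loop. Each candidate program runs on the task's training pairs, mismatches become feedback for the next attempt, and the loop stops once every pair is reproduced or an iteration budget runs out (10 is the maximum quoted below). This is a minimal sketch, not the repository's code. `propose` is a hypothetical stand-in for the Gemini call and is scripted here so the example runs offline. The toy task is invented, though it uses the `train`/`input`/`output` layout of the public ARC JSON files. Calling `exec` on model-written code is only reasonable because real runs happen inside an isolated sandbox such as E2B.

```python
import json
from typing import Callable, Optional


def check(code: str, train: list[dict]) -> list[str]:
    """Run candidate code on every training pair; return a description of each failure."""
    scope: dict = {}
    try:
        exec(code, scope)  # acceptable only inside an isolated sandbox
        transform = scope["transform"]
    except Exception as err:
        return [f"code failed to load: {err!r}"]
    errors = []
    for i, pair in enumerate(train):
        try:
            got = transform(pair["input"])
        except Exception as err:
            errors.append(f"pair {i}: raised {err!r}")
            continue
        if got != pair["output"]:
            errors.append(f"pair {i}: expected {pair['output']}, got {got}")
    return errors


def refine(propose: Callable[[list[str]], str], train: list[dict],
           max_iters: int = 10) -> Optional[str]:
    """Generate, test, and feed errors back until all training pairs pass."""
    feedback: list[str] = []
    for _ in range(max_iters):
        code = propose(feedback)  # the model sees the previous round's failures
        feedback = check(code, train)
        if not feedback:
            return code  # reproduces every training output
    return None  # iteration budget exhausted


if __name__ == "__main__":
    # Invented toy task in the public ARC JSON layout; the rule is "mirror each row".
    task = json.loads('{"train": [{"input": [[1, 2]], "output": [[2, 1]]},'
                      ' {"input": [[3, 0, 5]], "output": [[5, 0, 3]]}]}')
    scripted = iter([
        "def transform(g):\n    return g",                         # first try: identity
        "def transform(g):\n    return [row[::-1] for row in g]",  # second try: mirror
    ])
    code = refine(lambda feedback: next(scripted), task["train"])
    scope: dict = {}
    exec(code, scope)
    print(scope["transform"]([[7, 8, 9]]))  # [[9, 8, 7]]
```

Verifying against the training pairs before touching the test input is the safeguard. A program that cannot reproduce known outputs is never trusted on unseen ones.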

Why is it gaining traction?

It nearly saturates the public ARC-AGI-2 evaluation set, outpacing most entries on the ARC-AGI-2 leaderboard, and it is configurable: up to 12 agents per input, up to 10 refinement iterations, and 132 concurrent sandboxes. Rather than relying on pure reinforcement learning or a custom-trained model, it drives a Gemini 3 preview model through a CLI, reaching state-of-the-art results without any training. It can also resume interrupted runs and keeps partial results if a run fails. Developers appreciate the transparency: per-task costs, token usage, and readable logs for dissecting the agents' reasoning.
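The agent fan-out, resume support, and per-task cost reporting fit together in one loop. A semaphore caps the number of live sandboxes, several agents are launched per task, and each finished task's token usage and cost are appended to a results file. A restarted run then skips completed work, and a crash loses only the tasks still in flight. This is a minimal asyncio sketch under stated assumptions: `run_agent` is a hypothetical placeholder for one agent run in a sandbox, the JSONL layout and field names are invented, and the token prices are caller-supplied placeholders rather than real Gemini pricing. The defaults of 12 agents per task and 132 concurrent sandboxes mirror the figures above.

```python
import asyncio
import json
from pathlib import Path


async def run_agent(task_id: str, agent_idx: int) -> dict:
    """Hypothetical placeholder for one agent run in a sandbox; returns token usage."""
    await asyncio.sleep(0)  # stands in for model and sandbox latency
    return {"input_tokens": 1000, "output_tokens": 200}


def load_done(path: Path) -> dict[str, dict]:
    """Read finished tasks from a JSONL file, skipping a line cut off by a crash."""
    done: dict[str, dict] = {}
    if path.exists():
        for line in path.read_text().splitlines():
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue
            done[rec["task_id"]] = rec
    return done


async def solve_all(task_ids: list[str], path: Path, *, usd_per_m_in: float,
                    usd_per_m_out: float, agents_per_task: int = 12,
                    max_concurrency: int = 132) -> float:
    """Solve unfinished tasks concurrently; return total cost over all finished tasks."""
    done = load_done(path)
    sandboxes = asyncio.Semaphore(max_concurrency)  # caps live sandboxes

    async def one_agent(task_id: str, idx: int) -> dict:
        async with sandboxes:
            return await run_agent(task_id, idx)

    async def one_task(task_id: str) -> dict:
        runs = await asyncio.gather(*(one_agent(task_id, i) for i in range(agents_per_task)))
        tokens_in = sum(r["input_tokens"] for r in runs)
        tokens_out = sum(r["output_tokens"] for r in runs)
        cost = (tokens_in * usd_per_m_in + tokens_out * usd_per_m_out) / 1e6
        return {"task_id": task_id, "input_tokens": tokens_in,
                "output_tokens": tokens_out, "cost_usd": cost}

    existing = path.read_text() if path.exists() else ""
    with path.open("a") as out:
        if existing and not existing.endswith("\n"):
            out.write("\n")  # seal a partial line so new records start cleanly
        pending = [one_task(t) for t in task_ids if t not in done]  # resume: skip finished
        for finished in asyncio.as_completed(pending):
            rec = await finished
            out.write(json.dumps(rec) + "\n")
            out.flush()  # each finished task survives a later crash
            done[rec["task_id"]] = rec
    return sum(r["cost_usd"] for r in done.values())
```

For example, `asyncio.run(solve_all(task_ids, Path("results.jsonl"), usd_per_m_in=..., usd_per_m_out=...))` returns the total cost across all finished tasks, including those recorded by earlier runs. Writing records from the event loop as tasks complete, rather than from inside each agent, keeps a single writer on the file.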

Who should use this?

AI researchers chasing ARC-AGI-2 leaderboard spots or ARC Prize 2025 evaluations, especially those benchmarking LLMs on ARC tasks. Experimenters building agentic workflows for grid-based puzzles or Poetiq-style reasoning. Teams like Confluence Labs prototyping AGI solvers without infrastructure headaches.

Verdict

Grab it if you're deep in ARC-AGI-2: it delivers state-of-the-art results on the public data out of the box. With 76 stars and a credibility score of 1.0, it's early-stage (solid README, no tests), so validate it on held-out tasks before relying on it in production.
