
AEC Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction

AI Summary

AEC-Bench is a benchmark of 196 real-world tasks over construction drawings, specifications, and submittals, built to evaluate how well AI agents understand and analyze Architecture, Engineering, and Construction (AEC) documents.

How It Works

1. 🔍 Discover AEC-Bench

You hear about a collection of real construction drawings and documents assembled to test whether AI assistants can understand building plans like a pro.

2. 📥 Gather test drawings

Download sample blueprints, specs, and submittals so you have real-world examples ready to check.
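In practice that means pulling the PDFs referenced by a task manifest. Here's a minimal sketch of the idea; the manifest path and the `.pdf_url` field are illustrative assumptions, since the repo ships its own manifests and prefetch tooling:

```bash
# Hypothetical prefetch loop: download every PDF listed in a manifest.
# manifests/tasks.json and the .pdf_url field are assumed names, not the
# repo's documented layout -- check its README for the real prefetch step.
mkdir -p documents
jq -r '.[].pdf_url' manifests/tasks.json | while read -r url; do
  curl -fsSL -o "documents/$(basename "$url")" "$url"
done
```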

3. 🤖 Link your AI helper

Connect a smart AI service like Claude or GPT so it can read and reason about the drawings.
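Linking usually amounts to exporting the provider's API key before launching a run. The variable names below are the standard ones for Anthropic and OpenAI clients; AEC-Bench's agent configs may expect something different:

```bash
# Standard provider environment variables (confirm against the repo's agent configs).
export ANTHROPIC_API_KEY="sk-ant-..."   # Claude agents
export OPENAI_API_KEY="sk-..."          # Codex / GPT agents
```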

4. 📋 Pick a review task

Choose something simple like checking if detail labels match what's drawn or spotting broken references.

5. ▶️ Run the AI test

Watch as your AI scans the documents inside a safe sandbox and shares its findings step by step.
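Per the review further down, `harbor trials start` is the Harbor CLI entry point for a run, so a single trial looks roughly like this; task-selection arguments are omitted because the exact interface isn't documented here:

```bash
# Start one sandboxed evaluation trial via the Harbor CLI
# (command name taken from the repo's docs; task-selection
# flags vary, so consult the Harbor documentation).
harbor trials start
```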

6. 📊 See the score

Get a clear report on what your AI got right, wrong, or missed, with examples from the drawings.
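Results stream to JSONL (per the review below), so a quick tally is one `jq` call away. The `passed` field name here is a guess at the schema, not the documented one:

```bash
# Group a JSONL results file by outcome and count each group.
# results.jsonl and the .passed field are assumed names.
jq -s 'group_by(.passed) | map({passed: .[0].passed, count: length})' results.jsonl
```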

7. 🏆 Find the best AI

Now you know which AI helper shines at construction reviews, ready to use on your real projects!

AI-Generated Review

What is aec-bench?

AEC-Bench is a multimodal benchmark for testing agentic systems on real-world architecture, engineering, and construction documents such as drawings, specs, and submittals. It packs 196 tasks across nine types in three scopes (intrasheet, intradrawing, and intraproject), challenging AI agents to handle cross-reference resolution, sheet indexing, and more via vision and reasoning. It is built on Python with the Harbor framework and Docker sandboxes: you prefetch PDFs from manifests, plug in API keys for Claude, Codex, or Nomic agents, and run evals through Harbor CLI commands such as `harbor trials start`, individually or as batch jobs.
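Harbor has its own batch-job support, but absent its exact syntax here, a plain loop over the three scopes conveys the workflow. The scope names come from the benchmark itself; the `--scope` flag and the loop are hypothetical stand-ins:

```bash
# Illustrative batch sketch: one trial per task scope.
# --scope is a hypothetical flag; Harbor's real batch jobs likely differ.
for scope in intrasheet intradrawing intraproject; do
  harbor trials start --scope "$scope"
done
```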

Why is it gaining traction?

Unlike generic benchmarks, AEC-Bench targets construction-specific pain points with actual bid sets, making agent performance directly relevant to AEC tech. Harbor's sandboxed runs ensure reproducible, secure evals without local setup hassles, and results stream to JSONL for easy analysis, which is handy for iterating on multimodal models. Backed by Nomic, an arXiv paper, and a Hugging Face dataset, it's a ready-to-run AEC benchmark drawing early interest from agent builders.

Who should use this?

AI researchers tuning vision-language agents for document QA in engineering and construction. AEC tech teams validating tools against real drawing sets before deployment. Multimodal system devs needing a shell-scriptable benchmark to compare models like Claude Opus or GPT variants on tasks from detail tracing to submittal reviews.

Verdict

Grab it if you're in agentic AEC: a solid foundation, even if the 19-star count signals early days. Docs are crisp and the Harbor integration shines, but expect to prefetch files and tweak agents for top scores. Worth a weekend eval for tracking multimodal progress.

