stanford-iris-lab

Reference code for the Meta-Harness paper.

Found Apr 15, 2026 at 172 stars.
AI Summary (Python)

Meta-Harness is a research framework for automatically optimizing the supporting code around AI models to improve performance on specific tasks like terminal operations and text classification.

How It Works

1. 🔍 Discover Meta-Harness

You hear about this helpful tool from Stanford researchers that automatically improves how AI helpers tackle specific jobs, like sorting text or using a computer terminal.

2. 📥 Get it set up

You download the project and prepare it on your computer following simple instructions, ready to try right away.
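The review below mentions quickstarts via uv; a setup sketch under that assumption (the repo URL and exact commands are guesses, so check the project README):

```shell
# Hypothetical setup flow -- consult the repo's README for the exact commands.
git clone https://github.com/stanford-iris-lab/meta-harness.git
cd meta-harness
uv sync   # install pinned dependencies into a local virtual environment
```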

3. 🧪 Try a sample task

You pick an example like classifying text messages or testing terminal commands, and run it to see the AI in action.

4. 🚀 Start improving

You launch the optimization process, in which the tool proposes and tests better ways for the AI to work, and watch it improve over a few rounds.
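The improvement loop described above can be sketched as propose-evaluate-keep-best. `propose_variant` (standing in for the LLM that suggests harness edits) and `evaluate` (standing in for the benchmark scorer) are toy placeholders, not the repo's real components:

```python
import random

# Toy sketch of the propose-and-test loop. The real framework uses an LLM to
# propose harness edits and benchmark runs to score them; here both are faked.

def propose_variant(harness: dict) -> dict:
    """Placeholder for the LLM proposer: perturb one harness setting."""
    variant = dict(harness)
    variant["temperature"] = round(random.uniform(0.0, 1.0), 2)
    return variant

def evaluate(harness: dict) -> float:
    """Placeholder scorer: pretend lower temperature scores higher."""
    return 1.0 - harness["temperature"]

def optimize(rounds: int = 5) -> tuple[dict, float]:
    best = {"temperature": 0.9}
    best_score = evaluate(best)
    for _ in range(rounds):
        candidate = propose_variant(best)
        candidate_score = evaluate(candidate)
        if candidate_score > best_score:      # keep only improvements
            best, best_score = candidate, candidate_score
    return best, best_score

best, best_score = optimize()
print(best_score)  # non-decreasing across rounds, since losers are discarded
```

Because losing candidates are discarded, the best score can only stay flat or rise from round to round, which is the "watching it get smarter" effect.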

5. 📊 Check the progress

You review the reports showing how much better the AI performs on tasks after each improvement.

🎉 Optimized AI ready

Your AI helper is now sharper and more reliable on your chosen tasks, saving you time and improving results.

AI-Generated Review

What is meta-harness?

Meta-Harness is Python reference code for the Stanford IRIS lab's arXiv paper on end-to-end optimization of model harnesses—the surrounding code that manages storage, retrieval, and display for a fixed LLM base model. It automates evolution of task-specific setups like memory systems for text classification or agent scaffolds for terminal tasks, using LLMs to propose candidates and Harbor benchmarks to score them. Developers get a reusable framework with quickstarts via uv sync and bash scripts, plus an onboarding flow to adapt it to new domains.
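The memory-system idea mentioned above can be sketched as a tiny retrieval store that pulls similar labeled examples into the prompt as context. Word-overlap similarity and these class/method names are illustrative stand-ins, not the repo's actual retrieval mechanism:

```python
# Toy retrieval memory for classification: store labeled examples, then
# retrieve the most similar ones as few-shot context for the model.
# Word-overlap similarity is an illustrative stand-in only.

class Memory:
    def __init__(self):
        self.items: list[tuple[str, str]] = []   # (text, label) pairs

    def add(self, text: str, label: str) -> None:
        self.items.append((text, label))

    def retrieve(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        q = set(query.lower().split())
        def overlap(item: tuple[str, str]) -> int:
            return len(q & set(item[0].lower().split()))
        return sorted(self.items, key=overlap, reverse=True)[:k]

mem = Memory()
mem.add("I have a fever and chills", "flu")
mem.add("my knee hurts after running", "injury")
mem.add("sneezing and a runny nose", "cold")

context = mem.retrieve("fever with chills all night")
print(context[0])  # → ('I have a fever and chills', 'flu')
```

In the framework's terms, the memory store is part of the harness, so its size, similarity metric, and `k` are all candidates for automated optimization.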

Why is it gaining traction?

Unlike static prompt engineering, Meta-Harness treats harnesses as evolvable artifacts, running full end-to-end searches with cheap smoke tests before expensive evals -- cutting manual iteration. Reference issues and reference code examples on GitHub make it straightforward to reproduce the paper's results or fork the framework for custom tasks, hooking devs who want automated gains over baselines like few-shot or no-memory setups.
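The smoke-test-before-full-eval pattern is simple to sketch. Function names and the pass threshold here are illustrative, not Meta-Harness's interface:

```python
# Two-stage evaluation: a cheap smoke test filters candidates so only
# promising ones reach the expensive full benchmark. Names and the
# threshold are illustrative placeholders.

def smoke_test(quality: float) -> bool:
    """Cheap check on a handful of cases; stand-in keyed off a quality knob."""
    return quality >= 0.5

def full_eval(quality: float) -> float:
    """Stand-in for an expensive full benchmark run."""
    return quality

def evaluate_candidates(candidates: dict[str, float]) -> tuple[dict, int]:
    scores, full_runs = {}, 0
    for name, quality in candidates.items():
        if not smoke_test(quality):       # fail fast; skip the big eval
            continue
        scores[name] = full_eval(quality)
        full_runs += 1
    return scores, full_runs

scores, full_runs = evaluate_candidates({"a": 0.2, "b": 0.7, "c": 0.9})
print(full_runs)  # → 2  (only "b" and "c" reach the expensive stage)
```

The savings scale with how aggressively the smoke test prunes: every candidate it rejects is a full benchmark run that never happens.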

Who should use this?

AI engineers tuning LLMs for agentic workflows, like terminal automation on Terminal-Bench or retrieval-augmented classification on datasets like Symptom2Disease. Ideal for researchers replicating the meta-harness paper or teams optimizing harnesses end-to-end without hand-coding every prompt variant.

Verdict

Grab it if you're experimenting with the method: 81 stars and cleaned-up paper code mean it's raw but runnable, with subdirectory READMEs guiding setup. The low credibility score flags low maturity, so test thoroughly before production use, but it's a solid reference implementation for harness optimization.


