witanlabs

How we built Witan - four months of engineering an LLM spreadsheet agent

100% credibility
Found Mar 03, 2026 at 19 stars.
AI Analysis
AI Summary

A detailed research log chronicling four months of experimentation in developing an AI agent for understanding, analyzing, and editing spreadsheets, including key lessons, architectural pivots, and evaluation insights.

How It Works

1. 🔍 Discover the Spreadsheet AI Story

You stumble upon this GitHub page sharing a team's four-month adventure building a smart helper for reading and working with spreadsheets.

2. 📖 Read the Key Takeaways

You start with the quick summary of big lessons like giving the AI room to play and think ahead before acting.

3. 📚 Follow the Journey

You read through the story of tries, fails, and wins, from mapping spreadsheets to smart editing.

4. 💡 Unlock the Breakthrough

You get excited learning how letting the AI tinker in an interactive playground slashed errors and sped everything up.

5. 🧠 Absorb Pro Tips

You note down advice on testing rigorously and packing in real-world know-how for better results.

6. 🔗 Explore More Resources

You click to the full guides and grab the handy tools they built to try yourself.

🎉 Become a Spreadsheet AI Expert

You walk away with powerful insights to create or improve AI buddies that handle spreadsheets like seasoned pros.

AI-Generated Review

What is research-log?

This repo is a raw engineering diary from Witan Labs chronicling four months of trial and error building an LLM-powered spreadsheet agent. It covers turning spreadsheets into queryable data and the later move to a REPL-based system in TypeScript and .NET for tasks like financial Q&A and edits, all aimed at the core problem that LLMs struggle with workbook structure, formulas, and navigation. Developers get battle-tested lessons, benchmarks that jump from 50% to 92% accuracy, and links to the live CLI tool for spreadsheet operations such as exec, render, and calc.
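To make the REPL idea concrete, here is a toy sketch in Python. It is not Witan's code (which the review says is TypeScript and .NET), and `ToyWorkbook`, `read_cell_tool`, and `run_snippet` are hypothetical names invented for illustration. It only shows why letting the model submit a snippet can take fewer round trips than one tool call per operation.

```python
# Toy illustration only: ToyWorkbook, read_cell_tool, and run_snippet are
# hypothetical names, not part of the Witan CLI.


class ToyWorkbook:
    """An in-memory stand-in for a workbook: maps cell addresses to values."""

    def __init__(self, cells):
        self.cells = dict(cells)
        self.round_trips = 0  # how many times the model had to call a tool

    def read_cell_tool(self, address):
        """One-tool-per-operation style: each read costs one round trip."""
        self.round_trips += 1
        return self.cells[address]

    def run_snippet(self, source):
        """REPL style: one round trip runs a whole snippet against the cells."""
        self.round_trips += 1
        scope = {"cells": self.cells}
        exec(source, scope)  # a real system would sandbox this; the toy does not
        return scope.get("result")


cells = {"B2": 120.0, "B3": 80.0, "B4": 50.0}

# Style 1: three separate tool calls, then the values are summed afterwards.
wb_tools = ToyWorkbook(cells)
total_tools = sum(wb_tools.read_cell_tool(a) for a in ("B2", "B3", "B4"))

# Style 2: one snippet reads and sums in a single round trip.
wb_repl = ToyWorkbook(cells)
total_repl = wb_repl.run_snippet(
    "result = sum(cells[a] for a in ('B2', 'B3', 'B4'))"
)

print(total_tools, wb_tools.round_trips)  # 250.0 3
print(total_repl, wb_repl.round_trips)    # 250.0 1
```

Both styles reach the same total; the difference is only in how many times the model has to stop and wait for a tool result.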

Why is it gaining traction?

It stands out for its unfiltered pivots, from SQL dumps and multi-agent setups to a game-changing REPL that cuts calls from 15 to 3, and for a rigorous eval framework in which programmatic checks beat LLM judges. The hook is the concrete wins: "define end state first" prompts that catch errors cheaply, domain skills that outlast tools, and the insight, familiar from coding assistants like GitHub Copilot, that structured reasoning beats raw power. No hype, just data-driven arcs any agent builder will recognize.
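A programmatic check of the kind described here can be very small. The sketch below is an assumption about what such a check might look like, not the repo's eval framework: `check_end_state` is a hypothetical helper, and the workbook is modelled as a plain dict of cell address to value. The expected end state is written down first, then compared against what the agent actually produced.

```python
import math


def check_end_state(expected, actual, rel_tol=1e-6):
    """Compare expected cell values with actual ones.

    Returns a list of human-readable mismatch messages; an empty list
    means the end state matches. Numbers are compared with a relative
    tolerance, everything else with plain equality.
    """
    failures = []
    for address, want in expected.items():
        if address not in actual:
            failures.append(f"{address}: missing")
            continue
        got = actual[address]
        if isinstance(want, (int, float)) and isinstance(got, (int, float)):
            ok = math.isclose(want, got, rel_tol=rel_tol)
        else:
            ok = want == got
        if not ok:
            failures.append(f"{address}: expected {want!r}, got {got!r}")
    return failures


# The end state is defined before the agent runs (hypothetical cells).
expected = {"A1": "IRR", "D10": 0.125}

# Two made-up agent outputs: one within tolerance, one wrong.
actual_good = {"A1": "IRR", "D10": 0.12500000001}
actual_bad = {"A1": "IRR", "D10": 0.15}

print(check_end_state(expected, actual_good))  # []
print(check_end_state(expected, actual_bad))   # ['D10: expected 0.125, got 0.15']
```

Unlike an LLM judge, a check like this is deterministic and points at the exact cell that diverged.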

Who should use this?

AI engineers crafting LLM agents for tabular data, such as agents that automate IRR calculations or what-if scenarios for financial model analysts. Spreadsheet tool developers iterating on Excel or Sheets integrations, especially those wrestling with navigation failures or formula tracing. Teams that benchmark agents the way GitHub Actions is used for CI, and that need evals able to reveal when an apparent "reasoning" failure is really an infrastructure bug.

Verdict

Worth a quick read for agent developers chasing REPL patterns or prompt rigor. The 19 stars and 1.0% credibility score signal early days with thin code (just a README), but the docs quality shines through its timelines and deep dives. Skim it for the evals and skills material if you are building spreadsheet agents; fork the CLI to test real gains.
