grigio / opencode-benchmark-dashboard

Benchmark system for testing opencode with various LLM models, measuring speed (latency) and correctness (accuracy).

AI Summary

A local dashboard tool for testing various AI language models on custom prompts to measure their response speed and answer accuracy.

How It Works

1. 🔍 Discover the benchmark tool

You hear about a handy tool that lets you compare different AI assistants on how fast and accurate they are at solving real tasks.

2. 📝 Prepare your test questions

You create a folder with simple task descriptions and their correct answers, like coding challenges or quick questions.
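The project expects prompts and their expected answers dropped into directories; the folder and file names below are only an assumed sketch, not taken from the repo:

```
tests/
  fizzbuzz/               # one folder per task (layout is hypothetical)
    prompt.md             # the task description the model receives
    expected.md           # the reference answer the judge scores against
  extract-invoice-date/
    prompt.md
    expected.md
```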

3. 🤖 Test with your favorite AI

You choose one AI assistant and let it try answering all your test questions, noting how long it takes.
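Per the commands quoted in the review below, generating answers for one model is a single call; the model name here is only a placeholder:

```
# Run every test prompt through one model and record how long each answer takes.
# "qwen2.5-coder:7b" is an illustrative model id, not one prescribed by the repo.
bun run answer -m qwen2.5-coder:7b
```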

4. Judge the answers

Another smart AI reviews each answer to score how correct it is, giving you reliable pass or fail marks.
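Scoring is likewise a single command quoted in the review below; the exact meaning of the flag is assumed here:

```
# LLM-based scoring of the previously generated answers, producing pass/fail marks.
# Assumption: -m names the model whose outputs are being evaluated.
bun run evaluate -m qwen2.5-coder:7b
```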

5. 📊 Open the results dashboard

With one command, a local dashboard opens in your browser showing charts, heatmaps, and comparisons.
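Per the review below, the dashboard is started with one command and served locally:

```
# Start the local results dashboard (served at http://localhost:3000).
bun run dashboard
```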

6. 🔄 Repeat for other AIs

You run the same tests on different AI assistants to see which ones shine in speed and smarts.
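The tool runs one model at a time, so a minimal way to sweep several models is a plain shell loop over the same two commands (the model names below are illustrative):

```
# Hypothetical sweep: answer + evaluate for a few local models, then compare in the dashboard.
for m in llama3.1:8b qwen2.5-coder:7b mistral:7b; do
  bun run answer -m "$m"
  bun run evaluate -m "$m"
done
```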

🏆 Find your perfect AI match

You easily spot the best AI for your needs, balancing quick responses with spot-on accuracy, ready to use in your projects.

AI-Generated Review

What is opencode-benchmark-dashboard?

This TypeScript tool, built on Bun, lets you benchmark OpenAI-compatible LLM models via the opencode CLI against your custom prompts, tracking latency and accuracy on coding or extraction tasks. Drop prompts and expected answers into directories, run CLI commands like `bun run answer -m model` to generate outputs, `bun run evaluate -m model` for LLM-based scoring, then fire up `bun run dashboard` for a local web view at localhost:3000. It solves the real-world tradeoff hunt: finding models that balance speed and correctness on your hardware without cloud dependencies.

Why is it gaining traction?

Unlike generic CI benchmark actions or hosted comparison tools, it runs locally and is tailored for opencode, with a dashboard featuring scatter plots of accuracy versus latency and clickable heatmaps that drill into individual outputs. Developers like the quick iteration (no setup beyond an opencode config) and the LLM-based verification, which accepts semantically equivalent answers instead of requiring exact string matches. It's a lightweight option for comparing quantizations or coding assistants that rival GitHub Copilot.

Who should use this?

AI engineers tuning local LLMs for coding benchmarks on Linux or desktop setups, especially those weighing throughput (tok/s) against reasoning overhead. Also ops teams testing GPU-accelerated deployments, and indie devs who want a local equivalent of hosted benchmark services without vendor lock-in.

Verdict

Grab it for fast, local LLM benchmarking if you're already on opencode; the dashboard visuals make the tradeoffs obvious. With around 10 stars it's early-stage (solid tests, clear docs) but not yet mature enough for production; fork and extend it for custom needs.
