ni00 / NiLLM

A high-performance desktop arena for developers and AI researchers to benchmark LLMs side-by-side, powered by Tauri 2 and Rust.

AI Summary

NiLLM is a desktop app for side-by-side benchmarking of AI language models with real-time metrics and automated judging.

How It Works

1
🖥️ Download the app

Install the free desktop app for comparing AI language models.

2
🚀 Open and explore

Launch the app to see a simple dashboard with options for tests and comparisons.

3
🤖 Pick your models

Select two or more language models, from OpenAI, Anthropic, OpenRouter, or custom endpoints, to compare side by side.

4
Ask the same question

Type a prompt or pick a built-in test, and watch every model respond live at the same time (see the sketch after this list).

5
📊 Watch speeds and judge

Watch live charts of which model responds fastest and writes best, with automatic 1-5 scores from an AI judge.

6
🏆 Find your winner

Pick the best-performing model for your needs and export a report of the comparison.
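To make steps 4 and 5 concrete, here is a minimal sketch of the fan-out pattern such an arena can use, assuming each provider exposes an OpenAI-compatible streaming endpoint. Names like ModelConfig, streamOne, and runArena are illustrative, not NiLLM's actual API.

```typescript
// Hypothetical sketch: send the same prompt to several
// OpenAI-compatible endpoints and stream all responses concurrently.
interface ModelConfig {
  name: string;     // display label for the UI pane
  baseUrl: string;  // e.g. "https://api.openai.com/v1"
  apiKey: string;
  model: string;    // provider-side model id
}

// Stream one model's answer, collecting the raw SSE payload.
async function streamOne(cfg: ModelConfig, prompt: string): Promise<string> {
  const res = await fetch(`${cfg.baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cfg.apiKey}`,
    },
    body: JSON.stringify({
      model: cfg.model,
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`${cfg.name}: HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let raw = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A real UI would parse each "data: {...}" delta here and
    // append tokens to the model's pane as they arrive.
    raw += decoder.decode(value, { stream: true });
  }
  return raw;
}

// Step 4: fan the same prompt out to every selected model at once.
async function runArena(models: ModelConfig[], prompt: string) {
  const results = await Promise.allSettled(
    models.map((m) => streamOne(m, prompt)),
  );
  results.forEach((r, i) =>
    console.log(models[i].name, r.status === "fulfilled" ? "done" : r.reason),
  );
}
```

Promise.allSettled keeps one failing provider from cancelling the other streams, which fits the retry-on-failure workflow the review describes below.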

AI-Generated Review

What is NiLLM?

NiLLM is a high-performance desktop arena designed for developers and AI researchers to benchmark LLMs side-by-side. You load up models from OpenAI, Anthropic, OpenRouter, or custom endpoints, fire off a prompt, and watch concurrent streaming responses with live metrics like time to first token (TTFT), tokens per second (TPS), and total duration, with no more tab-juggling. Built in TypeScript with Tauri for native desktop speed, it includes built-in tests in English, Chinese, and Japanese, plus global controls for system prompts and generation parameters.
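For readers new to those metrics, here is a minimal sketch of how TTFT and TPS fall out of a token stream. The streamTokens callback shape is an assumption for illustration, not NiLLM's internals; only the arithmetic is the point.

```typescript
// Sketch of the live metrics named above. The streamTokens callback
// API is hypothetical; the timing math is the standard definition.
interface StreamMetrics {
  ttftMs: number;     // time to first token
  tps: number;        // tokens per second after the first token
  durationMs: number; // full request duration
}

async function measureStream(
  streamTokens: (onToken: (token: string) => void) => Promise<void>,
): Promise<StreamMetrics> {
  const start = performance.now();
  let firstTokenAt: number | null = null;
  let tokenCount = 0;

  await streamTokens(() => {
    if (firstTokenAt === null) firstTokenAt = performance.now();
    tokenCount += 1;
  });

  const end = performance.now();
  const ttftMs = (firstTokenAt ?? end) - start;        // TTFT
  const genMs = end - (firstTokenAt ?? end);           // generation window
  return {
    ttftMs,
    tps: genMs > 0 ? tokenCount / (genMs / 1000) : 0,  // TPS
    durationMs: end - start,
  };
}
```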

Why is it gaining traction?

It stands out with real-time side-by-side streaming and AI judging via frontier models for automated 1-5 scores, skipping manual evals. Queue multiple prompts, retry failures, rate responses, and export results: features that web arenas like LMSYS Chatbot Arena don't offer in a desktop package. The unified config (temperature, top-p, penalties) and worker-based streaming keep it snappy even with 4+ models.
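As a rough illustration of that judging step, the sketch below shares one generation config across contestant models and asks a judge model for a 1-5 grade over an OpenAI-compatible endpoint. The rubric, the gpt-4o judge, and the single-digit parsing are assumptions, not NiLLM's actual scheme.

```typescript
// The unified config the review mentions: one set of sampling
// parameters applied to every contestant model.
interface GenerationConfig {
  temperature: number;
  top_p: number;
  frequency_penalty: number;
  presence_penalty: number;
}

// Hypothetical judging call: grade one answer from 1 (poor) to 5
// (excellent) using a frontier model as the judge.
async function judgeResponse(
  judgeUrl: string, // OpenAI-compatible endpoint for the judge model
  apiKey: string,
  prompt: string,
  answer: string,
): Promise<number> {
  const res = await fetch(`${judgeUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o", // illustrative judge; any frontier model works
      messages: [{
        role: "user",
        content:
          `Rate this answer from 1 (poor) to 5 (excellent). ` +
          `Reply with one digit.\nQuestion: ${prompt}\nAnswer: ${answer}`,
      }],
      temperature: 0, // deterministic judging
    }),
  });
  const data = await res.json();
  // Pull the first digit 1-5 out of the judge's reply.
  const match = /[1-5]/.exec(data.choices[0].message.content ?? "");
  return match ? Number(match[0]) : NaN;
}
```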

Who should use this?

AI researchers comparing LLM capabilities across providers for papers or evals. Developers selecting models for apps, like picking the fastest for chatbots or most accurate for RAG pipelines. Teams doing regular benchmarks without cloud dependencies.

Verdict

Grab it if you benchmark LLMs weekly: a quick pnpm install and tauri dev gets you running. At 15 stars and 1.0% credibility, it's early (basic docs, no tests visible), so expect rough edges, but the core arena delivers immediate value over browser tools.

