gdevenyi

A standalone memory usage calculator for Hugging Face GGUF models

AI Summary

A browser-based tool that estimates graphics card memory, system RAM, and performance needs for AI language models hosted online, without downloading the files.

How It Works

1. 🧠 Discover a cool AI model: you hear about an exciting AI language model online and want to run it on your own computer.

2. Wonder about your hardware: you check your computer's graphics card and memory, unsure whether it's powerful enough for the model.

3. 🔍 Find the memory estimator: you search and discover this free tool with a live demo that checks a model's needs instantly.

4. Enter the model name: type the model's Hugging Face repo path, and the tool pulls metadata without downloading the giant file.

5. ⚙️ Customize your setup: select your graphics card from presets, set the context length, and tweak memory options.

6. 📊 View the full breakdown: see estimated graphics memory and system memory use, speed predictions, and whether the model fits.

Ready to run confidently: you know precisely what to expect and can download and launch the model with peace of mind.
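The step where the tool "pulls metadata without downloading" can be sketched as a single HTTP Range request against Hugging Face's standard `resolve` URL, reading only the 24-byte GGUF header (magic, version, tensor count, metadata key/value count). The repo path and filename below are hypothetical examples, not taken from this project's code:

```javascript
// Parse the fixed-size GGUF header from a raw buffer.
function parseGgufHeader(arrayBuffer) {
  const dv = new DataView(arrayBuffer);
  const magic = dv.getUint32(0, true);            // bytes "GGUF", little-endian
  if (magic !== 0x46554747) throw new Error("not a GGUF file");
  return {
    version: dv.getUint32(4, true),
    tensorCount: Number(dv.getBigUint64(8, true)),
    kvCount: Number(dv.getBigUint64(16, true)),   // metadata key/value count
  };
}

// Fetch only the first 24 bytes of the model file via a Range request.
async function fetchGgufHeader(repo, file) {
  const url = `https://huggingface.co/${repo}/resolve/main/${file}`;
  const res = await fetch(url, { headers: { Range: "bytes=0-23" } });
  return parseGgufHeader(await res.arrayBuffer());
}
```

In practice the tool would follow up with further Range requests for the metadata key/value section, which holds the layer counts and dimensions the estimates need.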

AI-Generated Review

What is huggingface-estimate?

This JavaScript calculator estimates VRAM and RAM usage for HuggingFace GGUF models without downloading them—enter a repo path, and it fetches metadata via HTTP Range requests to break down weights, KV cache, activations, and multimodal projectors. Tweak context length, batch size, or KV quant like Q8_0, and get exact GiB figures matching llama.cpp semantics, including MoE expert spilling. Run it in-browser via a live demo or CLI with `node run-calc.js repo --ctx 8192 --vram 24`.
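The KV-cache term mentioned above follows a simple formula: two tensors (K and V) per layer, each holding ctx × n_kv_heads × head_dim elements. A minimal sketch with illustrative 7B-class numbers; the tool reads the real values from GGUF metadata:

```javascript
// Sketch: KV-cache size as llama.cpp-style inference allocates it.
// Model numbers below are illustrative, not from a specific model.
function kvCacheBytes({ nLayers, nKvHeads, headDim, ctx, bytesPerElem }) {
  // K and V each hold ctx * nKvHeads * headDim elements per layer
  return 2 * nLayers * ctx * nKvHeads * headDim * bytesPerElem;
}

const bytes = kvCacheBytes({
  nLayers: 32, nKvHeads: 8, headDim: 128, ctx: 8192,
  bytesPerElem: 2, // f16; a Q8_0 KV cache uses roughly half
});
// 1073741824 bytes = exactly 1 GiB for this configuration
```

This is why the context-length and KV-quant knobs matter: doubling ctx doubles this term, and switching the cache from f16 to Q8_0 roughly halves it.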

Why is it gaining traction?

Unlike Hugging Face Accelerate's memory-estimation tools, which require Python or a full download, this standalone app handles 69 architectures and 106 quant types across llama.cpp forks, and gives performance bounds (tok/s, time to first token) using 100+ GPU/CPU presets, with no install hassles. MoE support mirrors llama.cpp's `--cpu-moe` hybrid layer placement precisely, spotting bottlenecks such as compute-bound vs bandwidth-bound before you quantize.
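The compute-vs-bandwidth check can be approximated with a back-of-the-envelope bound: decode speed cannot exceed memory bandwidth divided by the bytes streamed per token. A sketch with illustrative numbers (the bandwidth figure is the RTX 4090's spec-sheet value; the model size is an assumption, not a measurement):

```javascript
// Sketch: bandwidth-bound upper limit on decode speed. Each generated
// token must stream (roughly) the active weights through memory, so
// tok/s <= bandwidth / bytes-per-token.
function maxDecodeTokPerSec(modelBytes, bandwidthBytesPerSec) {
  return bandwidthBytesPerSec / modelBytes;
}

const modelBytes = 18 * 2 ** 30; // ~18 GiB of weights (e.g. a ~30B model at 4-bit)
const bandwidth = 1008e9;        // RTX 4090: ~1008 GB/s
const tps = maxDecodeTokPerSec(modelBytes, bandwidth);
// ≈ 52 tok/s upper bound; real throughput lands below this
```

A full estimator would also model the compute-bound case (prefill FLOPs against the GPU's throughput) and report whichever bound is tighter.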

Who should use this?

LLM deployers sizing GGUF models for edge hardware, e.g. checking whether Qwen3-30B-A3B fits a 24 GB card at 8192 context. Quant experimenters comparing embedded vs standalone memory footprints across quant formats such as IQ4_NL without GB-scale trial downloads. Local inference tinkerers evaluating standalone memory requirements before committing to a download.

Verdict

Grab it for dead-accurate GGUF sizing; the browser demo alone saves hours. At 14 stars it's niche and unproven at scale, but thorough docs and a CLI make it reliable today.


