harishsg993010

Open-source NPU for LLM inference (this run uses GPT-2)

53 stars · 100% credibility
Found Feb 10, 2026 at 19 stars (3x growth since discovery).
Language: SystemVerilog
AI Summary

An educational hardware design and simulator that demonstrates a tiny Neural Processing Unit running simplified GPT-2 text generation.

How It Works

1
🔍 Discover tiny-NPU

You stumble upon this fun project on GitHub that lets anyone explore, from scratch, how AI inference hardware works.

2
🛠️ Get your computer ready

Follow easy steps to install a few free tools—such as the Verilator simulator—so everything runs smoothly on your machine.

3
🚀 Run the AI demo

Hit play on the demo and watch your tiny hardware brain generate text just like a mini GPT-2, using real model weights.

4
✅ Check your results

Compare the text it generates against reference outputs to confirm a bit-exact match—no surprises.
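The check in this step boils down to comparing simulated output tokens against a golden software reference. A small Python sketch of that comparison (the token IDs below are made up for illustration; the repo's actual scripts may differ):

```python
def check_bit_exact(generated, reference):
    """Compare simulated NPU output tokens against a golden reference.

    Returns (True, -1) on an exact match, else (False, first_mismatch_index).
    """
    if len(generated) != len(reference):
        return False, min(len(generated), len(reference))
    for i, (g, r) in enumerate(zip(generated, reference)):
        if g != r:
            return False, i
    return True, -1

# Hypothetical token IDs from a software GPT-2 run vs. the Verilator simulation:
golden = [15496, 11, 995, 0]
sim    = [15496, 11, 995, 0]
ok, idx = check_bit_exact(sim, golden)
print("bit-exact!" if ok else f"first mismatch at token {idx}")
```

A whole-sequence equality check like this is stricter than comparing decoded text, since tokenizers can hide small numeric divergences.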

5
⚡ Try speed tricks

Experiment with KV-cache memory shortcuts to make text generation much faster, just like real AI chips do.
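The "memory shortcut" at work here is the KV-cache: during autoregressive decoding, the keys and values of already-processed tokens never change, so caching them means each step only projects the newest token instead of recomputing everything. A minimal NumPy analogue of the idea (software only, not the repo's RTL; dimensions are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                  # toy head dimension
Wk, Wv = rng.standard_normal((2, d, d))

k_cache, v_cache = [], []              # grows by one row per decoded token

def decode_step(x_new):
    """Project only the newest token; reuse cached K/V for all earlier ones."""
    k_cache.append(x_new @ Wk)
    v_cache.append(x_new @ Wv)
    K = np.stack(k_cache)              # (seq_len, d) — no recomputation
    V = np.stack(v_cache)
    scores = K @ x_new / np.sqrt(d)    # new token attends over the whole past
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                 # context vector for the new token

for t in range(4):                     # each step is O(seq_len), not O(seq_len^2)
    out = decode_step(rng.standard_normal(d))
print(out.shape)                       # (8,)
```

The hardware version caches K/V rows in on-chip memory, which is where the review's reported 1.8x speedup at seq_len=16 comes from.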

6
🎉 Master AI hardware

Celebrate as you now understand how tiny chips power big AI dreams, ready to tinker more or share your creation.


Star Growth

This repo grew from 19 to 53 stars.
AI-Generated Review

What is tiny-NPU?

Tiny-NPU is an open source NPU in SystemVerilog that runs GPT-2 and LLaMA inference on real Hugging Face weights, delivering bit-exact results via Verilator simulation. It handles full transformer blocks with KV-cache for autoregressive decoding, letting you prototype LLM inference on custom hardware without proprietary black boxes. Developers get a synthesizable design for FPGAs, complete with demos that generate text from prompts like "Hello".
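Running real Hugging Face weights on fixed-point hardware implies an export step that quantizes float weights to an integer Q-format before simulation. The sketch below illustrates that step with a hypothetical Q8.8 format (int16, 8 fractional bits)—the repo's actual bit widths and scales are assumptions here, not confirmed:

```python
import numpy as np

def to_q8_8(w):
    """Quantize float weights to hypothetical Q8.8 fixed point (int16)."""
    scaled = np.round(w * 256.0)                  # 2**8 fractional resolution
    return np.clip(scaled, -32768, 32767).astype(np.int16)

def from_q8_8(q):
    """Dequantize back to float for error checking."""
    return q.astype(np.float64) / 256.0

w = np.array([0.1234, -1.5, 3.75])                # stand-in for real weights
q = to_q8_8(w)
err = np.max(np.abs(from_q8_8(q) - w))
print(q, err)  # round-to-nearest keeps in-range error within half an LSB (1/512)
```

Bit-exactness between simulation and reference then means the software reference must use the same quantized weights and integer arithmetic, not the original floats.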

Why is it gaining traction?

Unlike the closed-source internals of Google's TPU or Apple's Neural Engine, this open-source Verilog NPU demystifies transformer acceleration—systolic GEMMs, fixed-point ops, and microcode sequencing—in a minimal, verifiable package. The hook: end-to-end demos with a 1.8x KV-cache speedup at seq_len=16, plus LLaMA/Mistral support, make it a rare open-source GitHub project for hands-on NPU learning. No frameworks needed beyond Verilator and Python for weight export.
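The systolic GEMM mentioned above can be illustrated in software: each processing element conceptually owns one output accumulator and performs one multiply-accumulate per cycle as operands stream through the array. A toy output-stationary model (the dataflow idea only, not the repo's RTL):

```python
import numpy as np

def systolic_matmul(A, B):
    """Output-stationary systolic GEMM model: PE (i, j) owns accumulator
    C[i, j] and does one MAC per 'cycle' k as a wavefront of operands
    streams through the array."""
    m, k_dim = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for k in range(k_dim):                         # one operand wavefront per cycle
        for i in range(m):
            for j in range(n):
                C[i, j] += A[i, k] * B[k, j]       # the MAC inside PE (i, j)
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)   # matches a reference GEMM
```

In hardware the three loops collapse into an m-by-n grid of MAC units running in parallel, so a full GEMM takes on the order of k cycles rather than m·n·k operations in sequence.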

Who should use this?

RTL designers prototyping a tiny NPU for edge LLM inference, FPGA hobbyists targeting Artix-7 boards, or academics reverse-engineering production accelerators like TPUs. Ideal for hardware folks exploring open-source NPU design or self-hosted inference without cloud GPUs.

Verdict

Grab it if you're into open-source AI accelerator hardware—docs and sim coverage shine for such a young repo, but its early maturity means you should expect tweaks for larger models. A strong educational starting point for Verilog ML tinkerers.


