Geekgineer

258 KB WASM runtime for Needle, a 26M-parameter tool-calling transformer. Runs in the browser, Cloudflare Workers, and Node.js. No backend required.

100% credibility
Found May 19, 2026 at 16 stars.
AI Analysis
Rust
AI Summary

needle-rs is a pure Rust and WebAssembly runtime for the Needle AI model, a tiny 26-million-parameter transformer that maps a user question plus a list of available tools to a precise JSON function call. Everything runs locally on the user's device with no internet connection needed, which suits privacy-sensitive applications. The same model can be deployed to web browsers, command-line tools, Python applications, Cloudflare Workers, and even embedded devices. Its output matches the official Python implementation bit for bit, and it is released under the MIT open-source license.

How It Works

1. 💡 You discover a smarter way to route requests

You learn about needle-rs — a tiny AI model that can figure out which tool or function to call based on a user's question, without sending anything to the cloud.

2. 📦 You install it in one line

Depending on where you want to run it, you install with a single command, whether that means adding it to your web project or Python app, or just downloading the small program.

3. 🤖 Your AI assistant loads instantly

The model comes to life in under a second — a 26-million parameter brain that fits entirely on your device, no internet required.

4. You choose where it runs
🌐 Browser or web app

It runs directly in JavaScript, keeping all user queries completely private on their own device.

💻 Command line

You run it as a simple command, perfect for scripts and automation.

🐍 Python project

You import it like any other Python library and call it from your code.

5. 🛠️ You tell it what tools are available

You provide a simple list of your functions — like 'get_weather', 'send_email', or 'book_flight' — along with what each one needs.

6. You ask a question

A user types something like 'What's the weather in Paris?' and needle-rs reads both the question and your tool list to figure out the right call.

You get back a perfect function call

The model returns exactly the JSON your code needs — the function name and all arguments filled in correctly, ready to execute.
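The last few steps (declare tools, ask a question, execute the returned call) can be sketched in Python. This is a minimal illustration under stated assumptions, not needle-rs code: the tool field names (`name`, `description`, `parameters`), the output shape (`name` plus `arguments`), and the `REGISTRY` and `dispatch` helpers are all hypothetical. Only the tool names and the example question come from the walkthrough above.

```python
import json

# Hypothetical tool list; field names are assumptions, not needle-rs's schema.
TOOLS = [
    {"name": "get_weather", "description": "Current weather for a city.",
     "parameters": {"city": "string"}},
    {"name": "send_email", "description": "Send an email message.",
     "parameters": {"to": "string", "subject": "string", "body": "string"}},
    {"name": "book_flight", "description": "Book a flight between two cities.",
     "parameters": {"origin": "string", "destination": "string", "date": "string"}},
]
tools_json = json.dumps(TOOLS)  # the tool list as JSON text


def get_weather(city: str) -> str:
    # Placeholder; a real application would query a weather service here.
    return f"Weather requested for {city}"


# Map tool names to the callables that implement them (only one is wired up).
REGISTRY = {"get_weather": get_weather}


def dispatch(call_json: str) -> str:
    """Parse a function-call JSON string and invoke the matching callable."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in REGISTRY:
        raise KeyError(f"no handler registered for tool: {name}")
    return REGISTRY[name](**call.get("arguments", {}))


# Assumed output shape for "What's the weather in Paris?"
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(model_output)
print(result)  # Weather requested for Paris
```

Keeping the registry separate from the tool list means the model only ever sees descriptions, while the application decides which names are actually executable.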

AI-Generated Review

What is needle-rs?

needle-rs is a Rust-powered runtime that deploys a 26M-parameter tool-calling model directly in your browser, edge workers, or Node.js environment. Think of it as a tiny AI router: give it a user query and a list of available tools, and it outputs the exact JSON to call the right function. No server calls, no API keys, no data leaving the device. The WASM build comes in at 258 KB, and the quantized model weights are 22 MB. You can call it from JavaScript, Python, Rust, or C with a simple API: `engine.run(query, tools_json)` and you get back structured JSON ready to execute.
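The call shape quoted above, `engine.run(query, tools_json)`, can be wrapped so application code deals with parsed data instead of JSON text. The sketch below is hedged: it assumes `run` returns a JSON string, and `ToolCallEngine`, `route`, and `FakeEngine` are hypothetical names introduced here. `FakeEngine` is a stand-in that always picks the first tool; it does not load or imitate the real model.

```python
import json
from typing import Protocol


class ToolCallEngine(Protocol):
    """Anything exposing run(query, tools_json) -> JSON string (assumed)."""

    def run(self, query: str, tools_json: str) -> str: ...


def route(engine: ToolCallEngine, query: str, tools: list) -> dict:
    """Serialize the tool list, run the engine, and parse its JSON reply."""
    raw = engine.run(query, json.dumps(tools))
    return json.loads(raw)


class FakeEngine:
    """Stand-in for the real runtime: always picks the first tool, no arguments."""

    def run(self, query: str, tools_json: str) -> str:
        first = json.loads(tools_json)[0]
        return json.dumps({"name": first["name"], "arguments": {}})


call = route(FakeEngine(), "What's the weather in Paris?", [{"name": "get_weather"}])
print(call["name"])  # get_weather
```

Typing the engine as a protocol lets tests use a stand-in like this one while production code passes whatever object the real binding provides.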

Why is it gaining traction?

The hook is clear: OpenAI function calling costs money per token and sends your data to their servers. llama.cpp requires 700 MB+ and still needs a beefy machine. needle-rs delivers the same tool-calling capability at roughly 280 ms latency, entirely client-side, for free. The constrained decoder is the secret sauce here—it uses a character-level trie and JSON state machine to guarantee the output is always valid JSON pointing at a real tool, eliminating hallucinated function names entirely. Token-exact parity with the reference Python implementation (560/560 test cases) means you can trust the outputs match what the original model produces.
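The character-level trie idea can be illustrated with a toy decoder. This is a sketch of the general technique, not the project's implementation: `build_trie`, `constrained_decode`, and `toy_score` are hypothetical, the `"$"` end marker is an assumption, and the JSON state machine the review mentions is left out. The scorer stands in for the model and deliberately prefers a misspelled name, to show how the trie forces the output back onto a real tool.

```python
def build_trie(names):
    """Build a character-level trie; the key "$" marks the end of a name."""
    root = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def constrained_decode(trie, score):
    """Greedily pick the best-scoring character among those the trie allows.

    `score(prefix, ch)` stands in for the model's preference for `ch`
    after `prefix`; choosing "$" ends the name.
    """
    node, out = trie, ""
    while True:
        allowed = sorted(node)  # only characters that keep the name valid
        best = max(allowed, key=lambda ch: score(out, ch))
        if best == "$":
            return out
        out += best
        node = node[best]


TOOL_NAMES = ["get_weather", "send_email", "book_flight"]
trie = build_trie(TOOL_NAMES)

# Toy scorer that "wants" to emit the non-existent name "get_wether".
target = "get_wether"


def toy_score(prefix, ch):
    i = len(prefix)
    want = target[i] if i < len(target) else "$"
    return 1.0 if ch == want else 0.0


name = constrained_decode(trie, toy_score)
print(name)  # get_weather
```

Because every step chooses only among the trie node's children, the decoder cannot leave the set of declared names, whatever the scorer prefers.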

Who should use this?

Backend developers building privacy-sensitive applications where sending user queries to external APIs is a non-starter. Edge function developers targeting Cloudflare Workers who need lightweight AI routing without cold-start nightmares. Frontend teams wanting to prototype agent interfaces without wiring up API infrastructure. Embedded systems developers who need tool-calling on microcontrollers with `no_std` support. If you need open-ended chat or long-context reasoning, look elsewhere—this is a specialized router, not a general-purpose language model.

Verdict

This is a genuinely useful piece of engineering with a tight scope and solid correctness guarantees. The 1.0% credibility score reflects a young project with only 13 stars, but the token-exact parity testing, multi-platform support, and clean API suggest it's production-minded from the start. The documentation is thorough and the test coverage is impressive for a project this size. Worth evaluating now if your use case fits—watch the repo for community adoption before committing to a production deployment.
