
LiangSu8899 / FlashRT

Public

FlashRT is a high-performance realtime inference engine for small-batch, latency-sensitive AI workloads. The flagship integration is production VLA control for Pi0, Pi0.5, GROOT N1.6, and Pi0-FAST.

25 stars · 69% credibility · Found May 03, 2026 at 23 stars
C++
AI Summary

FlashRT is a high-performance inference engine for running vision-language-action AI models in realtime on NVIDIA GPUs like Jetson Thor and RTX cards.

How It Works

1
🔍 Discover Fast AI for Robots

You hear about FlashRT, a tool that makes AI models run super fast on your NVIDIA graphics card for robot control and image generation.

2
📦 Get the Ready-to-Use Package

Download the pre-built container or follow simple steps to set it up on your computer with your NVIDIA card.

3
⚡ Load Your AI Model in 3 Lines

Paste your model file path and a short name like 'pi05' into a tiny script, and your AI brain is ready to think.

4
🖼️ Show Images and Give Instructions

Feed in photos from your robot's camera and type a command like 'pick up the red block'.

5
🤖 Watch Lightning-Fast Actions

In under 50 milliseconds, you get smooth robot actions that follow your instruction, ready for a real control loop.
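The steps above can be sketched as a tiny script. The `load_model`/`predict` names and the `"pi05"` config string come from the project's description; the stub class below is a stand-in so the sketch runs without the real engine or a GPU.

```python
# Hypothetical sketch of the FlashRT workflow described above (not real FlashRT code).

class FlashRTStub:
    """Mock engine: the real FlashRT would load a checkpoint into CUDA graphs."""

    def __init__(self, checkpoint, config):
        self.checkpoint = checkpoint
        self.config = config

    def predict(self, images, prompt):
        # Real engine: replay a captured CUDA graph, ~44 ms on Jetson Thor.
        # Stub: return one dummy 7-DoF action vector per input frame.
        return [[0.0] * 7 for _ in images]


def load_model(checkpoint, config="pi05"):
    # Stand-in for the 3-line load step from the walkthrough.
    return FlashRTStub(checkpoint, config)


model = load_model("checkpoints/pi05.safetensors", config="pi05")
actions = model.predict(images=[object(), object()],  # two camera frames
                        prompt="pick up the red block")
print(len(actions))  # one action vector per input image
```

Swap the stub for the real package on an NVIDIA machine and the calling code stays the same shape.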


AI-Generated Review

What is FlashRT?

FlashRT is a C++ inference engine built for high-performance realtime execution of latency-sensitive, small-batch AI workloads, with flagship production integration for VLA control models like Pi0, Pi0.5, GROOT N1.6, and Pi0-FAST. It skips ONNX export and engine compilation by loading PyTorch safetensors or JAX Orbax checkpoints into static CUDA graphs for instant replay after a ~3s warmup. Developers get a dead-simple Python API: `load_model(checkpoint, config="pi05")` then `predict(images, prompt)` yielding actions in 44ms on Jetson Thor.
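The capture-once, replay-many pattern behind the "instant replay after warmup" claim can be illustrated with a toy Python analogy — this is not FlashRT code; the one-time sleep stands in for CUDA-graph capture during warmup, after which calls skip it:

```python
import time

class GraphReplayToy:
    """Toy analogy: pay a one-time 'capture' cost, then replay cheaply."""

    def __init__(self):
        self._captured = False

    def run(self):
        if not self._captured:
            time.sleep(0.05)       # stand-in for the ~3 s CUDA-graph warmup
            self._captured = True  # graph captured; later calls replay it
        return "actions"

engine = GraphReplayToy()

t0 = time.perf_counter(); engine.run(); first_call = time.perf_counter() - t0
t1 = time.perf_counter(); engine.run(); replay = time.perf_counter() - t1
print(first_call > replay)  # the warmup call is slower than replays
```

The real engine pays the capture cost in GPU work rather than a sleep, but the calling pattern — one slow first inference, then steady-state fast ones — is the same.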

Why is it gaining traction?

For small-batch control it beats naive PyTorch (16-18x faster on Thor) and sidesteps TensorRT's engine rebuilds across driver or GPU swaps, auto-dispatching kernels across hardware from Jetson Thor to the RTX 5090. Production FP8/NVFP4 quantization with cached calibration preserves accuracy (cosine similarity >0.999), while vendored Flash-Attention avoids pip dependencies. LIBERO benchmarks validate 98% task success out of the box.
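The cosine-similarity figure quoted above is easy to sanity-check with a quick script; the int8-style per-tensor quantizer here is an illustrative simplification, not FlashRT's FP8/NVFP4 scheme:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def quantize(vec, levels=127):
    """Per-tensor symmetric quantization: scale, round to integers, rescale."""
    scale = max(abs(x) for x in vec) / levels
    return [round(x / scale) * scale for x in vec]

# Synthetic "weights" with a realistic spread of magnitudes.
weights = [math.sin(0.01 * i) * math.exp(-0.001 * i) for i in range(1, 2049)]
sim = cosine(weights, quantize(weights))
print(sim > 0.999)  # quantization barely rotates the weight vector
```

A cosine above 0.999 means the quantized tensor points almost exactly where the original did, which is why cached-calibration quantization can hold task success steady.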

Who should use this?

Robotics engineers deploying Pi0/Pi0.5 for manipulation on LIBERO tasks, needing sub-50ms latency in edge control loops. VLA researchers benchmarking GROOT N1.6 or Pi0-FAST autoregressive generation on Thor/RTX without compilation overhead. NVIDIA edge deployers swapping Jetson/RTX hardware mid-project.

Verdict

Grab it if realtime VLA control fits your use case -- docs, API snippets, and per-model benchmarks make evaluation a six-minute `git clone` away, though 19 stars and 0.7% credibility flag solo-dev maturity. Run `examples/quickstart.py` on your rig; upstream hardware reports are welcome.


