NVlabs / cutile-rs

Public

cuTile Rust provides a safe, tile-based kernel programming DSL for the Rust programming language. It features a safe host-side API for passing tensors to asynchronously executed kernel functions.

34
5
100% credibility
Found Mar 15, 2026 at 34 stars.
AI Analysis
Rust
AI Summary

cuTile Rust is a research project providing a safe, tile-based kernel programming DSL for the Rust programming language, featuring a safe host-side API for passing tensors to asynchronously executed kernel functions.

AI-Generated Review

What is cutile-rs?

cuTile Rust provides a safe, tile-based kernel programming DSL for the Rust programming language. It features a safe host-side API for passing tensors to asynchronously executed kernel functions on NVIDIA GPUs. Developers get an ergonomic way to author GPU kernels directly in Rust, compiling to CUDA Tile via MLIR without raw CUDA hazards.
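To make the tile-based idea concrete, here is a minimal CPU-only sketch in plain Rust using only the standard library. It is not the cuTile Rust API: `tiled_add` and its parameters are hypothetical names chosen for illustration, and each slice chunk merely stands in for the tile a GPU kernel instance would load, compute on, and store.

```rust
/// Hypothetical helper (not part of cuTile Rust): adds `a` and `b` tile by tile.
/// Each chunk plays the role of one tile that a kernel instance would own.
fn tiled_add(a: &[f32], b: &[f32], tile: usize) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    assert!(tile > 0, "tile size must be non-zero");

    let mut out = vec![0.0f32; a.len()];
    let tiles = out.chunks_mut(tile).zip(a.chunks(tile)).zip(b.chunks(tile));
    for ((out_tile, a_tile), b_tile) in tiles {
        // "Load" the two input tiles, compute, and "store" the result tile.
        // The last tile may be shorter than `tile`; its length is used as-is.
        for i in 0..out_tile.len() {
            out_tile[i] = a_tile[i] + b_tile[i];
        }
    }
    out
}
```

Calling `tiled_add(&a, &b, 2)` on two five-element slices processes tiles of length 2, 2, and 1. Because the borrow checker guarantees that the output tiles never overlap, the same shape of code can be handed to parallel workers without data races, which is the kind of safety property the description refers to.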

Why is it gaining traction?

Unlike the cuTile Python bindings, cuTile Rust offers native Rust safety, with asynchronous tensor handling and automatic partitioning for parallel tile execution. The DSL simplifies tile loads and stores, while the host API integrates with Rust async runtimes for efficient kernel launches. Early adopters praise the zero-cost abstractions compared with writing CUDA C++ or wrestling with Python FFI.
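The combination of asynchronous launch and automatic partitioning can be sketched on the CPU with standard-library threads. This is an analogy under stated assumptions, not cuTile Rust code: `launch_scale` is a hypothetical function, a `JoinHandle` stands in for whatever handle a real launch returns, and one OS thread per tile is used only to keep the example short.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for an asynchronous kernel launch (not the cuTile Rust API).
/// Takes ownership of the buffer, returns immediately, and scales every element
/// by `factor` in the background, one tile per worker thread.
fn launch_scale(mut data: Vec<f32>, factor: f32, tile: usize) -> JoinHandle<Vec<f32>> {
    assert!(tile > 0, "tile size must be non-zero");
    thread::spawn(move || {
        // Partition the buffer into disjoint mutable tiles; the borrow checker
        // guarantees that no two workers can touch the same element.
        thread::scope(|s| {
            for chunk in data.chunks_mut(tile) {
                s.spawn(move || {
                    for v in chunk {
                        *v *= factor;
                    }
                });
            }
        });
        // All tile workers have finished once the scope exits.
        data
    })
}
```

The caller keeps working after `launch_scale` returns and calls `join()` on the handle only when it needs the result. Moving the buffer into the launch means the host cannot read or mutate it while the work is in flight, which mirrors the ownership-based safety the review attributes to the host-side API.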

Who should use this?

Rust ML engineers building custom GPU kernels for transformers or GEMM ops on Ampere+ NVIDIA cards; HPC developers who need safe async tensor pipelines without deep CUDA expertise; and teams porting PyTorch/CuPy tile code to Rust for performance-critical inference.

Verdict

Promising alpha for Rust GPU hackers, but 24 stars and 1.0% credibility signal high risk: expect API breaks and setup pain (CUDA 13.2, LLVM 21, nightly Rust). Try the examples if you're on Ubuntu 24.04 with sm_80+ hardware; otherwise, stick to mature alternatives like cudarc.
