serdes21 / flashtile

FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.

55
6
100% credibility
Found Feb 11, 2026 at 43 stars.
AI Analysis
Rust
AI Summary

FlashTile is an open-source tool that compiles special GPU kernel files into runnable code for NVIDIA graphics cards, acting as a compatible alternative to official compilers.

How It Works

1. 🔍 Discover FlashTile

FlashTile is a tool for compiling GPU kernels that aims to behave like NVIDIA's own tileiras.

2. 📥 Grab the program

Download the ready-to-run binary for your platform from the project's releases page.

3. ⚙️ Quick setup

Rename the file to tileiras and add its directory to your PATH so you can run it from anywhere.

4. 🚀 Compile your kernels

Pass your kernel bytecode files to the tool; a single command produces a CUBIN binary.

5. 🔍 Check progress anytime

Use the CLI flags to dump intermediate output (IR, MLIR, TIR, TVMScript, CUDA source, or PTX) when something needs a closer look.

Kernels ready to shine

Your kernels are compiled to CUBIN and ready to run on the target GPU.
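
The setup step can be sketched in a few lines of Python. This is a minimal illustration, not the project's installer: the downloaded file name and the install directory are hypothetical, and only the target name `tileiras` comes from the project description.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def install_as_tileiras(downloaded: Path, bin_dir: Path) -> Path:
    """Copy the downloaded binary into bin_dir under the name `tileiras`."""
    bin_dir.mkdir(parents=True, exist_ok=True)
    # On Windows an executable needs the .exe suffix to be found on PATH.
    name = "tileiras.exe" if os.name == "nt" else "tileiras"
    target = bin_dir / name
    shutil.copy2(downloaded, target)
    # Mark the copy as executable for the current user (matters on Linux).
    target.chmod(target.stat().st_mode | stat.S_IXUSR)
    return target


def prepend_to_path(bin_dir: Path) -> None:
    """Put bin_dir first on PATH for this process and its children."""
    os.environ["PATH"] = str(bin_dir) + os.pathsep + os.environ.get("PATH", "")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the real release download; the file name is hypothetical.
        fake_download = Path(tmp) / "flashtile-download"
        fake_download.write_bytes(b"placeholder")
        installed = install_as_tileiras(fake_download, Path(tmp) / "bin")
        prepend_to_path(installed.parent)
        # Shows which file a tool looking for `tileiras` would now pick up.
        print(shutil.which("tileiras"))
```

Because the change to PATH only affects the current process, a permanent setup would put the same directory in the shell profile or system environment settings instead.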

AI-Generated Review

What is flashtile?

FlashTile is a Rust-based CUDA Tile IR compiler compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs. It takes bytecode from tools like cutile-python and compiles it to runnable CUBIN binaries. It acts as a drop-in replacement: rename the binary to tileiras and add it to your PATH. Correctness is validated on 74 test cases spanning CuTile and TileGym kernels, and CLI flags can dump intermediates such as IR, MLIR, TIR, TVMScript, CUDA source, or PTX for debugging.
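
To make the stated target range concrete, here is a small sketch that checks whether an architecture name falls between SM70 and SM121. It assumes the usual NVIDIA `sm_NN` naming and treats the range as inclusive. Both are assumptions for illustration: the helper names are hypothetical, and the source does not say that every number inside the range is supported.

```python
import re

# Range quoted in the project description; treating it as inclusive is an assumption.
MIN_SM, MAX_SM = 70, 121

# Accepts forms such as "sm_90", "SM70", or "sm_90a" (optional one-letter suffix).
_ARCH_RE = re.compile(r"^sm_?(\d+)[a-z]?$", re.IGNORECASE)


def parse_sm(arch: str) -> int:
    """Extract the numeric compute capability from an architecture name."""
    match = _ARCH_RE.match(arch.strip())
    if match is None:
        raise ValueError(f"not an SM architecture name: {arch!r}")
    return int(match.group(1))


def in_supported_range(arch: str) -> bool:
    """Return True if the architecture number lies in [MIN_SM, MAX_SM]."""
    return MIN_SM <= parse_sm(arch) <= MAX_SM


if __name__ == "__main__":
    for name in ("sm_61", "sm_70", "sm_90a", "sm_121"):
        print(name, in_supported_range(name))
```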

Why is it gaining traction?

It ships static binaries for Linux and Windows, built with simple Docker or PowerShell scripts, and avoids complex TVM/TileLang dependencies while matching the output of NVIDIA's tool. The Rust foundation is pitched as giving reliable parsing and lowering without runtime crashes, and results are cached for repeated compiles. Early adopters praise the Discord for quick fixes on edge cases like SM120 FMHA.
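
The review does not say how FlashTile's cache works. As a generic illustration of caching repeated compiles, the sketch below keys each result on a hash of the input bytecode and the target name. Every name here is hypothetical, and `compile_fn` stands in for whatever actually performs the compile.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cache_key(bytecode: bytes, target: str) -> str:
    """Hash the target name and the input bytes into one hex digest."""
    digest = hashlib.sha256()
    digest.update(target.encode("utf-8"))
    digest.update(b"\0")  # separator so target and payload cannot run together
    digest.update(bytecode)
    return digest.hexdigest()


def compile_cached(
    bytecode: bytes,
    target: str,
    cache_dir: Path,
    compile_fn: Callable[[bytes, str], bytes],
) -> bytes:
    """Return the cached output if present, otherwise compile and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / (cache_key(bytecode, target) + ".cubin")
    if entry.exists():
        return entry.read_bytes()
    output = compile_fn(bytecode, target)
    # Write to a temporary name first so a partial file never looks like a hit.
    partial = entry.with_suffix(".tmp")
    partial.write_bytes(output)
    partial.replace(entry)
    return output


if __name__ == "__main__":
    def fake_compile(data: bytes, arch: str) -> bytes:
        # Placeholder for a real compiler invocation.
        return b"compiled:" + data

    with tempfile.TemporaryDirectory() as tmp:
        first = compile_cached(b"kernel", "sm_90", Path(tmp), fake_compile)
        second = compile_cached(b"kernel", "sm_90", Path(tmp), fake_compile)
        print(first == second)
```

Including the target in the key matters because the same bytecode compiled for two architectures produces different binaries.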

Who should use this?

GPU kernel writers using CuTile or TileLang who hit licensing limits with NVIDIA's tileiras, or Rust enthusiasts building custom NVIDIA GPU pipelines. Ideal for ML engineers tuning flash attention or MoE models on Hopper/Blackwell hardware, needing reproducible cubins without full TVM stacks.

Verdict

Grab it if you need tileiras compatibility today—43 stars and 1.0% credibility reflect early days with no perf benchmarks, but solid tests and docs make it viable for prototyping. Watch for maturity as TileLang evolves.
