RightNow-AI

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

594 stars · 44 forks · 100% credibility
Found Mar 11, 2026 at 341 stars, surfaced by GitGems
Language: Python
AI Summary

AutoKernel is a tool that profiles PyTorch models on GPUs to identify bottleneck operations, extracts them as editable Triton kernels, and enables AI agents to autonomously optimize them for speedups via an automated edit-test-revert loop.

How It Works

1
🔍 Discover AutoKernel

AutoKernel finds the slow parts of your AI model and uses AI agents to make them faster overnight.

2
⚙️ Set up quickly

Install the tool and prepare sample data on a machine with a CUDA-capable GPU.

3
📈 Spot the slowdowns

Run your model through the profiler to see exactly which operations consume the most GPU time.
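AutoKernel's own profiler CLI isn't documented here, so as an illustration only, the per-op timing idea behind this step can be sketched with plain Python timers and stand-in ops (real GPU profiling would use CUDA events or torch.profiler instead of wall-clock timers):

```python
import time

# Stand-in "ops": in the real tool these are GPU kernels inside a
# PyTorch model; here, sleep-based callables illustrate per-op timing.
def make_op(duration_s):
    return lambda: time.sleep(duration_s)

ops = {
    "matmul": make_op(0.030),
    "softmax": make_op(0.010),
    "layernorm": make_op(0.005),
}

def profile(ops, warmup=1, iters=3):
    """Time each op and return (name, mean_seconds), slowest first."""
    timings = {}
    for name, op in ops.items():
        for _ in range(warmup):
            op()  # warm up before measuring
        start = time.perf_counter()
        for _ in range(iters):
            op()
        timings[name] = (time.perf_counter() - start) / iters
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)

ranked = profile(ops)
print(ranked[0][0])  # the bottleneck op -> "matmul"
```

The ranking logic is the same on real models; only the timing mechanism changes.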

4
✂️ Pull out the trouble spots

The tool extracts the slowest operations as standalone, editable Triton kernels ready for optimization.

5
🤖 Let AI optimize overnight

Give the agent clear instructions, point it at one kernel, and let it experiment automatically while you sleep: each edit is benchmarked and kept only if it is correct and faster, otherwise reverted.
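The overnight loop in this step is essentially edit, benchmark, keep-or-revert. A minimal sketch, with a hypothetical `propose_edit` standing in for the AI agent and dicts standing in for kernels:

```python
import random

def benchmark(kernel):
    """Stand-in benchmark: lower is better. Real code times a Triton kernel."""
    return kernel["time"]

def is_correct(kernel):
    """Stand-in correctness check against a reference implementation."""
    return kernel["correct"]

def propose_edit(kernel, rng):
    """Hypothetical AI edit: returns a mutated candidate kernel."""
    return {
        "time": kernel["time"] * rng.uniform(0.8, 1.2),
        "correct": rng.random() > 0.2,  # some edits break correctness
    }

def optimize(kernel, steps=200, seed=0):
    rng = random.Random(seed)
    best = kernel
    best_time = benchmark(kernel)
    for _ in range(steps):
        candidate = propose_edit(best, rng)
        # Keep only edits that are both correct and faster; otherwise revert.
        if is_correct(candidate) and benchmark(candidate) < best_time:
            best, best_time = candidate, benchmark(candidate)
    return best

slow = {"time": 10.0, "correct": True}
fast = optimize(slow)
print(fast["time"] < slow["time"])  # True
```

The keep-or-revert rule is what makes the loop safe to run unattended: a bad edit can waste a step, but it can never make the accepted kernel slower or incorrect.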

6
✅ Test the faster model

Patch the optimized kernels back into your model, run the correctness checks, and measure the overall speedup.
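Putting a kernel back means checking numerical agreement with the reference and then timing both versions. A simplified sketch, using plain Python lists in place of GPU tensors and a deliberately slow baseline:

```python
import math
import time

def reference_op(xs):
    """Baseline implementation (artificially slowed for the demo)."""
    time.sleep(0.02)
    return [math.tanh(x) for x in xs]

def optimized_op(xs):
    """Optimized implementation: same math, no artificial delay."""
    return [math.tanh(x) for x in xs]

def allclose(a, b, atol=1e-6):
    return all(abs(x - y) <= atol for x, y in zip(a, b))

def verify_and_time(xs, iters=5):
    # Correctness first: the fast version must match the reference.
    assert allclose(reference_op(xs), optimized_op(xs)), "kernel diverges"
    t0 = time.perf_counter()
    for _ in range(iters):
        reference_op(xs)
    t_ref = time.perf_counter() - t0
    t0 = time.perf_counter()
    for _ in range(iters):
        optimized_op(xs)
    t_opt = time.perf_counter() - t0
    return t_ref / t_opt  # overall speedup factor

speedup = verify_and_time([0.1 * i for i in range(1000)])
print(f"speedup: {speedup:.1f}x")
```

On real kernels the comparison would use tensor-aware tolerances (e.g. torch.allclose) and CUDA-synchronized timing, but the verify-then-measure order is the point.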

🚀 Enjoy faster AI

Your model now runs faster end-to-end, saving time and energy on every run, with charts tracking each kernel's progress.

AI-Generated Review

What is AutoKernel?

AutoKernel is an autoresearch tool for GPU kernels, written in Python: give it any PyTorch model and it profiles the bottlenecks, extracts them as editable Triton kernels, then runs AI agents overnight in autonomous edit-benchmark-keep loops to produce optimized versions. You go to sleep with a slow model and wake up to faster inference. It covers nine core deep-learning ops, including matmul, flash attention, and layernorm.

Why is it gaining traction?

It automates the drudgery of Triton tuning with correctness-first benchmarks (five stages, including edge cases and numerical stability), Amdahl's-law scheduling to target the highest-impact kernels first, and TSV-logged progress charts. Developers like the hands-off overnight gains, often 1.5-3x speedups, without mastering PTX or endless manual iteration. CLI tools make profiling, extraction, and verification straightforward.
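The Amdahl's-law scheduling mentioned above follows directly from the formula: if a kernel accounts for fraction p of total runtime and is sped up by factor s, the overall speedup is 1 / ((1 - p) + p / s). A quick check of why the biggest bottleneck matters most:

```python
def amdahl(p, s):
    """Overall speedup when a fraction p of runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# A 3x win on an op taking 60% of runtime beats a 10x win on one taking 10%.
print(round(amdahl(0.60, 3.0), 2))   # 1.67
print(round(amdahl(0.10, 10.0), 2))  # 1.1
```

This is why the scheduler spends agent time on the ops that dominate the profile rather than on the ops that are easiest to accelerate.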

Who should use this?

ML engineers deploying PyTorch models to production who hit GPU walls on inference. LLM researchers tuning Llama or GPT-2 bottlenecks. Triton users optimizing custom kernels for H100s or 4090s without full-time kernel hackers.

Verdict

Intriguing AI kernel-optimization experiment with solid docs and a quickstart, but immature, with only 40 stars and 1.0% credibility at the time of review; test on non-critical workloads first. Strong potential for PyTorch/Triton speedups as the agents mature.


