ming053l / ELSA

[CVPR 2026 Findings] ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers

19 stars · 100% credibility
Found May 04, 2026 at 19 stars.
AI Summary (Python)

ELSA provides drop-in PyTorch modules and kernels to accelerate attention in vision transformers like ViT and Swin, matching exact softmax accuracy with lower memory and higher speed.

How It Works

1
🔍 Discover faster AI vision

You find ELSA while searching for ways to speed up image recognition models without losing accuracy.

2
📦 Install easily

Run a simple command to add ELSA to your Python setup, and it grabs everything you need.

3
🖼️ Load your model

Pick a ready-made image model like ViT or Swin that you already know and love.

4
⚡ Swap in super attention

With one line of code, upgrade the model's attention to ELSA's lightning-fast version.

5
▶️ Test on your images

Feed in your photos or scans, and watch it process them super quickly.

6

🚀 Win big on speed

Your model runs up to about twice as fast with far less memory, and the outputs match standard softmax attention exactly.
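The one-line swap in step 4 presumably works by walking the model's module tree and replacing each attention layer in place. Here is a minimal sketch of that generic patching pattern in plain PyTorch; `patch_model` and `PatchedAttention` are illustrative names, not ELSA's actual API, and the wrapper below is a pass-through rather than a real linear-scan kernel:

```python
import torch
import torch.nn as nn

class PatchedAttention(nn.Module):
    """Pass-through wrapper standing in for an ELSA-style drop-in.
    A real replacement would reuse the original q/k/v projection
    weights and run a linear-scan attention kernel instead."""
    def __init__(self, inner: nn.MultiheadAttention):
        super().__init__()
        self.inner = inner

    def forward(self, *args, **kwargs):
        return self.inner(*args, **kwargs)

def patch_model(model: nn.Module) -> nn.Module:
    """Recursively swap every nn.MultiheadAttention in place."""
    for name, child in model.named_children():
        if isinstance(child, nn.MultiheadAttention):
            setattr(model, name, PatchedAttention(child))
        else:
            patch_model(child)
    return model

class Block(nn.Module):
    """Toy transformer block standing in for a ViT/Swin layer."""
    def __init__(self, dim=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        y, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + y)

model = patch_model(Block())          # the "one line" swap
out = model(torch.randn(2, 16, 32))   # forward pass still works
assert isinstance(model.attn, PatchedAttention)
```

Because the swap happens at the module level and pretrained weights are reused, no retraining is needed, which is what makes the drop-in claim plausible.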


AI-Generated Review

What is ELSA?

ELSA delivers drop-in attention kernels for Vision Transformers, reformulating softmax as an exact linear-scan operation that's fast and memory-efficient. Developers get PyTorch modules, full ViT/Swin models, and patching tools to swap into timm or Hugging Face setups without retraining—using Triton for FP16/FP32 and CUDA extensions for edge cases. Straight from a CVPR 2026 Findings paper, it's Python-based with raw Q/K/V APIs for custom Transformers.
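As an illustration of how exact softmax attention can be reformulated as a linear scan, the online-softmax recurrence makes one pass over the keys with a running maximum and normalizer, never materializing the n×n score matrix. This is a reference sketch of the general idea in plain PyTorch, not ELSA's Triton/CUDA implementation:

```python
import torch

def linear_scan_attention(q, k, v):
    """Exact softmax(q k^T / sqrt(d)) v, computed as a scan over keys.
    q, k, v: (n, d) tensors. O(n) extra memory instead of O(n^2)."""
    n, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    m = torch.full((n,), float("-inf"))  # running row-wise max of scores
    l = torch.zeros(n)                   # running softmax normalizer
    for j in range(k.shape[0]):          # one linear scan over the keys
        s = (q @ k[j]) * scale           # scores of every query vs key j
        m_new = torch.maximum(m, s)
        alpha = torch.exp(m - m_new)     # rescale factor for old state
        p = torch.exp(s - m_new)         # weight of the new key
        l = l * alpha + p
        out = out * alpha.unsqueeze(1) + p.unsqueeze(1) * v[j]
        m = m_new
    return out / l.unsqueeze(1)

# Matches naive softmax attention exactly (up to float rounding):
q, k, v = (torch.randn(8, 16) for _ in range(3))
ref = torch.softmax((q @ k.T) * 16 ** -0.5, dim=-1) @ v
assert torch.allclose(linear_scan_attention(q, k, v), ref, atol=1e-5)
```

The running-max rescaling is what keeps the result exact rather than approximate, which is the key distinction from linear-attention methods that change the softmax itself.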

Why is it gaining traction?

It beats FlashAttention and SDPA on FP32 speed (up to 2.15x) and memory (39% less) for high-res ViT inference, while staying exact and Tensor-Core-free: ideal when FlashAttention falls flat on your GPU. Benchmarks cover ViT/Swin throughput, long-context LLMs, and Jetson edge runs, and the patching tools accelerate pretrained models in one line. Buzz in CVPR 2026 paper lists, GitHub, and Reddit threads highlights its no-retrain drop-in for hyperspectral and 3D vision tasks.

Who should use this?

Vision ML engineers deploying high-res ViTs on memory-tight GPUs, like medical imaging or satellite analysis where FP32 matters. Edge AI devs on Jetsons needing real-time perception without O(n²) blowup. Transformer hackers patching timm/Swin for CVPR 2026 workshops or rebuttals, skipping retrains.

Verdict

Promising for exact, lightweight attention: grab it if you're chasing efficient ViTs from the CVPR 2026 paper lists. The low star count (19) and 1.0% credibility score mean it's early; solid docs and benchmarks help, but watch for stability in production.

