willjriley

Compressed GPU Memory Paging for Diffusion & Video Models — 3.4x faster inference on consumer GPUs

Found Apr 03, 2026 at 19 stars.
Language: Python

AI Summary

A ComfyUI extension that speeds up large image- and video-generation models on consumer graphics cards by compressing CPU-to-GPU memory transfers.

How It Works

1. 😩 Slow AI renders frustrate you

Big AI models for images and video crash or crawl on your computer because it runs out of graphics memory.

2. 🔍 Discover VRAM Pager

You find this ComfyUI extension that lets huge models run smoothly and quickly on everyday graphics cards.

3. 📥 Add it to your setup

You copy the tool into ComfyUI's custom_nodes folder and restart the program to make it ready.

4. Drop in the speed booster

You connect the 'Compressed Pager' node right after the model loader in your workflow.

5. ⚙️ Choose your speed mode

Pick between the fast mode and the full-quality mode, then hit generate.

6. 🎉 Create magic in minutes

Your renders run through steps roughly twice as fast with the same results.
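The node wiring in steps 3-4 can be pictured as a minimal ComfyUI custom-node skeleton. The class, field, and category names below are hypothetical (the repo's actual node interface may differ); only the `INPUT_TYPES`/`RETURN_TYPES`/`FUNCTION`/`NODE_CLASS_MAPPINGS` conventions are standard ComfyUI custom-node API.

```python
# Hypothetical sketch of what a "Compressed Pager" ComfyUI node could look like.
# Names are illustrative; only the INPUT_TYPES / RETURN_TYPES / FUNCTION /
# NODE_CLASS_MAPPINGS conventions are real ComfyUI custom-node API.

class CompressedPagerNode:
    """Wraps a loaded model so its weights are paged to the GPU compressed."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("MODEL",),             # output of a model-loader node
                "mode": (["fast", "quality"],),  # speed vs. fidelity trade-off
            }
        }

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "apply"
    CATEGORY = "model/memory"

    def apply(self, model, mode):
        # A real implementation would swap the model's weight-loading hooks
        # for compressed PCIe transfers; this sketch just passes it through.
        return (model,)


# Registration dict that ComfyUI scans for in custom_nodes packages.
NODE_CLASS_MAPPINGS = {"CompressedPager": CompressedPagerNode}
```

Because ComfyUI auto-discovers `NODE_CLASS_MAPPINGS`, copying a package like this into `custom_nodes` and restarting is all step 3 involves.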

AI-Generated Review

What is vram-pager?

VRAM Pager is a Python tool that compresses model weights to INT8 or FP16 in system RAM, then transfers them over PCIe to consumer GPUs for fast on-device decompression during inference. It addresses the bottleneck of slow full-precision offloads for massive diffusion and video models like Wan 2.2 14B (54GB), letting you run them on 16-24GB consumer cards such as the RTX 4090 without quantizing the model or crashing. Drop it into ComfyUI as a single node between your loader and sampler for 3.4x faster layer transfers via CUDA kernels.
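The compress-then-decompress round trip described above can be illustrated with a minimal NumPy sketch of symmetric per-tensor INT8 quantization. This is not the repo's code (which uses custom CUDA kernels and its own format); it only shows why INT8 paging shrinks transfer volume while keeping SNR in the near-lossless range the review cites.

```python
import numpy as np

def compress_int8(w):
    """Symmetric per-tensor INT8 quantization: weights -> (int8 codes, scale)."""
    scale = float(np.abs(w).max()) / 127.0
    codes = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return codes, scale

def decompress_int8(codes, scale):
    """Reconstruct FP32 weights from INT8 codes (what a GPU kernel would do)."""
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000).astype(np.float32)  # stand-in weight tensor

codes, scale = compress_int8(w)
w_hat = decompress_int8(codes, scale)

# Round-trip signal-to-noise ratio in dB; ~40 dB is near-lossless.
snr_db = 10 * np.log10(np.mean(w**2) / np.mean((w - w_hat)**2))
print(f"bytes: {w.nbytes} -> {codes.nbytes}, SNR: {snr_db:.1f} dB")
```

INT8 codes are 4x smaller than the FP32 tensor here (2x smaller than FP16), which is where the transfer-time savings come from.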

Why is it gaining traction?

It stacks seamlessly with ComfyUI's dynamic VRAM for up to 5x per-step speedups at low resolutions, while keeping LoRA and safetensors compatibility—no model tweaks needed. Benchmarks on RTX 4090, A6000, and L40S show 37-43 dB SNR quality (near-lossless) and 3.4x PCIe gains over standard --lowvram, outpacing GGUF for full-precision fidelity. Pre-compiled kernels for RTX 30/40 series mean instant setup without compilation hassles.
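The PCIe arithmetic behind those gains can be sketched with back-of-the-envelope numbers. Both figures below are assumptions (roughly PCIe 4.0 x16 effective bandwidth and a hypothetical layer size), not benchmarks from the repo; the sketch only shows how halving bytes halves per-layer transfer time.

```python
# Back-of-the-envelope PCIe transfer arithmetic (assumed numbers, not repo benchmarks).
PCIE_GBPS = 25.0       # assumed effective PCIe 4.0 x16 bandwidth, GB/s
layer_gb_fp16 = 0.5    # hypothetical 0.5 GB FP16 layer

t_fp16 = layer_gb_fp16 / PCIE_GBPS         # full-precision offload
t_int8 = (layer_gb_fp16 / 2) / PCIE_GBPS   # INT8 halves the bytes on the bus

print(f"FP16: {t_fp16*1000:.1f} ms, INT8: {t_int8*1000:.1f} ms, "
      f"speedup: {t_fp16 / t_int8:.1f}x")
# -> FP16: 20.0 ms, INT8: 10.0 ms, speedup: 2.0x
```

Byte halving alone gives 2x; the reported 3.4x presumably also reflects decompression-kernel throughput and transfer/compute overlap, which this sketch ignores.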

Who should use this?

ComfyUI users generating images or video with large unquantized diffusion models (Wan, Flux, SDXL) on 16-24GB consumer GPUs, where --lowvram renders crawl due to PCIe transfers. Ideal for AI artists or researchers needing more denoising steps without upgrading to A100s, especially on older ComfyUI or AMD setups lacking dynamic VRAM.

Verdict

Promising alpha for VRAM-strapped diffusion workflows—try the Compressed Pager node if your renders bottleneck on offloads, but verify outputs given 19 stars and 1.0% credibility score. Solid docs and benchmarks, though end-to-end tests for SDXL/Flux are pending; pair with quality checks before production.

