GFD is a high-performance library that speeds up data movement between CPU and GPU memory for AI models. In LLM inference, the data for individual tokens is scattered across CPU memory but must be assembled in GPU memory for processing. Standard methods copy each piece separately, which is slow. GFD batches these transfers: it uses multiple CPU cores to gather the scattered data efficiently, then moves everything in one large operation while the GPU continues computing. The result is 14 to 53 times faster data transfer, so AI inference systems run more efficiently. The library supports single-GPU and multi-GPU setups and allocates dedicated CPU cores to each GPU.
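The contrast between per-piece copies and one batched transfer can be sketched in plain Python. This is not GFD's API: CPU memory is modeled as a `bytearray`, and GPU memory as a hypothetical `FakeDevice` whose `transfer()` method counts calls. Each call stands in for one host-to-device copy with its own fixed overhead.

```python
from dataclasses import dataclass


@dataclass
class FakeDevice:
    """Hypothetical stand-in for GPU memory; counts copy operations."""
    memory: bytearray
    transfers: int = 0

    def transfer(self, dst_offset: int, data: bytes) -> None:
        # One call models one host-to-device copy with its own fixed overhead.
        self.memory[dst_offset:dst_offset + len(data)] = data
        self.transfers += 1


def naive_copy(host: bytearray, pieces: list[tuple[int, int]], dev: FakeDevice) -> None:
    """Copy each scattered (offset, length) piece with its own transfer."""
    dst = 0
    for offset, length in pieces:
        dev.transfer(dst, bytes(host[offset:offset + length]))
        dst += length


def batched_copy(host: bytearray, pieces: list[tuple[int, int]], dev: FakeDevice) -> None:
    """Gather all pieces into one contiguous staging buffer, then transfer once."""
    staging = bytearray()
    for offset, length in pieces:
        staging += host[offset:offset + length]
    dev.transfer(0, bytes(staging))


host = bytearray(range(256)) * 4            # 1 KiB of "CPU memory"
pieces = [(i * 64, 16) for i in range(16)]  # 16 scattered 16-byte pieces

naive_dev = FakeDevice(bytearray(256))
batched_dev = FakeDevice(bytearray(256))
naive_copy(host, pieces, naive_dev)
batched_copy(host, pieces, batched_dev)

print(naive_dev.transfers, batched_dev.transfers)  # 16 1
print(naive_dev.memory == batched_dev.memory)      # True
```

Both paths leave identical bytes in device memory; only the number of copy operations differs. On real hardware, it is that per-copy overhead the batched path avoids.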
How It Works
Your LLM inference is slow because standard methods move scattered data to your GPU one piece at a time.
You discover GFD, a library that assembles scattered data into large, fast transfers to your GPU: 14 to 53 times faster than before.
You tell GFD where your scattered data lives in CPU memory and where it should go on your GPU.
GFD's CPU workers gather your scattered data while your GPU keeps computing, and GFD then moves everything in one efficient burst.
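The two steps above (describe source locations and destinations, then gather on CPU workers and copy once while the GPU keeps working) can be sketched as follows. This is a hypothetical illustration, not GFD's interface. `plan_staging`, `gather_and_transfer`, and `fake_gpu_compute` are invented names, the `bytearray` staging buffer stands in for pinned host memory, and the final slice assignment stands in for a single asynchronous host-to-device copy. In CPython these threads will not give real memcpy parallelism; a production library would presumably do the gather in native code.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_staging(pieces):
    """Give each (src_offset, length) piece a destination offset in one staging buffer."""
    plan, dst = [], 0
    for src, length in pieces:
        plan.append((src, dst, length))
        dst += length
    return plan, dst


def gather_chunk(host, staging, chunk):
    """Worker task: copy a subset of pieces into disjoint regions of the staging buffer."""
    for src, dst, length in chunk:
        staging[dst:dst + length] = host[src:src + length]


def gather_and_transfer(host, pieces, gpu_memory, num_workers=4):
    """Parallel gather on CPU workers, then a single copy into (simulated) GPU memory."""
    plan, total = plan_staging(pieces)
    staging = bytearray(total)  # stand-in for a pinned host staging buffer
    chunks = [plan[i::num_workers] for i in range(num_workers)]  # round-robin split
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for future in [pool.submit(gather_chunk, host, staging, c) for c in chunks]:
            future.result()  # re-raise any worker error
    gpu_memory[:total] = staging  # stand-in for ONE large host-to-device copy
    return total


def fake_gpu_compute(steps=1000):
    """Hypothetical placeholder for GPU kernels that keep running during the copy."""
    return sum(i * i for i in range(steps))


host = bytearray(range(256)) * 4                # 1 KiB of "CPU memory"
pieces = [(i * 64 + 8, 16) for i in range(16)]  # where the scattered data lives
gpu_memory = bytearray(256)                     # where it should go

with ThreadPoolExecutor(max_workers=1) as background:
    copy_done = background.submit(gather_and_transfer, host, pieces, gpu_memory)
    result = fake_gpu_compute()  # "GPU" work proceeds while CPU workers gather
    copied = copy_done.result()  # synchronize before the data is used

print(copied)  # 256
```

Because every piece gets a precomputed, non-overlapping slot in the staging buffer, workers never coordinate with each other; the only synchronization point is waiting for the copy before the GPU consumes the data.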
Single GPU: works with one accelerator and minimal setup.
Multi-GPU: each GPU gets its own dedicated CPU cores, reaching up to 340 GB/s combined bandwidth.
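Giving each GPU its own CPU cores amounts to partitioning the available cores into disjoint groups. The sketch below shows one simple even split. The function `allocate_cores` is hypothetical, and GFD's actual allocation policy is not described here; a real allocator might also account for which CPU socket (NUMA node) sits closest to each GPU.

```python
def allocate_cores(cores, num_gpus):
    """Split a list of CPU core IDs into disjoint, near-equal groups, one per GPU."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    base, extra = divmod(len(cores), num_gpus)
    groups, start = [], 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs absorb the remainder, one extra core each.
        size = base + (1 if gpu < extra else 0)
        groups.append(cores[start:start + size])
        start += size
    return groups


print(allocate_cores(list(range(8)), 1))   # [[0, 1, 2, 3, 4, 5, 6, 7]]
print(allocate_cores(list(range(8)), 2))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(allocate_cores(list(range(10)), 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

With one GPU every core goes to that GPU. On Linux, each group's worker threads could then be pinned to their cores with `os.sched_setaffinity`, so the groups do not compete for the same cores.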
Your inference pipeline now runs much faster, with CPU-to-GPU transfer bandwidth reaching 53 GB/s instead of 3 GB/s.
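To put those two bandwidth figures in perspective, transfer time is size divided by bandwidth. Assuming a hypothetical 1 GB payload and ignoring fixed per-call latency:

```latex
t = \frac{S}{B}, \qquad
t_{3} = \frac{1\ \mathrm{GB}}{3\ \mathrm{GB/s}} \approx 333\ \mathrm{ms}, \qquad
t_{53} = \frac{1\ \mathrm{GB}}{53\ \mathrm{GB/s}} \approx 18.9\ \mathrm{ms}, \qquad
\frac{t_{3}}{t_{53}} = \frac{53}{3} \approx 17.7
```

A speedup of roughly 17.7 times falls within the 14 to 53 times range quoted for GFD.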
Your LLM serves more requests per second while using your hardware more efficiently than ever.