IndexCache provides patches for SGLang and vLLM to speed up AI model inference using sparse attention by reusing selected token indices across layers.
How It Works
In sparse attention, each layer selects a subset of token indices to attend to. IndexCache speeds up long-context inference by reusing those selected indices across layers instead of recomputing them at every layer.
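A minimal sketch of this cross-layer reuse, assuming top-k index selection and a periodic recompute schedule. All names here are illustrative, not the actual IndexCache API:

```python
# Hypothetical sketch of cross-layer index reuse (illustrative names,
# not the actual IndexCache API). A sparse-attention layer picks the
# top-k token positions to attend to; instead of re-selecting at every
# layer, cached indices from an earlier layer are reused.
import numpy as np

def select_topk_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Pick the k token positions with the highest attention scores."""
    return np.sort(np.argpartition(scores, -k)[-k:])

def run_layers(num_layers: int, seq_len: int, k: int, reuse_every: int):
    """Recompute indices only every `reuse_every` layers; reuse otherwise."""
    rng = np.random.default_rng(0)
    cached = None
    selections = []
    for layer in range(num_layers):
        if layer % reuse_every == 0:
            scores = rng.random(seq_len)  # stand-in for real attention scores
            cached = select_topk_indices(scores, k)
        selections.append(cached)         # intermediate layers reuse the cache
    return selections

sel = run_layers(num_layers=8, seq_len=1024, k=64, reuse_every=4)
# Layers 0-3 share one index set; layers 4-7 share another.
assert all((s == sel[0]).all() for s in sel[:4])
```

The saving comes from skipping the index-selection pass on the intermediate layers; only the layers on the recompute schedule pay for it.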
Choose a serving backend: SGLang for high-throughput serving, or vLLM for flexible inference.
Apply the IndexCache patch to the chosen backend to eliminate redundant per-layer index selection.
Configure the reuse pattern: either a simple periodic schedule (recompute indices every N layers) or a custom per-layer schedule, balancing how many index sets are retained against speed.
Launch the patched server and serve prompts as usual.
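The reuse-pattern choice above (a periodic schedule versus a custom one) can be sketched as a single helper. The parameter names are illustrative, not the actual IndexCache configuration:

```python
# Hypothetical sketch of the two reuse-pattern options (illustrative
# names, not the actual IndexCache config).
def layers_that_recompute(num_layers: int, pattern) -> list:
    """Return the layer ids that recompute sparse-attention indices.

    `pattern` is either an int (periodic: recompute every N layers)
    or an explicit list of layer ids (custom schedule)."""
    if isinstance(pattern, int):
        return [l for l in range(num_layers) if l % pattern == 0]
    return sorted(set(pattern))

print(layers_that_recompute(12, 4))          # periodic -> [0, 4, 8]
print(layers_that_recompute(12, [7, 0, 2]))  # custom   -> [0, 2, 7]
```

All remaining layers reuse the most recently computed index set, so a sparser recompute schedule trades a little selection accuracy for more saved work.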
On long-context workloads, this yields up to a 1.8x inference speedup with minimal quality degradation.