tanishqkumar / ssd
A lightweight inference engine supporting speculative decoding (SSD).
SSD is a research inference engine that makes large language models generate text up to 2x faster using a novel parallel speculation technique.
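To build intuition for the underlying idea, here is a minimal, illustrative sketch of speculative decoding in the greedy setting. This is not SSD's actual implementation: real engines verify all draft tokens in one batched forward pass and typically use rejection sampling over full token distributions rather than the exact-match greedy check shown here. The `draft_next` and `target_next` callables are hypothetical stand-ins for a small draft model and the large target model:

```python
# Illustrative sketch of greedy speculative decoding (not SSD's actual code).
# A cheap draft model proposes k tokens; the expensive target model checks
# them and keeps the longest agreeing prefix.

from typing import Callable, List

def speculative_decode(
    draft_next: Callable[[List[int]], int],   # hypothetical: cheap next-token fn
    target_next: Callable[[List[int]], int],  # hypothetical: expensive next-token fn
    prompt: List[int],
    k: int = 4,
    max_new: int = 32,
) -> List[int]:
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) Draft model speculates k tokens autoregressively (cheap).
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2) Target model verifies each position; in a real engine all k
        #    verifications happen in ONE batched forward pass.
        accepted = 0
        for i in range(k):
            if target_next(seq + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        seq += draft[:accepted]
        # 3) Take one token from the target (the correction on a mismatch,
        #    or a bonus token on full acceptance) so progress is guaranteed.
        seq.append(target_next(seq))
    return seq[: len(prompt) + max_new]
```

The speedup comes from the target model scoring k positions per forward pass instead of one, while the acceptance check guarantees the output matches what the target model alone would have produced (in the greedy case).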
How It Works
You read the research paper to learn how SSD's parallel speculation technique speeds up text generation.
You install the engine and prepare a local folder of model weights.
You tell the system where your models and test conversations are stored (a setup sketch follows this list).
You download a set of sample questions to test with.
You run the same questions through different inference engines and compare speeds, with SSD coming out well ahead (see the benchmarking sketch below).
You hold a live conversation with a model such as Llama, watching the response stream in token by token (see the chat sketch below).
Your model generates text up to 2x faster, whether for research benchmarks or casual chats.
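A rough sketch of the setup step. The directory layout, file names, and download URL below are assumptions for illustration, not SSD's documented interface:

```python
# Hypothetical setup: tell the engine where weights and test data live.
# Paths and the URL are placeholders, not SSD's documented layout.
from pathlib import Path
import urllib.request

MODELS_DIR = Path("~/models").expanduser()                     # local model weights
PROMPTS_PATH = Path("~/data/sample_prompts.json").expanduser() # sample questions

assert MODELS_DIR.is_dir(), f"expected model weights under {MODELS_DIR}"

# Fetch the sample questions if they aren't present yet (placeholder URL).
if not PROMPTS_PATH.exists():
    PROMPTS_PATH.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve("https://example.com/sample_prompts.json", PROMPTS_PATH)
```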
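A hypothetical benchmarking harness for the head-to-head comparison. SSD's real CLI and API aren't shown here, so each engine is wrapped as a plain prompt-to-completion callable that you would wire up yourself:

```python
# Hypothetical benchmark; the engine-calling details are assumptions,
# not SSD's actual API. Each engine is a prompt -> completion callable.
import json
import time
from pathlib import Path
from typing import Callable, Dict

def benchmark(engines: Dict[str, Callable[[str], str]], prompts_path: Path) -> None:
    prompts = json.loads(prompts_path.read_text())  # assumed: a JSON list of strings
    for name, generate in engines.items():
        start = time.perf_counter()
        n_chars = sum(len(generate(p)) for p in prompts)
        elapsed = time.perf_counter() - start
        # chars/sec is a crude proxy; a real harness would count tokens.
        print(f"{name}: {n_chars / elapsed:.1f} chars/sec over {len(prompts)} prompts")
```

You might call it as `benchmark({"ssd": ssd_generate, "baseline": baseline_generate}, PROMPTS_PATH)`, where both callables are placeholders for however you invoke each engine.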
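And a sketch of the interactive chat loop. Here `stream_tokens` is a hypothetical streaming entry point, meaning any function that yields tokens as they are generated, not a documented SSD API:

```python
# Hypothetical chat loop; `stream_tokens` is a placeholder for whatever
# streaming generation entry point the engine exposes.
from typing import Callable, Iterator

def chat(stream_tokens: Callable[[str], Iterator[str]]) -> None:
    history = ""
    while True:
        user = input("you> ")
        if user.strip() in {"exit", "quit"}:
            break
        history += f"User: {user}\nAssistant: "
        print("model> ", end="", flush=True)
        for tok in stream_tokens(history):
            print(tok, end="", flush=True)  # tokens appear as they are generated
            history += tok
        print()
        history += "\n"
```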