PRISM: O(1) Photonic Block Selection for Long-Context LLM Inference — eliminates the O(N) KV cache scan via photonic broadcast-and-weight similarity engine on TFLN
PRISM is a simulation tool that demonstrates how a photonic accelerator can drastically speed up memory selection for long-context AI language models, cutting KV-cache traffic from every stored memory block down to only the most relevant ones.
How It Works
PRISM makes AI handle super long conversations way faster by selecting only the most relevant memory blocks instead of scanning them all.
Download the project to your computer and install its dependencies so everything runs smoothly.
Launch the quick demo to see PRISM pick the best memory blocks for a million-token conversation.
Read the generated report: memory selection runs 944 times faster with 18,000 times less energy, just by skipping irrelevant memory blocks.
Test PRISM on real AI models like Qwen to measure speed, energy savings, and accuracy on long texts.
Tweak the simulator to explore different setups and see how PRISM shines on your own long AI tasks.
Now you have the tools and insights to make AI handle endless conversations quickly and efficiently.
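The core idea above, scoring every cached memory block against the current query and attending only to the top-k winners, can be sketched in plain Python. This is an illustrative sketch, not code from the PRISM repo: the names (`block_summaries`, `query`, `k`) and the use of one summary vector per block are assumptions, and NumPy stands in for the photonic broadcast-and-weight scoring that PRISM simulates in the optical domain.

```python
import numpy as np

# Hypothetical sketch of PRISM-style block selection. Each KV-cache block is
# represented by a summary vector; the photonic engine computes all query-block
# similarities in parallel, so selection latency stays flat as N grows.
rng = np.random.default_rng(0)
d, num_blocks, k = 64, 1024, 8   # head dim, cached blocks, blocks kept

# One summary vector (e.g. a mean key) per block, plus the current query.
block_summaries = rng.standard_normal((num_blocks, d))
query = rng.standard_normal(d)

# Broadcast-and-weight analogue: one dot product per block.
scores = block_summaries @ query

# Keep only the k highest-scoring blocks for attention.
top_k = np.argpartition(scores, -k)[-k:]

print(f"attending to {k}/{num_blocks} blocks "
      f"({100 * (1 - k / num_blocks):.1f}% of KV traffic skipped)")
```

With these illustrative numbers, attention touches 8 of 1024 blocks, so more than 99% of the KV-cache traffic is skipped; the real speed and energy figures come from the simulator's report.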