hyoseokp / PRISM (Public)

PRISM: O(1) Photonic Block Selection for Long-Context LLM Inference — eliminates the O(N) KV cache scan via photonic broadcast-and-weight similarity engine on TFLN

Found Mar 24, 2026 at 27 stars.
Python
AI Summary

PRISM is a simulation tool that shows how a photonic accelerator can dramatically speed up memory selection for long-context language models, cutting data traffic by fetching only the most relevant memory blocks instead of scanning them all.

How It Works

1
🔍 Discover PRISM

You stumble upon PRISM, a clever idea to make AI handle super long conversations way faster by smartly picking the right memories.

2
📥 Get it ready

Download the project to your computer and set it up with a simple install so everything works smoothly.

3
🚀 Try the demo

Launch the quick demo to see PRISM pick the best memory blocks for a million-token conversation.
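Steps 2 and 3 might look like this in a terminal. The clone URL follows the page header; the editable install is an assumption (check the repository README for the exact setup command), and `demo.py` is the entry point named in the review on this page.

```shell
# Fetch the repo (path from the page header) and enter it.
git clone https://github.com/hyoseokp/PRISM.git
cd PRISM

# Assumed install step -- see the README for the exact command.
pip install -e .

# Launch the quick demo.
python demo.py
```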

4
⚡ Wow, massive speedup!

Watch the report show memory selection running 944 times faster with 18,000 times less energy, just by skipping irrelevant memories.

5
📊 Run benchmarks

Test PRISM on real AI models like Qwen to measure speed, energy savings, and accuracy on long texts.

6
🔬 Experiment freely

Tweak the simulator to explore different setups and see how PRISM shines on your own long AI tasks.

Supercharge long-context AI

Now you have the tools and insights to make AI handle endless conversations lightning-fast and super efficiently.

AI-Generated Review

What is PRISM?

PRISM is a Python/PyTorch library that speeds up long-context LLM inference by selecting only the top-k most relevant KV cache blocks per decode step, cutting HBM traffic 32x for 128K contexts. Unlike full scans, its GPU selector uses mean-key signatures for quick ranking, while a simulator models a photonic co-processor doing O(1) broadcast-and-weight similarity search at 9ns latency. Fire up `python demo.py` on an H100 for instant benchmarks versus baselines like Quest.
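The mean-key ranking idea described above can be sketched in a few lines. This is a hand-rolled NumPy illustration, not PRISM's actual API; the function name, block size, and top-k are assumptions, chosen so the numbers line up with the 32x traffic figure quoted for 128K contexts.

```python
import numpy as np

def select_blocks(query, keys, block_size=128, top_k=32):
    """Rank KV-cache blocks by mean-key signature and keep the top-k.

    A sketch of the idea, not PRISM's real interface. `query` is the
    current decode-step query vector; `keys` holds all cached key
    vectors, shape (n_tokens, d).
    """
    n_blocks = keys.shape[0] // block_size
    # One signature per block: the mean of its key vectors.
    signatures = keys[: n_blocks * block_size].reshape(
        n_blocks, block_size, -1
    ).mean(axis=1)                      # (n_blocks, d)
    # Score every block with a single dot product against the query.
    scores = signatures @ query         # (n_blocks,)
    top_k = min(top_k, n_blocks)
    # Indices of the highest-scoring blocks; only these are fetched.
    return np.argsort(scores)[::-1][:top_k]

# With a 128K-token cache and 128-token blocks, this reads 32 of
# 1024 blocks -- the 32x cut in HBM traffic quoted above.
```

Scoring N/128 signatures is still O(N) on a GPU, just with a much smaller constant; the photonic simulator is what models doing that comparison step in constant time.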

Why is it gaining traction?

It stands out by pairing a ready-to-use GPU block selector (a drop-in for sparse attention) with photonic simulations projecting a 944x speedup and 18,000x energy savings over GPU scans; no other repo simulates TFLN MRR physics this accessibly. Not to be confused with the Prism Launcher for Minecraft, the Prisma ORM, or a Laravel Prism theme: this is a fresh take on eliminating the O(N) KV-cache bottleneck that plagues long-context serving. Benchmarks like NIAH needle retrieval hit 100% recall@32.
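A quick back-of-envelope check on the figures quoted in this review. Pairing the 9 ns photonic latency with the 944x speedup is my assumption; the review does not say which context length or workload the speedup refers to.

```python
# Figures quoted in this review.
photonic_select_ns = 9      # claimed O(1) photonic similarity search
projected_speedup = 944     # claimed speedup over a full GPU KV scan

# Implied GPU scan time per decode step, if both claims refer to the
# same workload (an assumption on my part):
implied_gpu_scan_ns = photonic_select_ns * projected_speedup
print(implied_gpu_scan_ns)  # 8496 ns, i.e. ~8.5 microseconds
```

The photonic cost stays fixed as context grows, while a full GPU scan scales linearly with context length, so the projected gap would widen at longer contexts.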

Who should use this?

LLM serving engineers battling memory-bound decode on H100/A100 clusters with 128K+ contexts, especially batch=128 workloads where scans eat 40%+ bandwidth. Photonic accelerator researchers prototyping broadcast-and-weight engines or validating MRR impairments. Anyone forking Quest/RocketKV baselines for custom long-context evals.

Verdict

Grab the GPU selector now for real traffic wins (MIT license, solid README and demo), but 27 stars and 1.0% credibility signal early days: a great prototype that needs community testing before the photonic hype pays off.

