WeianMao / triattention
PublicTriAttention โ Efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.
TriAttention compresses memory in AI models for long reasoning tasks like math, delivering up to 2.5x faster performance with matching accuracy via vLLM integration.
How It Works
Stumble upon this clever tool while searching for ways to make AI math solvers handle super long problems without slowing down.
Follow a few simple steps to install it on your computer, no complicated setup needed.
Pick a ready-to-use AI brain and math puzzles that download with one command.
Create a quick profile that squeezes memory use while keeping answers spot-on accurate.
Run challenges and watch it solve tough math 2.5 times faster with no mistakes.
Turn it into a web service that apps can chat with instantly.
Now your math-solving assistant tackles marathon problems blazing fast, saving time and memory.
Star Growth
Repurpose is a Pro feature
Generate ready-to-use prompts for X threads, LinkedIn posts, blog posts, YouTube scripts, and more -- with full repo context baked in.
Unlock RepurposeSimilar repos coming soon.