learningCatHD

TELOS SDK: a cache-aware prompt protocol and gateway for portable agent context.

36
5
89% credibility
Found May 28, 2026 at 36 stars.
AI Analysis
Python
AI Summary

TELOS is an open-source tool developed by researchers at Tsinghua University that acts as a middle layer between AI coding assistants and the AI services they call. It distinguishes the parts of a conversation that stay the same (such as tool definitions and system instructions) from the parts that change every turn (such as timestamps and environment details). It then keeps the stable parts in a form the AI service's cache can reuse, so that you pay full price mainly for the new information each turn. The tool installs in seconds, auto-detects popular AI coding assistants, and shows a live dashboard of your savings in dollars. It supports multiple AI providers (Anthropic, OpenAI, DeepSeek) and inference frameworks (vLLM, SGLang), and comes from a university research lab with published benchmarks showing roughly 40% cost reduction without degrading task accuracy.
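To make the stable-versus-changing distinction concrete, here is a minimal sketch of why a per-turn timestamp placed at the front of a prompt defeats prefix caching. It is not TELOS code. It assumes a provider that reuses the longest shared prefix between requests, and it uses whitespace-separated words as a stand-in for real tokens. The names `shared_prefix_len`, `volatile_first`, `volatile_last`, and the sample prompt are all hypothetical.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count the leading items that are identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical stable instructions, split on whitespace as fake "tokens".
SYSTEM = "You are a coding agent. Tools: read_file, run_tests."


def volatile_first(ts: str, user: str) -> list[str]:
    """Timestamp comes before the stable instructions."""
    return f"time={ts} {SYSTEM} {user}".split()


def volatile_last(ts: str, user: str) -> list[str]:
    """Timestamp comes after the stable instructions and the user message."""
    return f"{SYSTEM} {user} time={ts}".split()


# Two requests that differ only in their timestamp.
bad = shared_prefix_len(
    volatile_first("10:00", "Fix the bug"),
    volatile_first("10:01", "Fix the bug"),
)
good = shared_prefix_len(
    volatile_last("10:00", "Fix the bug"),
    volatile_last("10:01", "Fix the bug"),
)
print(f"reusable prefix, timestamp first: {bad} words")
print(f"reusable prefix, timestamp last:  {good} words")
```

With the timestamp first, the two requests diverge at the very first word, so nothing can be reused. With it last, everything before the timestamp is shared.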

How It Works

1. 💬 You hear about TELOS

A colleague mentions TELOS at a team meeting: a tool that claims to cut your AI agent's running costs by up to 90% (its published benchmarks report about 40%) without changing anything about how your agent works.

2. 📦 You install it with one command

You run a single install command, and TELOS sets itself up automatically on your computer.

3. 🔗 TELOS detects your AI tools

TELOS scans your computer, finds the AI coding assistants you already use (such as Claude Code or OpenClaw), and connects them to its gateway.

4. 🚀 Everything starts running

The local gateway launches in the background, and your AI assistant now routes through TELOS without you changing a single line of code.

5. 📊 You open the dashboard and watch the savings add up

You open a dashboard in your browser that shows how much money you're saving in real dollars, not vague percentages, with every conversation your AI has.

💰 Your API bill drops dramatically

Over the next few weeks, your monthly AI API costs shrink because TELOS keeps your assistant's instructions in the provider's cache, so they are reused from there instead of being reprocessed at full price on every request.

AI-Generated Review

What is telos-sdk?

TELOS is a Python SDK that sits between your AI agent and the API providers, rewriting prompts so they reuse cached context across conversation turns. Rather than letting each turn present the same system prompt and conversation history as if it were new, it tags content with cache lifetimes and keeps the prompt structure from breaking prefix matching. The result: your API bill drops because the provider recognizes the stable parts and charges full price only for new tokens. It ships as a local gateway you run alongside your agent, with a CLI for setup and a dashboard showing savings in actual dollars.
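The billing effect can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not how TELOS or its dashboard computes savings. It assumes a simple pricing model in which cached prefix tokens are billed at a fraction of the normal input price, and it ignores output tokens and any cache-write surcharge. The function `input_cost` and the values for `price_per_mtok` and `cached_multiplier` are placeholders, not any provider's actual rates. The 4,000-token stable prefix matches the system prompt size used as an example in this review.

```python
def input_cost(
    stable_tokens: int,
    new_tokens: int,
    turns: int,
    price_per_mtok: float,
    cached_multiplier: float,
) -> tuple[float, float]:
    """Return (uncached_cost, cached_cost) in dollars for `turns` requests.

    Assumes every request carries the same stable prefix plus `new_tokens`
    of fresh input, and that cached tokens cost `cached_multiplier` times
    the normal input price.
    """
    per_token = price_per_mtok / 1_000_000

    # Without caching, every turn pays full price for everything.
    uncached = turns * (stable_tokens + new_tokens) * per_token

    # With caching, the first turn pays full price; later turns pay the
    # discounted rate for the stable prefix and full price for new tokens.
    first_turn = (stable_tokens + new_tokens) * per_token
    later_turn = (stable_tokens * cached_multiplier + new_tokens) * per_token
    cached = first_turn + (turns - 1) * later_turn

    return uncached, cached


# Placeholder numbers: 100 turns, $3.00 per million input tokens,
# cached tokens billed at 10% of the normal rate.
uncached, cached = input_cost(4000, 500, 100, 3.00, 0.1)
print(f"without caching: ${uncached:.4f}")
print(f"with caching:    ${cached:.4f}")
```

Real savings depend on the provider's pricing, cache lifetime, and how much of each request is actually stable, which is why measured figures for a given workload matter more than this estimate.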

Why is it gaining traction?

The hook is simple: developers are watching their API invoices climb and realizing that every turn re-sends the same 4,000-token system prompt at full price. TELOS addresses this structurally rather than through compression or heuristics. Its three-band protocol (PIN for stable content, FOLD for history, DROP for ephemeral data) is meant to make cache hits follow from how the prompt is constructed, not from luck. The project backs its claims with SWE-bench benchmarks showing about 40% cost reduction at the same task success rate. Setup takes three commands: install, init, dashboard.
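One way to picture the three bands is as an ordering rule applied before a request is sent. The sketch below is an illustration only, not the TELOS implementation or API: `Band`, `Segment`, and `assemble` are hypothetical names. It also assumes that DROP content is simply placed last so it never sits in front of cacheable content. The actual protocol may treat ephemeral data differently.

```python
from dataclasses import dataclass
from enum import Enum


class Band(Enum):
    PIN = 0   # stable content, e.g. system prompt and tool definitions
    FOLD = 1  # conversation history
    DROP = 2  # ephemeral data, e.g. timestamps


@dataclass(frozen=True)
class Segment:
    band: Band
    text: str


def assemble(segments: list[Segment]) -> list[str]:
    """Order segments PIN, then FOLD, then DROP.

    sorted() is stable, so the original order within each band is kept.
    """
    return [s.text for s in sorted(segments, key=lambda s: s.band.value)]


turn_1 = [
    Segment(Band.DROP, "time=10:00"),
    Segment(Band.PIN, "system: you are a coding agent"),
    Segment(Band.PIN, "tools: read_file, run_tests"),
    Segment(Band.FOLD, "user: fix the failing test"),
]
turn_2 = [
    Segment(Band.DROP, "time=10:05"),
    Segment(Band.PIN, "system: you are a coding agent"),
    Segment(Band.PIN, "tools: read_file, run_tests"),
    Segment(Band.FOLD, "user: fix the failing test"),
    Segment(Band.FOLD, "assistant: patched utils.py"),
]

prompt_1 = assemble(turn_1)
prompt_2 = assemble(turn_2)

# Both PIN segments and the shared history line up at the front of each
# prompt, while the changing timestamp ends up last.
print(prompt_1)
print(prompt_2)
```

Because the bands are always emitted in the same order, the pinned segments and the already-seen history form an identical prefix from one turn to the next, which is the property a prefix cache needs.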

Who should use this?

Teams operating long-running coding agents and paying per token for repeated context. If you're using Claude Code, OpenClaw, Hermes, or Codex and noticing that API costs scale faster than productivity, this is for you. Self-hosted deployments using vLLM or SGLang get bidirectional control over cache eviction. Early-stage projects should evaluate carefully given the beta status.

Verdict

TELOS solves a real problem with a clean abstraction and credible benchmarks. The 89% credibility score reflects a small but academically backed project from Tsinghua's LEAP Lab. With only 36 stars, the community is still forming, but the protocol design is solid and the implementation covers the major providers. Worth trying on a dev machine to see if the savings numbers hold for your workload.
