Skip to main content

AI + Emerging Tech

Prompt Caching

Prompt caching lets LLM providers (Anthropic, OpenAI, Google) cache the system prompt + tool definitions + long context from a request + serve subsequent matching requests at materially lower cost.

Mechanics: the provider hashes the prompt prefix; subsequent requests with the same prefix within the cache TTL (5 minutes for Anthropic, varies by provider) cost 10-50 percent of the normal input token price.

For agent workloads with stable system prompts, caching cuts cost 50-80 percent on input tokens. The single highest-impact LLM cost optimisation in 2026. See our LLM cost optimisation guide.