Prompt caching lets LLM providers (Anthropic, OpenAI, Google) cache the system prompt + tool definitions + long context from a request + serve subsequent matching requests at materially lower cost.
Mechanics: the provider hashes the prompt prefix; subsequent requests with the same prefix within the cache TTL (5 minutes for Anthropic, varies by provider) cost 10-50 percent of the normal input token price.
For agent workloads with stable system prompts, caching cuts cost 50-80 percent on input tokens. The single highest-impact LLM cost optimisation in 2026. See our LLM cost optimisation guide.