Anthropic
Anthropic models support explicit Prompt caching. On this platform, whether you use the OpenAI chat/completions protocol or the Anthropic v1/messages protocol, you can specify content to be cached with"cache_control": {"type": "ephemeral"}.
- Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Sonnet 4, and Claude Sonnet 3.7: 1024 tokens
- Claude Haiku 4.5, Claude Haiku 3.5, and Claude Haiku 3: 2048 tokens
OpenAI and OpenAI-compatible models
Typically, these models may support implicit caching. When users repeatedly access the same model with the same Prompt prefix, there is a chance of a cache hit.Gemini
Currently, only implicit caching is supported. Implicit caching does not require manual setup or additional cache_control configuration. When users repeatedly access the same model with the same Prompt prefix, there is a chance of a cache hit. Notes:- The average TTL (cache lifetime) is 3–5 minutes, but it may vary (for example, it may be only a few seconds)
- Gemini 2.5 Flash requires a minimum input of 1024 tokens, while Gemini 2.5 Pro requires a minimum of 4096 tokens