Prompt caching
Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. OpenClaw normalizes provider usage into cacheRead and cacheWrite where the upstream API exposes those counters directly.
Status surfaces can also recover cache counters from the most recent transcript
usage log when the live session snapshot is missing them, so /status can keep
showing a cache line after partial session metadata loss. Existing nonzero live
cache values still take precedence over transcript fallback values.
Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. Without caching, repeated prompts pay the full prompt cost on every turn even when most input did not change.
This page covers all cache-related knobs that affect prompt reuse and token cost.
Provider references:
- Anthropic prompt caching: https://platform.claude.com/docs/en/build-with-claude/prompt-caching
- OpenAI prompt caching: https://developers.openai.com/api/docs/guides/prompt-caching
- OpenAI API headers and request IDs: https://developers.openai.com/api/reference/overview
- Anthropic request IDs and errors: https://platform.claude.com/docs/en/api/errors
Primary knobs
cacheRetention (global default, model, and per-agent)
Cache retention can be set at three scopes, from broadest to most specific:
- agents.defaults.params (global default; applies to all models)
- agents.defaults.models["provider/model"].params (per-model override)
- agents.list[].params (per-agent override, matched by agent id; overrides by key)
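Putting the three scopes together, a config sketch might look like this (key paths follow the list above; the exact file format, the model ref, and the agent id are illustrative):

```json5
{
  agents: {
    defaults: {
      // Global default: applies to every model unless overridden.
      params: { cacheRetention: "short" },
      models: {
        // Per-model override for one provider/model ref (illustrative ref).
        "anthropic/claude-sonnet-4-5": {
          params: { cacheRetention: "long" },
        },
      },
    },
    list: [
      {
        // Per-agent override, matched by agent id; merged by key.
        id: "notifier", // illustrative agent id
        params: { cacheRetention: "none" },
      },
    ],
  },
}
```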
contextPruning.mode: "cache-ttl"
Prunes old tool-result context after cache TTL windows so post-idle requests do not re-cache oversized history.
Heartbeat keep-warm
Heartbeat can keep cache windows warm and reduce repeated cache writes after idle gaps. Configure it per agent under agents.list[].heartbeat.
Provider behavior
Anthropic (direct API)
- cacheRetention is supported.
- With Anthropic API-key auth profiles, OpenClaw seeds cacheRetention: "short" for Anthropic model refs when unset.
- Anthropic native Messages responses expose both cache_read_input_tokens and cache_creation_input_tokens, so OpenClaw can show both cacheRead and cacheWrite.
- For native Anthropic requests, cacheRetention: "short" maps to the default 5-minute ephemeral cache, and cacheRetention: "long" upgrades to the 1-hour TTL only on direct api.anthropic.com hosts.
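The retention mapping above can be sketched roughly as follows (the function and return shape are illustrative, not OpenClaw's actual internals; the "5m"/"1h" TTL values follow Anthropic's cache_control API):

```typescript
// Sketch: map a cacheRetention setting to an Anthropic cache_control value.
// "short" -> default 5-minute ephemeral cache (no explicit ttl needed);
// "long"  -> 1-hour TTL, but only on direct api.anthropic.com hosts.
type CacheRetention = "none" | "short" | "long";

function anthropicCacheControl(
  retention: CacheRetention,
  baseUrl: string,
): { type: "ephemeral"; ttl?: "5m" | "1h" } | undefined {
  if (retention === "none") return undefined;
  const direct = new URL(baseUrl).hostname === "api.anthropic.com";
  if (retention === "long" && direct) return { type: "ephemeral", ttl: "1h" };
  // "short", or "long" on a non-direct host, falls back to the 5m default.
  return { type: "ephemeral" };
}
```

For example, `anthropicCacheControl("long", "https://api.anthropic.com/v1")` yields the 1-hour TTL, while the same setting against a proxy host falls back to the default ephemeral cache.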
OpenAI (direct API)
- Prompt caching is automatic on supported recent models. OpenClaw does not need to inject block-level cache markers.
- OpenClaw uses prompt_cache_key to keep cache routing stable across turns, and sends prompt_cache_retention: "24h" only when cacheRetention: "long" is selected on direct OpenAI hosts.
- OpenAI responses expose cached prompt tokens via usage.prompt_tokens_details.cached_tokens (or input_tokens_details.cached_tokens on Responses API events). OpenClaw maps that to cacheRead.
- OpenAI does not expose a separate cache-write token counter, so cacheWrite stays 0 on OpenAI paths even when the provider is warming a cache.
- OpenAI returns useful tracing and rate-limit headers such as x-request-id, openai-processing-ms, and x-ratelimit-*, but cache-hit accounting should come from the usage payload, not from headers.
- In practice, OpenAI often behaves like an initial-prefix cache rather than Anthropic-style moving full-history reuse. Stable long-prefix text turns can land near a 4864 cached-token plateau in current live probes, while tool-heavy or MCP-style transcripts often plateau near 4608 cached tokens even on exact repeats.
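The cacheRead mapping can be sketched as below (the usage field names follow the bullets above; the normalization helper itself is illustrative):

```typescript
// Sketch: normalize an OpenAI usage payload into cacheRead/cacheWrite.
// OpenAI reports cached prompt tokens but no cache-write counter,
// so cacheWrite is always 0 on this path.
interface OpenAIUsage {
  prompt_tokens_details?: { cached_tokens?: number }; // Chat Completions
  input_tokens_details?: { cached_tokens?: number };  // Responses API events
}

function normalizeOpenAICache(
  usage: OpenAIUsage,
): { cacheRead: number; cacheWrite: number } {
  const cacheRead =
    usage.prompt_tokens_details?.cached_tokens ??
    usage.input_tokens_details?.cached_tokens ??
    0;
  return { cacheRead, cacheWrite: 0 };
}
```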
Amazon Bedrock
- Anthropic Claude model refs (amazon-bedrock/*anthropic.claude*) support explicit cacheRetention pass-through.
- Non-Anthropic Bedrock models are forced to cacheRetention: "none" at runtime.
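The Bedrock rule can be expressed as a small predicate (a sketch; the real runtime check may match model refs differently):

```typescript
// Sketch: only Anthropic Claude refs on Bedrock keep their requested
// cacheRetention; all other Bedrock models are forced to "none".
function effectiveBedrockRetention(modelRef: string, requested: string): string {
  const isBedrock = modelRef.startsWith("amazon-bedrock/");
  if (!isBedrock) return requested; // non-Bedrock refs pass through untouched
  return modelRef.includes("anthropic.claude") ? requested : "none";
}
```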
OpenRouter Anthropic models
For openrouter/anthropic/* model refs, OpenClaw injects Anthropic
cache_control on system/developer prompt blocks to improve prompt-cache
reuse only when the request is still targeting a verified OpenRouter route
(openrouter on its default endpoint, or any provider/base URL that resolves
to openrouter.ai).
If you repoint the model at an arbitrary OpenAI-compatible proxy URL, OpenClaw
stops injecting those OpenRouter-specific Anthropic cache markers.
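The "verified OpenRouter route" condition can be sketched as a hostname check (illustrative; OpenClaw's actual resolution logic may consider more than the base URL):

```typescript
// Sketch: inject Anthropic cache_control markers only when the request
// still targets OpenRouter itself (the default endpoint, or any base URL
// whose host resolves to openrouter.ai).
function isVerifiedOpenRouterRoute(baseUrl?: string): boolean {
  // No explicit base URL means the default OpenRouter endpoint.
  if (!baseUrl) return true;
  const host = new URL(baseUrl).hostname;
  return host === "openrouter.ai" || host.endsWith(".openrouter.ai");
}
```

Repointing the model at `https://my-proxy.example.com/v1` would make this check fail, which matches the documented behavior of dropping the OpenRouter-specific cache markers.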
Other providers
If the provider does not support this cache mode, cacheRetention has no effect.
Google Gemini direct API
- Direct Gemini transport (api: "google-generative-ai") reports cache hits through the upstream cachedContentTokenCount; OpenClaw maps that to cacheRead.
- If you already have a Gemini cached-content handle, you can pass it through as params.cachedContent (or legacy params.cached_content) on the configured model.
- This is separate from Anthropic/OpenAI prompt-prefix caching. OpenClaw forwards a provider-native cached-content reference; it does not synthesize cache markers.
Gemini CLI JSON usage
- Gemini CLI JSON output can also surface cache hits through stats.cached; OpenClaw maps that to cacheRead.
- If the CLI omits a direct stats.input value, OpenClaw derives input tokens from stats.input_tokens - stats.cached.
- This is usage normalization only. It does not mean OpenClaw is creating Anthropic/OpenAI-style prompt-cache markers for Gemini CLI.
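The derivation rule above, sketched (stats field names as listed; the helper itself is illustrative):

```typescript
// Sketch: normalize Gemini CLI JSON stats into input/cacheRead.
// When stats.input is missing, input tokens are derived as
// stats.input_tokens - stats.cached (clamped at zero).
interface GeminiCliStats {
  input?: number;
  input_tokens?: number;
  cached?: number;
}

function normalizeGeminiCli(
  stats: GeminiCliStats,
): { input: number; cacheRead: number } {
  const cacheRead = stats.cached ?? 0;
  const input = stats.input ?? Math.max(0, (stats.input_tokens ?? 0) - cacheRead);
  return { input, cacheRead };
}
```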
OpenClaw cache-stability guards
OpenClaw also keeps several cache-sensitive payload shapes deterministic before the request reaches the provider:
- Bundled MCP tool catalogs are sorted deterministically before tool registration, so listTools() order changes do not churn the tools block and bust prompt-cache prefixes.
- Legacy sessions with persisted image blocks keep the 3 most recent completed turns intact; older already-processed image blocks may be replaced with a marker so image-heavy follow-ups do not keep re-sending large stale payloads.
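The deterministic tool-catalog ordering can be sketched as a stable sort by tool name (illustrative; the real sort key may include more fields):

```typescript
// Sketch: sort an MCP tool catalog by name before registration so that
// listTools() ordering changes cannot churn the serialized tools block
// and invalidate the prompt-cache prefix.
interface McpTool {
  name: string;
  description?: string;
}

function deterministicToolOrder(tools: McpTool[]): McpTool[] {
  // Copy before sorting so the upstream catalog is left untouched.
  return [...tools].sort((a, b) => a.name.localeCompare(b.name));
}
```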
Tuning patterns
Mixed traffic (recommended default)
Keep a long-lived baseline on your main agent and disable caching on bursty notifier agents.
Cost-first baseline
- Set baseline cacheRetention: "short".
- Enable contextPruning.mode: "cache-ttl".
- Keep heartbeat below your TTL only for agents that benefit from warm caches.
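A config sketch combining both patterns (the agent ids and the exact nesting of contextPruning are assumptions for illustration):

```json5
{
  agents: {
    defaults: {
      params: { cacheRetention: "short" },   // cost-first baseline
      contextPruning: { mode: "cache-ttl" }, // prune stale tool results after TTL windows
    },
    list: [
      { id: "main" },                        // illustrative id; inherits the baseline
      {
        id: "notifier",                      // illustrative bursty agent
        params: { cacheRetention: "none" },  // no caching for short one-off turns
      },
    ],
  },
}
```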
Cache diagnostics
OpenClaw exposes dedicated cache-trace diagnostics for embedded agent runs. For normal user-facing diagnostics, /status and other usage summaries can use
the latest transcript usage entry as a fallback source for cacheRead /
cacheWrite when the live session entry does not have those counters.
Live regression tests
OpenClaw keeps one combined live cache regression gate for repeated prefixes, tool turns, image turns, MCP-style tool transcripts, and an Anthropic no-cache control:
- src/agents/live-cache-regression.live.test.ts
- src/agents/live-cache-regression-baseline.ts
Anthropic live expectations
- Expect explicit warmup writes via cacheWrite.
- Expect near-full history reuse on repeated turns because Anthropic cache control advances the cache breakpoint through the conversation.
- Current live assertions still use high hit-rate thresholds for stable, tool, and image paths.
OpenAI live expectations
- Expect cacheRead only; cacheWrite remains 0.
- Treat repeated-turn cache reuse as a provider-specific plateau, not as Anthropic-style moving full-history reuse.
- Current live assertions use conservative floor checks derived from observed live behavior on gpt-5.4-mini:
  - stable prefix: cacheRead >= 4608, hit rate >= 0.90
  - tool transcript: cacheRead >= 4096, hit rate >= 0.85
  - image transcript: cacheRead >= 3840, hit rate >= 0.82
  - MCP-style transcript: cacheRead >= 4096, hit rate >= 0.85
- Most recent observed live values behind those floors:
  - stable prefix: cacheRead = 4864, hit rate 0.966
  - tool transcript: cacheRead = 4608, hit rate 0.896
  - image transcript: cacheRead = 4864, hit rate 0.954
  - MCP-style transcript: cacheRead = 4608, hit rate 0.891
Why the assertions differ:
- Anthropic exposes explicit cache breakpoints and moving conversation-history reuse.
- OpenAI prompt caching is still exact-prefix sensitive, but the effective reusable prefix in live Responses traffic can plateau earlier than the full prompt.
- Because of that, comparing Anthropic and OpenAI by a single cross-provider percentage threshold creates false regressions.
diagnostics.cacheTrace config
- filePath: $OPENCLAW_STATE_DIR/logs/cache-trace.jsonl
- includeMessages: true
- includePrompt: true
- includeSystem: true
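As a config block (the nesting under diagnostics.cacheTrace is taken from the section heading; treat the exact shape as a sketch):

```json5
{
  diagnostics: {
    cacheTrace: {
      filePath: "$OPENCLAW_STATE_DIR/logs/cache-trace.jsonl",
      includeMessages: true, // capture full message payloads
      includePrompt: true,   // capture prompt text
      includeSystem: true,   // capture system prompt
    },
  },
}
```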
Env toggles (one-off debugging)
- OPENCLAW_CACHE_TRACE=1 enables cache tracing.
- OPENCLAW_CACHE_TRACE_FILE=/path/to/cache-trace.jsonl overrides the output path.
- OPENCLAW_CACHE_TRACE_MESSAGES=0|1 toggles full message payload capture.
- OPENCLAW_CACHE_TRACE_PROMPT=0|1 toggles prompt text capture.
- OPENCLAW_CACHE_TRACE_SYSTEM=0|1 toggles system prompt capture.
What to inspect
- Cache trace events are JSONL and include staged snapshots like session:loaded, prompt:before, stream:context, and session:after.
- Per-turn cache token impact is visible in normal usage surfaces via cacheRead and cacheWrite (for example /usage full and session usage summaries).
- For Anthropic, expect both cacheRead and cacheWrite when caching is active.
- For OpenAI, expect cacheRead on cache hits and cacheWrite to remain 0; OpenAI does not publish a separate cache-write token field.
- If you need request tracing, log request IDs and rate-limit headers separately from cache metrics. OpenClaw's current cache-trace output is focused on prompt/session shape and normalized token usage rather than raw provider response headers.
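For quick inspection, a small script can tally normalized cache counters from the JSONL trace. This is a sketch: it assumes each event line may carry a usage object with cacheRead/cacheWrite fields, which is an assumption about the trace schema, not documented structure.

```typescript
import { readFileSync } from "node:fs";

// Sketch: sum cacheRead/cacheWrite across cache-trace JSONL events.
// The usage.cacheRead / usage.cacheWrite field names are assumptions.
function summarizeCacheTrace(path: string): { cacheRead: number; cacheWrite: number } {
  const totals = { cacheRead: 0, cacheWrite: 0 };
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const event = JSON.parse(line);
    totals.cacheRead += event.usage?.cacheRead ?? 0;
    totals.cacheWrite += event.usage?.cacheWrite ?? 0;
  }
  return totals;
}
```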
Quick troubleshooting
- High cacheWrite on most turns: check for volatile system-prompt inputs and verify the model/provider supports your cache settings.
- High cacheWrite on Anthropic: often means the cache breakpoint is landing on content that changes every request.
- Low OpenAI cacheRead: verify the stable prefix is at the front, the repeated prefix is at least 1024 tokens, and the same prompt_cache_key is reused for turns that should share a cache.
- No effect from cacheRetention: confirm the model key matches agents.defaults.models["provider/model"].
- Bedrock Nova/Mistral requests with cache settings: expected; these are forced to cacheRetention: "none" at runtime.