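Topic: Prompt Cache

Importance: ⭐⭐⭐ (core to performance and cost; avoids the 12x input-token penalty described in section 5)
Real location: Anthropic API server side, plus cache-related code in src/ (content hash, cache tokens)
Role: cut the cost of resending repeated tokens; cached reads cost 90% less than regular input
Related: topics/cost-tracking.md, topics/cache-strategies.md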

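1. What Prompt Cache Is

Prompt Cache is server-side caching in the Anthropic API:
- A prompt sent by the client is cached on the server for 5 minutes by default.
- A cache hit saves 90% on the cached tokens (cache_read is 90% cheaper than regular input).
- Matching is by prefix: only a request that begins with exactly the same content can hit.

How Claude Code uses it: content hashes and a static system prompt keep that prefix stable.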

2. Four Cache Categories

┌────────────────────────────────────────┐
│  1. System prompt cache (5min)        │
│  2. Tool definitions cache (5min)     │
│  3. Conversation context cache (5min) │
│  4. User input cache (no cache)       │
└────────────────────────────────────────┘

Four categories, split into static content (cacheable) and dynamic content (not cached).

3. Where the 12x Token Saving Comes From

3.1 Typical Scenario

Turn 1: 100,000 tokens input, cache miss
Turn 2: 100,000 tokens input, cache HIT (90% cheaper)
Turn 3: 100,000 tokens input, cache HIT
...

In a multi-turn conversation the same system prompt, tools, and context are resent on every turn.

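3.2 Cost Calculation

Without cache:
  Turn 1: 100k * $3/M = $0.30
  Turn 2: 100k * $3/M = $0.30
  ...
  10 turns: $3.00

With cache:
  Turn 1: 100k * $3.75/M = $0.375 (write, 25% premium)
  Turn 2: 100k * $0.30/M = $0.03 (read)
  ...
  10 turns: $0.375 + 9 * $0.03 = $0.645

Saving: about 78%

This holds as long as each turn arrives within the 5-minute cache TTL, so multi-turn conversations save significantly.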
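The calculation in this section can be reproduced with a short sketch. It assumes every turn resends the same 100k-token prefix, ignores output tokens and any newly appended input, and charges the first turn at the $3.75/M cache-write price from the pricing table in section 9. The name `conversationCost` is illustrative and is not a Claude Code function.

```typescript
// Prices in USD per million tokens (the Sonnet example used in this document).
interface Prices {
  input: number;
  cacheWrite: number;
  cacheRead: number;
}

const PRICES: Prices = { input: 3, cacheWrite: 3.75, cacheRead: 0.3 };

// Cost of `turns` requests that each resend the same `prefixTokens` prefix.
// Simplification: output tokens and per-turn new input are ignored.
function conversationCost(
  prefixTokens: number,
  turns: number,
  cached: boolean,
  p: Prices = PRICES,
): number {
  const millions = prefixTokens / 1_000_000;
  if (!cached || turns === 0) {
    return turns * millions * p.input;
  }
  // The first turn writes the cache; every later turn reads it.
  return millions * p.cacheWrite + (turns - 1) * millions * p.cacheRead;
}

const withoutCache = conversationCost(100_000, 10, false);
const withCache = conversationCost(100_000, 10, true);
const saving = 1 - withCache / withoutCache;
console.log(withoutCache.toFixed(3), withCache.toFixed(3), (saving * 100).toFixed(1) + "%");
```

Under these assumptions the saving grows with the number of turns, because the one-time write premium is spread over more cheap reads.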

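4. The 5-Minute TTL

// server-side 5min TTL

5 minutes is a trade-off:
- Too short: the user pays full price again on almost every turn.
- Too long: more memory pressure on the server.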
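A small simulation shows how the TTL interacts with the pacing of a conversation. It is a sketch under two assumptions: every request carries the same prefix, and each use of the cache (write or read) restarts the 5-minute timer. `simulateHits` is a hypothetical helper; the real expiry logic runs on the server.

```typescript
const TTL_MS = 5 * 60 * 1000; // the 5-minute lifetime discussed in this section

// Given ascending request timestamps in milliseconds, report which requests hit the cache.
// Assumption: a miss writes the cache and a hit refreshes it, so both restart the timer.
function simulateHits(timestamps: number[], ttlMs: number = TTL_MS): boolean[] {
  let expiresAt = -Infinity;
  return timestamps.map((t) => {
    const hit = t < expiresAt;
    expiresAt = t + ttlMs;
    return hit;
  });
}

const minutes = (m: number): number => m * 60 * 1000;

// Requests at 0, 2, 6 and 12 minutes: the 6-minute request still hits because
// the 2-minute request refreshed the entry; the 12-minute request arrives too late.
const hits = simulateHits([minutes(0), minutes(2), minutes(6), minutes(12)]);
console.log(hits); // [false, true, true, false]
```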

5. Content Hash Optimization (Claude Code Specific)

5.1 The `--settings` JSON String

generateTempFilePath('claude-settings', '.json', {
  contentHash: trimmedSettings  // content hash
})

Content hash: identical content yields an identical path.

5.2 Key Insight

A stable path keeps the prompt cache prefix intact, so the cache hits.

Source comment:

The settings path ends up in the Bash tool's sandbox denyWithinAllow list, which is part of the tool description sent to the API. A random UUID per subprocess changes the tool description on every query() call, invalidating the cache prefix and causing a 12x input token cost penalty. The content hash ensures identical settings produce the same path across process boundaries.

UUID vs content hash:
- A UUID differs on every run, which causes the 12x token penalty.
- A content hash stays the same for the same settings, so the cache hits.
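The difference between the two naming schemes can be sketched with Node's standard library. The internals of `generateTempFilePath` are not shown in the source, so `contentHashedPath`, the choice of SHA-256, the 16-character truncation, and the sample settings string are all assumptions made for illustration.

```typescript
import { createHash, randomUUID } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: derive the temp file name from the content itself,
// so identical settings map to the same path in every process.
function contentHashedPath(prefix: string, ext: string, content: string): string {
  const digest = createHash("sha256").update(content).digest("hex").slice(0, 16);
  return join(tmpdir(), `${prefix}-${digest}${ext}`);
}

// Counter-example: a random name is different on every call.
function randomPath(prefix: string, ext: string): string {
  return join(tmpdir(), `${prefix}-${randomUUID()}${ext}`);
}

const settings = '{"permissions":{"allow":["Bash"]}}'; // illustrative content
const pathA = contentHashedPath("claude-settings", ".json", settings);
const pathB = contentHashedPath("claude-settings", ".json", settings);

console.log(pathA === pathB); // true: anything embedding this path stays byte-identical
console.log(randomPath("claude-settings", ".json") === randomPath("claude-settings", ".json")); // false
```

Because the path is embedded in a tool description, a stable path keeps the tool description identical across subprocesses, and with it the cached prefix.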

6. Four-Step Cache Optimization Strategy

6.1 Keep the System Prompt Stable

const systemPrompt = `You are Claude Code, ...`

A static system prompt hits the cache.

6.2 Keep Tool Definitions Stable

const tools = getTools()  // the tool list stays stable

Stable tools hit the cache.

6.3 Keep Context Order Stable

// context is assembled in a consistent order
[systemPrompt, tools, history, attachment]

A consistent order hits the cache.

6.4 Put User Input at the End

// user input goes last
[systemPrompt, tools, history, attachment, userInput]

User input at the end does not affect the prefix.
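The four steps can be combined in one sketch that assembles a request so that everything before the user input is byte-identical across calls. Sorting tools by name is only one possible way to get a deterministic order and is an assumption here, not something the source confirms; `ToolDef` and `buildSegments` are hypothetical names.

```typescript
interface ToolDef {
  name: string;
  description: string;
}

// Build the ordered segments of a request: stable parts first, user input last.
function buildSegments(
  systemPrompt: string,
  tools: ToolDef[],
  history: string[],
  userInput: string,
): string[] {
  // Sort a copy by name so registration order cannot change the serialized prefix.
  const stableTools = [...tools].sort((x, y) =>
    x.name < y.name ? -1 : x.name > y.name ? 1 : 0,
  );
  return [systemPrompt, JSON.stringify(stableTools), ...history, userInput];
}

const tools: ToolDef[] = [
  { name: "Read", description: "Read a file" },
  { name: "Bash", description: "Run a shell command" },
];

const first = buildSegments("You are Claude Code...", tools, ["hi"], "list files");
const second = buildSegments("You are Claude Code...", [...tools].reverse(), ["hi"], "show git status");

// Everything except the final user input is identical between the two requests.
const samePrefix = first.slice(0, -1).join("|") === second.slice(0, -1).join("|");
console.log(samePrefix); // true
```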

7. Cache Token Fields

type Usage = {
  input_tokens: number
  output_tokens: number
  cache_creation_input_tokens: number   // tokens written to the cache
  cache_read_input_tokens: number       // tokens read from the cache
}

Four fields, returned by the Anthropic API with every response.

7.1 Cache Write

// first request
cache_creation_input_tokens: 100000
cache_read_input_tokens: 0

A write: the prefix is cached for 5 minutes.

7.2 Cache Read

// second request
cache_creation_input_tokens: 0
cache_read_input_tokens: 100000

A read: the cache hit.
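The two patterns in this section can be told apart programmatically. The sketch below mirrors the `Usage` shape from this section and adds a hypothetical `classify` helper; the "read+write" case (an existing prefix is read and a new extension is written in the same request) is an assumption about how a growing conversation typically looks.

```typescript
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

type CacheOutcome = "write" | "read" | "read+write" | "none";

// Classify a response by which cache counters are non-zero.
function classify(u: Usage): CacheOutcome {
  const wrote = u.cache_creation_input_tokens > 0;
  const read = u.cache_read_input_tokens > 0;
  if (read && wrote) return "read+write";
  if (read) return "read";
  if (wrote) return "write";
  return "none";
}

const firstRequest: Usage = {
  input_tokens: 20,
  output_tokens: 300,
  cache_creation_input_tokens: 100_000,
  cache_read_input_tokens: 0,
};
const secondRequest: Usage = {
  input_tokens: 25,
  output_tokens: 280,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 100_000,
};

console.log(classify(firstRequest)); // "write"
console.log(classify(secondRequest)); // "read"
```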

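8. Cache Invalidation Scenarios

Scenario	Invalidates cache?
Unused for 5 minutes	✅ Yes
System prompt modified	✅ Yes
Tools modified	✅ Yes
Earlier history modified	✅ Yes (the prefix changed)
Attachment order changed	⚠️ Possibly
User input changed	❌ No (it sits at the end)

Four changes always invalidate the cache, one may, and one does not.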
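The table follows from one rule: only the leading part that two requests share can be served from cache. The sketch below models a request as a list of segments, which is a simplification (real matching happens on the serialized prompt), and `sharedPrefixLength` is a hypothetical helper.

```typescript
// Count how many leading segments two requests have in common.
// Only that shared prefix is eligible for a cache hit.
function sharedPrefixLength(prev: string[], next: string[]): number {
  const limit = Math.min(prev.length, next.length);
  let i = 0;
  while (i < limit && prev[i] === next[i]) i++;
  return i;
}

const base = ["system", "tools", "turn-1", "turn-2", "user: A"];

// New user input: the four segments before it are still shared.
console.log(sharedPrefixLength(base, ["system", "tools", "turn-1", "turn-2", "user: B"])); // 4

// Changed tools: only the system prompt is still shared.
console.log(sharedPrefixLength(base, ["system", "tools-v2", "turn-1", "turn-2", "user: A"])); // 1

// Changed system prompt: nothing is shared.
console.log(sharedPrefixLength(base, ["system-v2", "tools", "turn-1", "turn-2", "user: A"])); // 0
```

The earlier a change occurs in the sequence, the more of the cached prefix it throws away, which is why volatile content belongs at the end.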

9. Cache Cost Comparison

Token type	Price (USD per 1M tokens)
input	$3 (base)
output	$15 (base)
cache_creation	$3.75 (25% premium over input)
cache_read	$0.30 (10% of input)

Four price tiers, using Sonnet 4.6 as the example.
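Given these four prices, the cost of a single response follows directly from its usage counters. This is a minimal sketch: `usageCostUSD` and `SONNET_EXAMPLE` are hypothetical names, the prices are the ones from the table, and `input_tokens` is taken to count only the input that was neither read from nor written to the cache.

```typescript
interface Usage {
  input_tokens: number; // uncached input only
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// USD per million tokens.
interface PricePerMillion {
  input: number;
  output: number;
  cacheWrite: number;
  cacheRead: number;
}

const SONNET_EXAMPLE: PricePerMillion = { input: 3, output: 15, cacheWrite: 3.75, cacheRead: 0.3 };

// Price each token category separately, then convert to dollars.
function usageCostUSD(u: Usage, p: PricePerMillion = SONNET_EXAMPLE): number {
  const total =
    u.input_tokens * p.input +
    u.output_tokens * p.output +
    u.cache_creation_input_tokens * p.cacheWrite +
    u.cache_read_input_tokens * p.cacheRead;
  return total / 1_000_000;
}

// A turn that hits a 100k-token cached prefix and produces 1k output tokens.
const hitTurn: Usage = {
  input_tokens: 500,
  output_tokens: 1_000,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 100_000,
};
console.log(usageCostUSD(hitTurn).toFixed(4)); // "0.0465"
```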

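9.1 Cost Saving

Let x be the cost of one turn's input at the base price. Assume the output costs about the same x, and that roughly 1% of the input is newly written to the cache each turn:

Per-turn cost (with cache) = cached input * 0.10 + output * 1 + cache write * 1.25
                           ≈ 0.10x + 1x + (1.25 * 0.01)x
                           ≈ 1.11x
vs
Per-turn cost (without cache) = 1x + 1x = 2x

About 45% saving per turn in a multi-turn conversation under these assumptions.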

10. Cache Optimizations in Claude Code

10.1 Static System Prompt

// unchanging content goes first
const systemPrompt = `You are Claude Code...`
const tools = getTools()
const attachments = getAttachments(...)  // delta
const userInput = ...

Four segments: a stable prefix followed by a dynamic suffix.

10.2 Attachments Are Deltas

getAgentListingDeltaAttachment(...)
getMcpInstructionsDeltaAttachment(...)

Delta mode: send only what changed.

10.3 Content Hash for Settings

See section 5.

10.4 Four Delta Attachments

deferred_tools_delta
agent_listing_delta
mcp_instructions_delta
nested_memory

Sending only the changes leaves earlier content untouched, which is cache friendly.
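The delta idea can be illustrated generically. The source names functions such as `getAgentListingDeltaAttachment` but does not show their bodies, so the `listingDelta` helper, the `ListingDelta` shape, and the sample agent names below are assumptions that show only the general technique: compare what was already announced with the current state and emit the difference.

```typescript
interface ListingDelta {
  added: string[];
  removed: string[];
}

// Compare the names already announced to the model with the current names
// and return only the difference.
function listingDelta(announced: string[], current: string[]): ListingDelta {
  const before = new Set(announced);
  const now = new Set(current);
  return {
    added: current.filter((name) => !before.has(name)),
    removed: announced.filter((name) => !now.has(name)),
  };
}

const delta = listingDelta(["explorer", "reviewer"], ["explorer", "planner"]);
console.log(delta); // { added: ["planner"], removed: ["reviewer"] }

// Nothing changed: the delta is empty, so nothing new needs to be appended.
const unchanged = listingDelta(["explorer"], ["explorer"]);
console.log(unchanged.added.length + unchanged.removed.length); // 0
```

Appending a small delta to the end of the conversation keeps every earlier message byte-identical, whereas re-sending a full, updated listing near the start would change the prefix.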

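11. Speculated Implementation

import Anthropic from '@anthropic-ai/sdk'

const anthropic = new Anthropic()  // reads ANTHROPIC_API_KEY from the environment

async function callAPI(model, systemPrompt, tools, history, userInput) {
  return await anthropic.messages.create({
    model,
    max_tokens: 4096,  // required by the Messages API; the value is illustrative
    // caching is opt-in: mark the end of the stable prefix with cache_control
    system: [{ type: 'text', text: systemPrompt, cache_control: { type: 'ephemeral' } }],
    tools,
    messages: [...history, userInput],
  })
}

This is speculation about Claude Code's internals. The Messages API caches only content marked with cache_control, so the sketch sets one breakpoint on the system block, which covers the tool definitions and the system prompt.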

12. Monitoring

12.1 cost-tracker

// accumulate cache_read / cache_write tokens
accumulateUsage({
  inputTokens, outputTokens,
  cacheReadTokens, cacheWriteTokens
})

Totals accumulate across requests.

12.2 /insights

Shows the cache hit ratio.
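A running total of this kind might look like the sketch below. The field names follow the `accumulateUsage` call in this section, but the `UsageTotals` type, the `addUsage` and `cacheHitRatio` helpers, and the definition of the ratio (cached reads divided by all input tokens) are assumptions, not the actual cost-tracker code.

```typescript
interface UsageTotals {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

const emptyTotals = (): UsageTotals => ({
  inputTokens: 0,
  outputTokens: 0,
  cacheReadTokens: 0,
  cacheWriteTokens: 0,
});

// Return new totals with one response's usage added (the input is not mutated).
function addUsage(totals: UsageTotals, delta: UsageTotals): UsageTotals {
  return {
    inputTokens: totals.inputTokens + delta.inputTokens,
    outputTokens: totals.outputTokens + delta.outputTokens,
    cacheReadTokens: totals.cacheReadTokens + delta.cacheReadTokens,
    cacheWriteTokens: totals.cacheWriteTokens + delta.cacheWriteTokens,
  };
}

// Share of all input tokens that were served from the cache.
function cacheHitRatio(t: UsageTotals): number {
  const allInput = t.inputTokens + t.cacheReadTokens + t.cacheWriteTokens;
  return allInput === 0 ? 0 : t.cacheReadTokens / allInput;
}

let totals = emptyTotals();
totals = addUsage(totals, { inputTokens: 0, outputTokens: 300, cacheReadTokens: 0, cacheWriteTokens: 100_000 });
totals = addUsage(totals, { inputTokens: 0, outputTokens: 250, cacheReadTokens: 100_000, cacheWriteTokens: 0 });
console.log(cacheHitRatio(totals)); // 0.5
```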

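13. Key Design Patterns

13.1 Static Prefix

Unchanging content goes first.

13.2 Content Hash

Avoids the cache invalidation caused by random UUIDs.

13.3 Delta Mode

Send only what changed.

13.4 Stable Ordering

Context is assembled in the same order every time.

13.5 5-Minute TTL

Set by the server.

13.6 User Input Last

It does not affect the prefix.

13.7 Cost Differentiation

The four token types are priced differently.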

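14. Key Insights

14.1 Cache Reads Cost 90% Less

cache_read = 0.10x the input price.

14.2 Content Hash Avoids the 12x Penalty

A per-process UUID breaks the cache prefix.

14.3 The Static Prefix Is the Core

Unchanging content goes first.

14.4 Delta Mode Is Cache Friendly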