Prompt Caching: The Hidden Rule Behind Every Claude Code Feature
Summary
An AI-generated deep dive into Claude Code prompt caching - the hidden infrastructure rule that shapes every feature you use daily.
Anthropic monitors prompt cache hit rate like infrastructure uptime. If it drops too low, they open an incident. Not a bug ticket - an incident. That single fact explains dozens of Claude Code design decisions that otherwise seem arbitrary.
In this episode, we break down:
- Why CLAUDE.md loads before your conversation (prefix stability)
- How prompt caching works via byte-for-byte prefix matching
- Why plan mode is a tool call, not a mode swap
- The real cost of switching models mid-session (Opus to Haiku can be MORE expensive)
- Why /clear destroys your cache but compaction preserves it
- How system-reminder tags protect your cache hit rate
- The 5-minute cache window and what happens when it expires
- Why adding tools mid-session invalidates everything after that point
Key insight: Claude Code assembles your prompt in a specific order - static system prompt and tools first, CLAUDE.md second, conversation messages last. This is not arbitrary. It is deliberately structured so the most stable content sits earliest in the prefix and stays cached across requests.
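The assembly order described above can be sketched in a few lines. This is a hypothetical, simplified model (segment names and functions are illustrative, not Claude Code internals): the prompt is a list of segments ordered most-stable-first, and byte-for-byte prefix matching means any change to a segment invalidates the cache for everything after it.

```python
def assemble_prompt(system_and_tools, claude_md, messages):
    """Mirror the described layout: most stable content first."""
    return [system_and_tools, claude_md] + messages

def cached_prefix_len(prev, curr):
    """Byte-for-byte prefix match: count leading segments that are identical."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n

prev = assemble_prompt("system+tools", "CLAUDE.md v1",
                       ["user: hi", "assistant: hello"])

# Appending a message keeps the entire previous prefix cached:
curr = assemble_prompt("system+tools", "CLAUDE.md v1",
                       ["user: hi", "assistant: hello", "user: next"])
assert cached_prefix_len(prev, curr) == 4  # all 4 prior segments reused

# Editing CLAUDE.md mid-session invalidates everything after segment 1:
edited = assemble_prompt("system+tools", "CLAUDE.md v2",
                         ["user: hi", "assistant: hello"])
assert cached_prefix_len(prev, edited) == 1  # only system+tools survives
```

The same logic explains the topic list above: /clear, mid-session tool additions, and rotating CLAUDE.md sections all change an early segment, so every later segment must be reprocessed at full cost.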
The practical takeaway: stop rotating CLAUDE.md sections mid-session, stop adding tools on the fly, stop using /clear as a quick reset, and stop switching models when a task feels too small for Opus. Work with the prefix, not against it.
This episode is based on the blog post at primeline.cc/blog/prompt-caching
Claude Code Deep Dives is a podcast by PrimeLine (@PrimeLineAI) exploring real production setups for Claude Code - from memory systems and agent delegation to context management and self-correcting workflows.
More at primeline.cc