Prompt caching

Benched.ai Editorial Team

An explanation of prompt caching for LLM applications, with caching strategies for OpenAI, Anthropic, and frameworks such as GPTCache

Prompt caching stores repeated prompt prefixes so the model can skip reprocessing them on later requests. OpenAI automatically caches prompts of 1024 tokens or more and keeps cached prefixes active for a few minutes. Anthropic lets developers mark cache breakpoints explicitly with cache_control for finer control over what is cached. Frameworks such as GPTCache take a different approach, matching new requests against stored embeddings so semantically similar prompts can reuse prior completions.
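As a concrete illustration, the sketch below marks a long, static system prompt as cacheable through Anthropic's cache_control breakpoints using the Python SDK. The model name and prompt text are placeholders, and the usage fields printed at the end are how the API reports cache writes and reads.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SYSTEM_PROMPT = "You are a support assistant. ..."  # static instructions, tool docs, etc.

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Cache breakpoint: everything up to and including this block
            # can be reused by later requests that share the same prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)

# Usage reports tokens written to and read from the cache on this request.
print(response.usage)
```

Subsequent requests that begin with the same system block within the cache lifetime are served from the cached prefix rather than reprocessed from scratch.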

Caching shortens response times and reduces costs, especially when system instructions or tool definitions rarely change. Place static content at the beginning of prompts and monitor cache hit rates to tune performance; see the sketch below. Manage cache lifetimes and access controls carefully when prompts contain sensitive data.
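One way to monitor hit rates is to read the cached-token counts that OpenAI returns in each response's usage block. A rough sketch, assuming the official Python SDK and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STATIC_SYSTEM = "..."  # long, rarely changing instructions go first

def cached_fraction(user_message: str) -> float:
    """Send a request and return the fraction of prompt tokens served from cache."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": STATIC_SYSTEM},  # static prefix first
            {"role": "user", "content": user_message},     # variable suffix last
        ],
    )
    usage = response.usage
    cached = usage.prompt_tokens_details.cached_tokens  # 0 on a cache miss
    return cached / usage.prompt_tokens if usage.prompt_tokens else 0.0
```

Tracking this fraction over time shows whether prompt restructuring (moving volatile content to the end) is actually improving cache reuse.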
