Say Goodbye to Repetitive Prompts!
Anthropic's Game-Changer: Prompt Caching Cuts Costs for Developers
Edited By
Mackenzie Ferguson
AI Tools Researcher & Implementation Consultant
Anthropic has introduced a 'prompt caching' feature for its Claude models, allowing developers to store prompts and reuse them in later calls at a significantly reduced cost. By retaining context between API calls, the feature promises cost reductions of up to 90% and latency reductions of up to 80%. It is currently available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus on the horizon.
Anthropic, a leading AI research company, has recently added a feature called prompt caching to its API, which promises to significantly reduce costs for developers. The feature, currently in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, lets users save prompts and reuse them in later calls, adding further context without incurring the full cost again.
Prompt caching works by storing context between API calls, so developers no longer pay full price each time a lengthy prompt is resent. It is particularly useful when a prompt carries a large amount of context, since users can include extra background information without significant cost increases. According to early users, prompt caching has delivered notable speed and cost improvements, particularly when a full knowledge base, many examples, or every turn of a conversation is included in the prompt.
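To make this concrete, here is a minimal sketch of how a developer might mark a large system prompt for caching through the public beta. The cache_control field, the anthropic-beta header value, and the model string are drawn from Anthropic's beta documentation at launch and may change, so treat them as assumptions to verify against the current docs rather than a definitive reference.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A large, reusable block of context (e.g. a knowledge base or long instructions).
# Marking it with cache_control asks the API to cache it for subsequent calls.
KNOWLEDGE_BASE = "...full product documentation or other large context..."

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # Opt in to the prompt-caching public beta (header value per the beta docs).
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": KNOWLEDGE_BASE,
            # Request caching of this block; later calls that reuse the same
            # prefix are billed at the cheaper cache-read rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the returns policy."}],
)

print(response.content[0].text)
```

On the first call, the marked block is written to the cache at the higher cache-write rate; subsequent calls within the cache lifetime that resend the same prefix are billed at the much lower cache-read rate.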
The implications of this development are extensive for the broader business environment. For instance, businesses that rely on conversational agents can benefit from reduced latency and costs when handling long instructions or large documents. Additionally, developers working on autocomplete functionalities for code or those embedding entire documents in a prompt can expect faster and more efficient performance.
The new pricing model reflects these efficiencies. Writing a prompt to the cache for Claude 3.5 Sonnet costs $3.75 per million tokens, whereas reading a cached prompt costs just $0.30 per million tokens. Since the base input price for Claude 3.5 Sonnet is $3 per million tokens, developers can achieve roughly tenfold savings on input costs when reusing cached prompts.
Similarly, for Claude 3 Haiku, writing a prompt to the cache costs $0.30 per million tokens, and reading a stored prompt costs only $0.03 per million tokens. Although prompt caching is not yet available for Claude 3 Opus, Anthropic has announced future pricing of $18.75 per million tokens to write to the cache and $1.50 per million tokens to read from it.
While the initial API call is slightly more expensive to account for writing the prompt to the cache, subsequent calls are significantly cheaper. According to Anthropic, this can reduce costs by up to 90% and latency by up to 80%.
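As a back-of-the-envelope illustration of those numbers, the sketch below compares the input cost of repeatedly sending a large prompt with and without caching, using the Claude 3.5 Sonnet prices quoted above. It ignores output tokens and any uncached portion of the prompt, so it is a rough estimate rather than a billing calculator.

```python
# Prices per million input tokens for Claude 3.5 Sonnet, as quoted above.
BASE_INPUT = 3.00    # normal input
CACHE_WRITE = 3.75   # first call, which writes the prompt to the cache
CACHE_READ = 0.30    # subsequent calls that hit the cache

def cost_without_cache(prompt_tokens: int, calls: int) -> float:
    return calls * prompt_tokens / 1e6 * BASE_INPUT

def cost_with_cache(prompt_tokens: int, calls: int) -> float:
    # One cache write, then (calls - 1) cache reads of the same prefix.
    return prompt_tokens / 1e6 * (CACHE_WRITE + (calls - 1) * CACHE_READ)

# Example: a 100,000-token knowledge base reused across 50 calls.
tokens, calls = 100_000, 50
print(f"without cache: ${cost_without_cache(tokens, calls):.2f}")  # about $15.00
print(f"with cache:    ${cost_with_cache(tokens, calls):.2f}")     # about $1.85
```

In this hypothetical scenario the cached workflow costs roughly 88% less than resending the full prompt every time, in line with the savings Anthropic cites.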
One important caveat is that Anthropic's cache has a five-minute lifetime, refreshed each time the cached content is used. This makes prompt caching comparable to context caching on other platforms, although the pricing models differ: Gemini charges $4.50 per million tokens per hour to keep a context cache alive, while Anthropic charges per cache write and offers a shorter cache lifetime.
Anthropic's introduction of prompt caching is part of a broader strategy to stay competitive in the AI industry, particularly against rivals like Google and OpenAI. The pricing strategy appears tailored to attract third-party developers looking for cost-effective solutions. This is not Anthropic's first attempt to compete on price: it had previously reduced its token prices before releasing the Claude 3 family of models.
Other platforms also offer prompt caching or similar functionalities. For example, Lamina, an LLM inference system, uses KV caching to decrease GPU costs. OpenAI’s GPT-4o offers memory features that remember user preferences or details but do not store exact prompts and responses like Anthropic’s prompt caching does.
In summary, Anthropic’s prompt caching feature is set to revolutionize the way developers interact with AI models, offering substantial cost savings and performance efficiencies. This advancement not only makes AI more accessible but also facilitates more advanced and fine-tuned applications. Business readers and developers alike would benefit from staying updated on such innovations, as these tools can significantly impact productivity and development costs.