Anthropic's Claude has a secret trick to make it 90% cheaper.
Anthropic now lets you cache your prompts, which is like giving Claude a giant sticky note. It saves the common parts of your prompts for a while so you don't have to pay for them on every request.
What's going on here?
Anthropic is rolling out prompt caching for their AI API, letting developers store frequently used context between calls to Claude.
What does this mean?
Here's the deal: serious LLM use involves sending huge prompts that stay mostly the same across requests. That could be in the form of:
Tons of examples for the LLM to reference for your task.
Long chats to keep the context of what you said earlier.
Working with long files for question answering.
And that makes a dent in your bank account. But with caching, you can save the common part of your prompt for 5 minutes. If you make another API call with largely the same prompt, Anthropic reuses the saved part and charges much less for it. Not only that, it makes getting outputs back faster too.
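Here's a minimal sketch of what that looks like with the Anthropic Python SDK: you mark the reusable part of the prompt with cache_control, and later calls that share that prefix read it from the cache instead of paying full price. The model name, file, and question below are placeholders, and depending on your SDK version you may also need to enable Anthropic's prompt-caching beta.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from your environment

# Hypothetical large document you want Claude to answer questions about.
long_document = open("big_reference_doc.txt").read()

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model name
    max_tokens=1024,
    system=[
        {"type": "text", "text": "Answer questions about the attached document."},
        {
            "type": "text",
            "text": long_document,
            # Everything up to and including this block gets cached for ~5 minutes.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize section 2 for me."}],
)

print(response.content[0].text)
# usage reports cache_creation_input_tokens on the first call and
# cache_read_input_tokens on cache hits within the 5-minute window.
print(response.usage)
```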
Why should I care?
It's rolling out now across the Claude models. Claude 3.5 Sonnet with prompt caching gets much cheaper for long chats where you need a smart AI. With Claude 3 Haiku and multi-shot prompting, you can now do traditional data annotation tasks at a much more affordable cost.
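To put a rough number on the headline, here's a back-of-the-envelope sketch. The prices are illustrative assumptions, not quoted from Anthropic's pricing page; the key assumption is that reading a cached prompt costs about 10% of the normal input-token price, which is where the "90% cheaper" figure comes from.

```python
# Back-of-the-envelope savings for a reusable 100k-token prompt.
# Assumed, illustrative prices (check Anthropic's pricing page for real numbers):
BASE_INPUT_PRICE = 3.00   # $ per million input tokens (assumed Sonnet-class price)
CACHE_READ_PRICE = 0.30   # $ per million cached tokens, ~10% of the base price

PROMPT_TOKENS = 100_000   # the part of the prompt that stays the same every call

uncached_cost = PROMPT_TOKENS / 1_000_000 * BASE_INPUT_PRICE
cached_cost = PROMPT_TOKENS / 1_000_000 * CACHE_READ_PRICE

print(f"without caching: ${uncached_cost:.2f} per call for the repeated part")  # $0.30
print(f"with caching:    ${cached_cost:.2f} per call for the repeated part")    # $0.03
```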
AI is getting cheaper and more powerful day over day. It’s an amazing time to be alive.