feat: Add Anthropic prompt caching support, add example #1006
Conversation
cc @Emil-io give us some UX feedback 🙏
Hey @vblagoje - here is my feedback that you asked for:
Thanks for the feedback:
Hey @vblagoje - first of all, this looks very interesting! Here is my feedback, feel free to correct me if I made some false assumptions.
1. How this fits into Haystack Pipelines
2. Specifying the Caching
Thanks for the feedback, Emil.
I've added prompt caching usage data printed to stdout, confirming that prompt caching is active.
@julian-risch please run the example yourself to see the prompt caching effect.
@Emil-io have you tried the prompt caching example? @TuanaCelik can you take a look once again and run the example?
I tried it out and the caching works for me. I tried to measure the speed-up, but to no avail: time to first token did not seem to improve for me whether I turned caching off or on. Could you double-check that? It would be important for a convincing example. Other feedback: when I wanted to turn off caching, at first I only commented out…
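Latency alone can be an unreliable signal; a more direct check is the usage metadata Anthropic returns with each response. A minimal sketch (the helper name is hypothetical; the `cache_creation_input_tokens` / `cache_read_input_tokens` field names come from Anthropic's prompt caching docs):

```python
def describe_cache_usage(usage: dict) -> str:
    """Classify a response's cache behavior from its usage metadata.

    Prompt-caching responses carry two extra counters:
    - cache_creation_input_tokens: tokens written to the cache (first call)
    - cache_read_input_tokens: tokens served from the cache (later calls)
    """
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    if read:
        return f"cache hit: {read} tokens read from cache"
    if created:
        return f"cache write: {created} tokens stored"
    return "no caching (prompt too short or caching disabled)"


# Simulated usage dicts: the first call writes the cache, later calls read it
first = {"input_tokens": 20, "cache_creation_input_tokens": 1200, "cache_read_input_tokens": 0}
later = {"input_tokens": 20, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1200}
print(describe_cache_usage(first))  # cache write: 1200 tokens stored
print(describe_cache_usage(later))  # cache hit: 1200 tokens read from cache
```

If the later calls never report nonzero `cache_read_input_tokens`, caching is not actually engaging, regardless of what the timings suggest.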
(force-pushed from 0c7a0a6 to 3f9f0ae)
@Amnah199 please have a look and I'll ask @julian-risch to have one as well. Running the example is a must. The speedup with prompt caching is visible, but I expected it to be more prominent. Another, perhaps equally important benefit is the cost saving with caching. In conclusion, it is still important to have this feature added as users will ask for it.
@vblagoje, I tried the example, but the printed usage for all questions returned…
Hi @vblagoje
Have you installed the branch version of the Anthropic integration before running the example? And the latest release of haystack-ai?
@vblagoje explained the example in more detail and I have tested it. I think this use of prompt caching would make sense in certain use cases. Tagging @julian-risch for reference. |
@julian-risch let's integrate this, I can help @dfokina write a paragraph in AnthropicChatGenerator about it |
@vblagoje |
For some reason Anthropic caching doesn't seem to work on small messages (i.e. a short instruction). Perhaps there is a minimum length they require for cached content. I could recreate the prompt_caching example in an integration test? cc @julian-risch - this is to test what happens when prompt caching gets disabled as a beta - perhaps we'll get some warning, but I doubt an exception. Perhaps we can monitor the Anthropic prompt caching devs and, when prompt caching eventually becomes the default, adjust our code base at that time.
@vblagoje Ah, true. Found the minimum cacheable length in their docs: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#cache-limitations In that case, let's use 1024 tokens in the integration test? $3.75 / MTok is the cost for cache write so it's still cheap. |
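To put the quoted $3.75/MTok cache-write rate in perspective, a back-of-the-envelope sketch of the cost of caching a 1024-token prefix (the base-input and cache-read rates below are assumptions based on Claude 3.5 Sonnet pricing at the time, not figures from this thread):

```python
# Rates in USD per million tokens
CACHE_WRITE_PER_MTOK = 3.75  # cache-write rate quoted above
BASE_INPUT_PER_MTOK = 3.00   # assumed base input rate
CACHE_READ_PER_MTOK = 0.30   # assumed cache-read rate

def prompt_cost(tokens: int, rate_per_mtok: float) -> float:
    """Cost in USD of processing `tokens` at a per-million-token rate."""
    return tokens * rate_per_mtok / 1_000_000

prefix = 1024  # minimum cacheable length for the integration test
write = prompt_cost(prefix, CACHE_WRITE_PER_MTOK)
read = prompt_cost(prefix, CACHE_READ_PER_MTOK)
uncached = prompt_cost(prefix, BASE_INPUT_PER_MTOK)
print(f"one cache write: ${write:.6f}")       # one cache write: $0.003840
print(f"each cached read: ${read:.6f}")       # each cached read: $0.000307
print(f"each uncached call: ${uncached:.6f}") # each uncached call: $0.003072
```

So a single integration-test run costs well under a cent, and under the assumed read rate each subsequent cached call is roughly 10x cheaper than an uncached one.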
Amazing, will do 🙏 |
@Amnah199 and @julian-risch - this one should be ready now, lmk if you see any additional opportunities for improvement |
LGTM! 👍 Please don't forget to write a paragraph in AnthropicChatGenerator about it.
Will do 🙏 - keeping this one open until the prompt caching docs are integrated and a new release is made.
Docs updated: https://docs.haystack.deepset.ai/docs/anthropicchatgenerator
Prompt caching is available in the anthropic-haystack integration from v1.1.0 onward.
* Add prompt caching, add example
* Print prompt caching data in example
* Lint
* Anthropic allows multiple system messages, simplify
* PR feedback
* Update prompt_caching.py example to use ChatPromptBuilder 2.5 fixes
* Small fixes
* Add unit tests
* Improve UX for prompt caching example
* Add unit test for _convert_to_anthropic_format
* More integration tests
* Update test to turn on/off prompt cache
Why:
Introduces prompt caching for the AnthropicChatGenerator. As prompt caching will be enabled by default in the near future, we don't add a new init parameter for it.
What:
How can it be used:
See the integrations/anthropic/example/prompt_caching.py example for detailed usage.
How did you test it:
With the integrations/anthropic/example/prompt_caching.py example and additional manual tests.
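The exact Haystack-side wiring is in the PR diff, but as a rough illustration of the underlying Anthropic request shape that prompt caching relies on (field names taken from Anthropic's prompt caching docs; the builder function and model string are only illustrative):

```python
def build_cached_request(system_prompt: str, question: str) -> dict:
    """Sketch of an Anthropic Messages API payload with a cacheable system prompt.

    The large, stable system prompt is marked with cache_control so it is
    cached after the first call; only the short user question varies.
    """
    return {
        "model": "claude-3-5-sonnet-20240620",  # illustrative model id
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks this block as cacheable (must meet the minimum
                # cacheable length, e.g. 1024 tokens, to take effect)
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 512,
    }

req = build_cached_request("<long document, >= 1024 tokens>", "Summarize section 2.")
print(req["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```

Repeated calls that reuse the same system block then hit the cache, which is where the latency and cost savings discussed in this thread come from.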