Prompt Caching? #9444
Replies: 2 comments 8 replies
-
@brandonh-msft - thanks for bringing this up. This looks like it is happening on the model end, so you should be getting this already. Cosmos DB and Redis also support semantic caching if you want to check that out. You can also do semantic caching with filters.
8 replies
-
Issue filed on OpenAI .NET SDK: openai/openai-dotnet#281
-
Prompt caching for AOAI and OAI has been released, but there are requirements for making it most effective, specifically the order in which content is serialized: caching matches on a stable prompt prefix, so static content needs to come first.
Is SK already serializing objects in this manner so prompt caching will get maximum hits? If not, is there work on the backlog to do it, and when could we expect it (for each language)?
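The serialization-order requirement described above can be sketched as follows. This is not SK's actual serialization logic, just an illustrative Python example of the principle: because automatic prompt caching matches on a stable prefix, the static parts of the request (system prompt, tool definitions) should be serialized before the per-request parts. The model name and `get_weather` tool are hypothetical placeholders.

```python
# Static content, identical across requests: this is the cacheable prefix.
SYSTEM_PROMPT = {"role": "system", "content": "You are a helpful assistant."}
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }
]

def build_request(history: list[dict], user_message: str) -> dict:
    # Serialize the stable prefix first and the variable suffix last, so
    # consecutive requests share the longest possible cached prefix.
    return {
        "model": "gpt-4o",  # placeholder model name
        "tools": TOOLS,
        "messages": [
            SYSTEM_PROMPT,
            *history,
            {"role": "user", "content": user_message},
        ],
    }

req = build_request([], "What's the weather in Oslo?")
print(req["messages"][0]["role"])   # system prompt always leads
print(req["messages"][-1]["role"])  # per-request content always trails
```

If the serializer instead interleaved variable content (timestamps, request IDs, per-user data) ahead of the static content, every request would have a different prefix and the cache would never hit, which is why the ordering question matters for SK.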