Using Claude prompt caching with ChatPromptTemplate #25610
Replies: 2 comments 10 replies
-
Hey there, @felixwoestmann! I'm here to help you with your bug, answer any questions you have, and guide you on becoming a contributor. Let's dive in and work on resolving your issue together.

To implement Claude's prompt caching with ChatPromptTemplate, you can set the cache-control property in the additional_kwargs of the message. Here's how you can set it up:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import SystemMessage, HumanMessage
prompt = ChatPromptTemplate.from_messages([
    SystemMessage(content=cached_prompt, additional_kwargs={"cache-control": {"type": "ephemeral"}}),
    HumanMessage(content=f"Here is the comment to analyze: {comment}")
])
```

This ensures that the cache-control property is set on the cached system message. Additionally, make sure that the caching mechanism is properly configured within the _generate_with_cache method:

```python
def _generate_with_cache(
    self,
    messages: List[BaseMessage],
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> ChatResult:
    # Resolve the cache to use: an instance set on the model, or the global one.
    if isinstance(self.cache, BaseCache):
        llm_cache = self.cache
    else:
        llm_cache = get_llm_cache()
    check_cache = self.cache or self.cache is None
    if check_cache:
        if llm_cache:
            llm_string = self._get_llm_string(stop=stop, **kwargs)
            prompt = dumps(messages)
            cache_val = llm_cache.lookup(prompt, llm_string)
            # Return the cached generations if there was a hit.
            if isinstance(cache_val, list):
                return ChatResult(generations=cache_val)
        elif self.cache is None:
            pass
        else:
            raise ValueError(
                "Asked to cache, but no cache found at `langchain.cache`."
            )
    result = self._generate(messages, stop=stop, run_manager=run_manager, **kwargs)
    for idx, generation in enumerate(result.generations):
        if run_manager and generation.message.id is None:
            generation.message.id = f"run-{run_manager.run_id}-{idx}"
        generation.message.response_metadata = _gen_info_and_msg_metadata(generation)
    if len(result.generations) == 1 and result.llm_output is not None:
        result.generations[0].message.response_metadata = {
            **result.llm_output,
            **result.generations[0].message.response_metadata,
        }
    # Store the fresh generations so subsequent identical calls hit the cache.
    if check_cache and llm_cache:
        llm_cache.update(prompt, llm_string, result.generations)
    return result
```

This code ensures that the caching mechanism checks the cache for existing results and updates the cache with new results as needed [1]. By setting the cache-control property on the message and keeping this caching path configured, Claude's prompt caching should work with your ChatPromptTemplate.
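As a rough end-to-end sketch, the prompt above can be piped into a ChatAnthropic model with the prompt-caching beta enabled. The extra_headers argument and the beta header value here are assumptions based on Anthropic's prompt-caching beta and may differ across langchain-anthropic versions:

```python
from langchain_anthropic import ChatAnthropic

# Assumed header name for the prompt-caching beta; verify it against the
# Anthropic documentation for your SDK version.
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20240620",
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)

# `prompt` is the ChatPromptTemplate built above; it has no input variables,
# so it can be invoked with an empty mapping.
chain = prompt | llm
response = chain.invoke({})
```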
-
Solved it by doing this:
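A minimal sketch of the content-block approach commonly used for this, assuming langchain-anthropic forwards Anthropic-style cache_control blocks (cached_prompt and comment are placeholder values):

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate

cached_prompt = "..."  # the long system prompt you want Anthropic to cache
comment = "..."        # the user input for this particular request

prompt = ChatPromptTemplate.from_messages([
    # Attach cache_control to a content block instead of additional_kwargs.
    SystemMessage(
        content=[
            {
                "type": "text",
                "text": cached_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ]
    ),
    HumanMessage(content=f"Here is the comment to analyze: {comment}"),
])
```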
-
Checked other resources
Commit to Help
Example Code
Description
Hello,
I am currently trying to implement Claude's prompt caching in my existing application.
Setting the extra header to enable this beta works well, and the API returns an overview of how many tokens were written to and read from the cache. However, these are always 0, because I am setting the cache-control property of the message incorrectly. On the X.com account of LangChain I found an example that sets the cache-control property when providing messages as raw JSON. However, I am using ChatPromptTemplates and was wondering how I would do it there; currently I am trying something like this:
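Regarding the token overview mentioned above: those counters can be read off the returned message. A hedged sketch, assuming llm is a ChatAnthropic instance with the beta enabled and prompt is the template in question; the exact metadata keys come from Anthropic's beta response and may vary by langchain-anthropic version:

```python
# Hypothetical: `llm` and `prompt` are defined as described above.
response = (prompt | llm).invoke({})

# Anthropic's raw usage block is typically mirrored into response_metadata;
# non-zero values here indicate that cache-control was applied correctly.
usage = response.response_metadata.get("usage", {})
print(usage.get("cache_creation_input_tokens"))
print(usage.get("cache_read_input_tokens"))
```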
System Info
System Information
Package Information
Optional packages not installed
Other Dependencies