Runnable Context Provider #13315
Hey @Toubat, appreciate the PR! We actually had something like this a while ago but removed it because it seemed redundant and wasn't really being used (#12133). I think the RunnablePassthrough.assign method has really helped with cases like this as well. What do you think of something like this for the example you provided?

    from langchain.prompts import PromptTemplate
    from langchain.chat_models import ChatOpenAI
    from langchain.schema import StrOutputParser
    from langchain.schema.runnable import RunnablePassthrough

    prompt = PromptTemplate.from_template("{context} {question}")
    llm = ChatOpenAI()
    retriever = (lambda _: ["doc 1 ...", "doc n"])  # mock retriever

    def _format(input):
        # Join the retrieved documents under a single heading.
        return "## Documents\n\n" + "\n\n".join(x for x in input["context"])

    answer_chain = (
        RunnablePassthrough.assign(context=_format) | prompt | llm | StrOutputParser()
    )
    retrieval_chain = (
        {"context": retriever, "question": RunnablePassthrough(),}
        | RunnablePassthrough.assign(answer=answer_chain)
    )
    retrieval_chain.invoke("say foo")
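To make the shape of the result explicit, here is a minimal plain-Python sketch of the dict-merging behaviour that RunnablePassthrough.assign provides in the chain above. The helper `assign` and the stand-in `fake_answer_chain` are hypothetical names used only for illustration; they are not LangChain APIs, and no model is called.

```python
def assign(**steps):
    """Return a function that copies its input dict and adds one key per step."""
    def run(inputs):
        out = dict(inputs)
        for key, fn in steps.items():
            # Every step sees the original input dict, not the partially built output.
            out[key] = fn(inputs)
        return out
    return run


def retriever(_question):
    # Mock retriever, same data as in the comment above.
    return ["doc 1 ...", "doc n"]


def fake_answer_chain(inputs):
    # Hypothetical stand-in for `prompt | llm | StrOutputParser()`.
    return f"answer to {inputs['question']!r} using {len(inputs['context'])} docs"


def retrieval_chain(question):
    first = {"context": retriever(question), "question": question}
    return assign(answer=fake_answer_chain)(first)


result = retrieval_chain("say foo")
print(sorted(result))  # ['answer', 'context', 'question']
```

Under these assumptions the final value is a dict that keeps `context` and `question` and gains `answer`, which is the output shape discussed in the replies.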
Hey @baskaryan, thanks for providing more insights on this! I believe there is some overlap between the current PR and the approach you describe. What I'm more interested in is whether there is a way to connect data from a very early stage of the chain pipeline to the very end stage, without having to pass that data through the intermediate stages. Suppose I have a fairly long chain like the one below:

    core_data = lambda x: get_core_data(x)
    chain = (
        chain_op_1
        | chain_op_2
        | core_data
        | chain_op_3
        | chain_op_4
        | ...
        | chain_op_n
        | {"result": RunnablePassthrough()}
    )

Now, for some reason, I want to update the chain to keep track of some inner data and store it as part of the output. Suppose we want to get { "result": RunnablePassthrough(), "core_data": <want to get core_data here> }. I can see how to do this using RunnablePassthrough.assign:

    from operator import itemgetter

    core_data = lambda x: get_core_data(x)
    chain = (
        {"core_data": chain_op_1 | chain_op_2 | core_data}
        | RunnablePassthrough.assign(
            result=itemgetter("core_data") | chain_op_3 | chain_op_4 | ... | chain_op_n
        )
    )

The problem I have is the mental model of using
    RunnablePassthrough.assign(a=...) | RunnablePassthrough.assign(b=...) | RunnablePassthrough.assign(c=...) ..

Take the chain from above:

    retrieval_chain = (
        {"context": retriever, "question": RunnablePassthrough(),}
        | RunnablePassthrough.assign(answer=answer_chain)
    )

At first glance, I wouldn't know what the final output dict contains. An example of using the @context_provider decorator:

    core_data = lambda x: get_core_data(x)

    @context_provider
    def chain(getter, setter):
        return (
            chain_op_1
            | chain_op_2
            | core_data
            | setter("core_data")
            | chain_op_3
            | chain_op_4
            | ...
            | chain_op_n
            | {"result": RunnablePassthrough(), "core_data": getter("core_data")}
        )

This makes minimal changes to the original chain structure and makes it clearer what exactly I want to accomplish. I believe the benefit of having some data injection mechanism like
Closing in favour of #14046
(Note: still WIP, would appreciate some feedback from maintainer)
Description: This PR adds a new core Runnable component called RunnableContextProvider. The motivation behind this runnable is that sometimes, when writing long and complex chains, developers need to pass some core piece of data across multiple stages of the chain. For example, when working with a naive RAG where the retriever retrieves context (say List[str] for simplicity), one common case is to pass the retrieved context and the original question as part of the output (say, for the sake of doing evals or other data manipulations). The original way of achieving this might look like below:

Look at how complex and unreadable the chain becomes even for a naive RAG example. Most of the complexity is due to passing extra data around, which adds lots of itemgetter and data Passthrough steps that should be unnecessary.

RunnableContextProvider solves this issue by allowing data sharing across different stages of the chain without having to explicitly wire up the data connection pipeline. Here's the basic API usage for implementing the same naive RAG as above:

getter: an instance of RunnableContextGetter, which retrieves data from a key-value source automatically initialized in the background. It's a Runnable that outputs the value retrieved from the shared key-value source identified by the given key (the input to RunnableContextGetter is ignored).

setter: an instance of RunnableContextSetter, which writes a value into the shared source under the given key. The value written is the output of the previous piped Runnable chain. The output of RunnableContextSetter is the output of the chain immediately before the setter. Therefore, when a lambda such as format_context is piped directly after the setter, which in turn follows retriever, it takes the output from retriever as its input.

batch and abatch work out of the box. Each single chain call inside the batch has a unique source. In other words, chains across different batch calls ideally do not share the same source.

The decorator pattern is also supported.
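The getter and setter semantics described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the PR's implementation: `Context`, `lift`, `parallel`, `pipeline`, and `batch` are hypothetical helpers, and the shared key-value source is modelled as a dict created fresh for every call.

```python
class Context:
    """Key-value source created once per chain invocation."""
    def __init__(self):
        self.store = {}


def setter(key):
    def step(value, ctx):
        ctx.store[key] = value  # write the previous step's output under `key`
        return value            # pass that output through unchanged
    return step


def getter(key):
    def step(_value, ctx):
        return ctx.store[key]   # the incoming value is ignored
    return step


def lift(fn):
    """Adapt an ordinary one-argument function into a chain step."""
    return lambda value, ctx: fn(value)


def parallel(**branches):
    """Feed the same input to every branch and collect the results in a dict."""
    def step(value, ctx):
        return {name: branch(value, ctx) for name, branch in branches.items()}
    return step


def pipeline(*steps):
    def invoke(value):
        ctx = Context()         # each call gets its own source
        for step in steps:
            value = step(value, ctx)
        return value
    return invoke


def batch(chain, inputs):
    return [chain(x) for x in inputs]


chain = pipeline(
    lift(str.upper),            # stands in for the early chain stages
    setter("core_data"),
    lift(lambda s: s + "!"),    # stands in for the later chain stages
    parallel(result=lift(lambda v: v), core_data=getter("core_data")),
)

single = chain("foo")
many = batch(chain, ["a", "b"])
```

Because `pipeline` builds a new `Context` on every call, the two batch items never see each other's `core_data`, which mirrors the "unique source per call" behaviour described for batch and abatch.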
Some Improvement Considerations

Allow setter to set multiple keys based on the same input. Potential API usage:

Rename getter and setter to inject and provide instead?

A limitation of RunnableContextProvider is that a getter and a setter on the same key cannot both appear inside the same RunnableParallel, because the order of execution is not guaranteed.

Issue: None.
Dependencies: None.
Tag Maintainers: @nfcampos
Twitter handle: [will add in the future]
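The limitation noted above, that a getter and a setter on the same key cannot share a RunnableParallel, can be illustrated with a small plain-Python sketch. The names `run_branches`, `set_x`, and `get_x` are hypothetical, and the two execution orders are run explicitly one after another rather than concurrently, so the outcome is deterministic. The real component may behave differently (for example by blocking or raising another error); the sketch only shows why an unspecified order is a problem.

```python
def run_branches(branches, value, order):
    """Run the named branches against one shared store in the given order."""
    store = {}
    results = {}
    for name in order:
        results[name] = branches[name](value, store)
    return results


def set_x(value, store):
    store["x"] = value
    return value


def get_x(_value, store):
    return store["x"]  # raises KeyError if nothing was written yet


branches = {"write": set_x, "read": get_x}

# The setter runs first, so the getter finds the key.
ok = run_branches(branches, 1, ["write", "read"])

# The getter runs first, so the key does not exist yet.
try:
    run_branches(branches, 1, ["read", "write"])
    failed = False
except KeyError:
    failed = True
```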