Async RAG Chain for Agent #17591

bwhartlove · 2024-02-15T19:59:01Z

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

class CustomRAGTool(BaseTool):
    name = "custom_search"
    description = "useful for when you need to research for information"
    args_schema: Type[BaseModel] = SearchInput
    chain: RunnableSerializable

    def _run(
        self, query: str, run_manager: Optional[CallbackManagerForToolRun] = None
    ) -> str:
        """Use the tool."""
        response = self.chain.invoke(query)
        return response

    async def _arun(
        self, query: str, run_manager: Optional[AsyncCallbackManagerForToolRun] = None
    ) -> str:
        """Use the tool asynchronously."""
        response = await self.chain.ainvoke(query)
        return response

Description

I have an OpenSearch vector store index and built a retriever object on top of it. Using the retriever and an LLM served by Hugging Face's TGI engine, I built a chain with LCEL. That chain is the primary component of a custom class called CustomRAGTool, which invokes it. Despite defining an _arun method for my tool and verifying that it is being called, the tool still blocks the primary thread of my application. So I wanted to ask: (a) is this the right path to follow for async tools, and (b) can RAG chains work in an async context? I figure I am missing something along the way. Thanks!

System Info

System Information

OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Wed Jan 24 23:11:18 UTC 2024
Python Version: 3.11.5 (main, Feb 7 2024, 16:07:02) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)]

Package Information

langchain_core: 0.1.23
langchain: 0.1.7
langchain_community: 0.0.20
langsmith: 0.0.87
langchainhub: 0.1.14
langserve: 0.0.41

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph

@bwhartlove · 2024-02-15T19:59:13Z

dosubot[bot]
bot Feb 15, 2024

🤖

Hey @bwhartlove, great to see you diving into async adventures with LangChain! Hope you're doing well.

Based on the code you've provided, you're on the right track with the _arun method in your CustomRAGTool class. Implementing _arun is indeed the correct way to give a LangChain tool an asynchronous entry point.

However, it's important to note that the asynchronous behavior of your tool is dependent on the components of your chain being able to operate asynchronously as well. If any component of your chain is blocking, it will cause your entire tool to block, even if you're calling it with arun.
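To make the point above concrete, here is a minimal sketch using only the standard library. The two step functions are hypothetical stand-ins for chain components, not LangChain code: one calls a blocking function inside a coroutine, the other awaits a non-blocking one. Running two copies of each concurrently shows that the blocking version serializes even though it is declared async.

```python
import asyncio
import time


async def blocking_step(delay: float) -> str:
    # time.sleep never yields to the event loop, so nothing else can run.
    time.sleep(delay)
    return "blocking"


async def cooperative_step(delay: float) -> str:
    # asyncio.sleep suspends this coroutine and lets other tasks run.
    await asyncio.sleep(delay)
    return "cooperative"


async def timed_pair(step, delay: float) -> float:
    """Run two copies of `step` concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(step(delay), step(delay))
    return time.perf_counter() - start


def compare(delay: float = 0.2) -> tuple[float, float]:
    """Return (elapsed with blocking steps, elapsed with cooperative steps)."""
    blocking = asyncio.run(timed_pair(blocking_step, delay))
    cooperative = asyncio.run(timed_pair(cooperative_step, delay))
    return blocking, cooperative


if __name__ == "__main__":
    blocking, cooperative = compare()
    print(f"blocking: {blocking:.2f}s, cooperative: {cooperative:.2f}s")
```

The blocking pair takes roughly twice the delay, while the cooperative pair takes roughly one delay, which is the same symptom as a tool that "still blocks" despite having an async method.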

In your case, you're using a RAG chain, which should be able to operate asynchronously. However, you'll need to ensure that your retriever and llm components are also capable of asynchronous operation. If they're not, you'll need to modify them to support this, or find alternatives that do.
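One common way to adapt a component that only offers a synchronous call is to push that call onto a worker thread. The sketch below is a minimal illustration using only the standard library; sync_search is a hypothetical stand-in for a blocking retriever call and is not a LangChain API.

```python
import asyncio
import time


def sync_search(query: str) -> list[str]:
    """Hypothetical stand-in for a retriever call that only has a sync API."""
    time.sleep(0.2)  # simulates a blocking network round trip
    return [f"document about {query}"]


async def async_search(query: str) -> list[str]:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(sync_search, query)


async def run_two_searches() -> tuple[list[list[str]], float]:
    """Run two searches concurrently and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(async_search("alpha"), async_search("beta"))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_two_searches())
    print(results, f"{elapsed:.2f}s")
```

Because each blocking call runs in its own thread, the two searches overlap and the event loop can keep serving other tasks in the meantime. This trades a little thread overhead for responsiveness, so a natively async client is still preferable where one exists.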

As for your question about whether RAG chains can work in an async context, the answer is yes. The LangChain framework supports asynchronous operations in its core functionality, which is visible in the use of Python's asyncio library in LangChain's callback manager code. For instance, the handle_event function checks if an event is a coroutine (the result of calling an asynchronous function) and if so, it adds it to a list of coroutines to be run asynchronously.

Furthermore, the AsyncRunManager class and its methods such as on_text and on_retry are defined as asynchronous, indicating that they are designed to support asynchronous operations.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

bwhartlove · 2024-02-15T22:59:39Z

@dosu-bot Thanks for the information! I am using the HuggingFaceTextGenInference class for my LLM and the OpenSearchVectorSearch class is the basis for my retriever. Do those support async calls in a chain?

@dosu-bot · 2024-02-15T23:05:09Z

dosubot[bot]
bot Feb 15, 2024

🤖

Based on the information available in the LangChain repository, the HuggingFaceTextGenInference class does support asynchronous operations. This is evident from the presence of the async def _acall and async def _astream methods in the class. These methods are asynchronous versions of the _call and _stream methods respectively. The _acall method is an asynchronous method that generates a text response for a given prompt. If the streaming attribute of the class instance is set to True, it uses the _astream method to generate a stream of tokens asynchronously. You can find more details in the source code.

As for the OpenSearchVectorSearch class, I couldn't find any specific information about its support for asynchronous operations in the repository. It's possible that this information might be available elsewhere or I might have missed it.
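When the documentation is silent, one rough way to check is to see whether a class defines its own coroutine method or merely inherits the base class default. The sketch below shows the idea with hypothetical classes (BaseStore, SyncOnlyStore, NativeAsyncStore, and the asearch method are all made up for illustration). Applying it to a real LangChain class means substituting the real class, base class, and method name, which are not assumed here.

```python
import inspect


class BaseStore:
    """Hypothetical base class with a sync method and a default async wrapper."""

    def search(self, query: str) -> list[str]:
        raise NotImplementedError

    async def asearch(self, query: str) -> list[str]:
        # Default fallback: simply calls the sync method.
        return self.search(query)


class SyncOnlyStore(BaseStore):
    def search(self, query: str) -> list[str]:
        return [query]


class NativeAsyncStore(BaseStore):
    def search(self, query: str) -> list[str]:
        return [query]

    async def asearch(self, query: str) -> list[str]:
        return [query]


def overrides_async(cls: type, base: type, name: str) -> bool:
    """True if `cls` defines its own coroutine `name` instead of inheriting it."""
    method = getattr(cls, name)
    return inspect.iscoroutinefunction(method) and method is not getattr(base, name)


if __name__ == "__main__":
    print(overrides_async(SyncOnlyStore, BaseStore, "asearch"))
    print(overrides_async(NativeAsyncStore, BaseStore, "asearch"))
```

An inherited default is not necessarily a problem, since a base class may run the sync method in a thread pool, but it is a useful hint about where to read the source more closely.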

If your _arun method is still blocking the primary thread, the cause may lie in how it is implemented or in how the asynchronous calls are handled. Without seeing the actual implementation of your _arun method, it's hard to provide a specific solution. However, you should make sure that nothing synchronous and long-running is called directly inside a coroutine, since that is what stalls the asyncio event loop.

If you're still facing issues, could you please provide more details about your arun function and how you're using the HuggingFaceTextGenInference and OpenSearchVectorSearch classes in your CustomRAGTool class? This would help in identifying the issue and providing a more accurate solution.

Sources

libs/community/langchain_community/llms/huggingface_text_gen_inference.py

bwhartlove · 2024-02-27T17:07:33Z

@dosu-bot Below is my _arun function for my tool:

async def _arun(
    self, query: str, run_manager: Optional[AsyncCallbackManagerForToolRun] = None
) -> str:
    """Use the tool asynchronously."""
    response = await self.chain.ainvoke(query)
    return response

The tool class I made has a chain element that stores the following chain:

chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

The chain passes the input through my retriever, into a PromptTemplate, through the LLM, and finally through an output parser.

The LLM is built using the HuggingFace TGI server:

llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:3000/",
    temperature=0.01,
    streaming=True,
    max_new_tokens=4096
)

The backend vector store I am using for the retriever is OpenSearch:

docsearch = OpenSearchVectorSearch(endpoint, index_name="myindex", embedding_function=embeddings, 
                                   http_compress=True, http_auth=auth, use_ssl=True, verify_certs=False)

I'm still trying to sort out where the blocking call is happening that causes the application to pause until the tool is done executing.

1 reply

dosubot[bot] bot Feb 27, 2024

🤖

Based on the code you've shared, the await self.chain.ainvoke(query) line in your _arun function is where the tool spends its time, but the await itself should not be what blocks: awaiting hands control back to the event loop until the chain finishes. Blocking happens when something inside the chain, or the code that calls the tool, performs synchronous I/O or other long-running work without yielding.

Because your chain is composed with LCEL, chain.ainvoke() awaits each step in order: the mapping that runs the retriever and the passthrough, then the prompt, the LLM, and the output parser. A step only keeps the event loop free if its async path is genuinely non-blocking, so a component whose async method calls synchronous code directly can stall the loop.

If chain.ainvoke() is indeed the source of the blocking, the likely candidates are the HuggingFaceTextGenInference and OpenSearchVectorSearch components. Both make network requests, and a network request issued through a synchronous client blocks the event loop for its full duration.

To debug this issue, you could isolate each component of your chain and await it on its own (for example retriever.ainvoke(query) and llm.ainvoke(prompt)) to see which one stalls the event loop. If a component turns out to have only a blocking implementation, you can offload that call to a worker thread, for example with asyncio.to_thread or loop.run_in_executor, so that it no longer holds up your main thread.
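A simple way to test components one at a time is to run a small heartbeat task alongside the awaited call and record how late the heartbeat wakes up. The sketch below assumes nothing about LangChain: max_loop_stall, blocking_call, and yielding_call are hypothetical names, and in practice you would pass in something like the coroutine returned by a component's async method instead of the two demo coroutines.

```python
import asyncio
import time


async def max_loop_stall(coro, interval: float = 0.01) -> float:
    """Await `coro` while a heartbeat measures the longest event-loop stall."""
    worst = 0.0

    async def heartbeat() -> None:
        nonlocal worst
        while True:
            before = time.perf_counter()
            await asyncio.sleep(interval)
            # Any time beyond `interval` means the loop was busy elsewhere.
            worst = max(worst, time.perf_counter() - before - interval)

    probe = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat start its first sleep
    try:
        await coro
        # Give the heartbeat one more cycle so a final stall gets recorded.
        await asyncio.sleep(interval)
    finally:
        probe.cancel()
    return worst


async def blocking_call() -> None:
    time.sleep(0.2)  # stand-in for a sync client used inside a coroutine


async def yielding_call() -> None:
    await asyncio.sleep(0.2)  # stand-in for a natively async client


if __name__ == "__main__":
    print(f"blocking: {asyncio.run(max_loop_stall(blocking_call())):.3f}s")
    print(f"yielding: {asyncio.run(max_loop_stall(yielding_call())):.3f}s")
```

A stall close to the full call duration points at a component that never yields, while a stall of a few milliseconds suggests that component is fine and the blocking happens elsewhere, possibly in the code that invokes the tool.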

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Drm2005 · 2024-12-17T09:26:40Z

Hello everyone,
Is it possible to implement async support here?
https://github.com/neo4j-graphacademy/llm-chatbot-python
