I'm working on a LangChain project where I use two separate agents, each leveraging multiple LLM-based tools to handle SQL query generation and the user-facing response. While the workflow functions as expected, I'm encountering significant latency and would appreciate feedback on how to optimize it.
Problem Context:
First Agent (SQL Query Generation):
This agent generates an SQL query based on the user's input. It uses multiple tools that each involve separate LLM calls for performing different tasks:
entity_separation_tool: Uses an LLM to extract entities (e.g., categories, contacts, and banks) from the user's question.
table_selection_tool: Uses an LLM to select relevant database tables based on the extracted entities and the user's input.
sql_generation_tool: Uses an LLM to generate an SQL query based on the selected tables and entities.
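To make the data dependencies concrete, the three tools form a strictly sequential chain, since each step consumes the previous step's output. Here is an illustrative sketch only; the `llm` callable and the prompts are stand-ins, not the actual implementation:

```python
# Illustrative only: `llm` is a stand-in for any callable that sends a
# prompt to a model and returns its text response.
def generate_sql(llm, user_question: str) -> str:
    # Step 1: extract entities (categories, contacts, banks) from the question.
    entities = llm(f"Extract categories, contacts and banks from: {user_question}")
    # Step 2: select relevant tables, which depends on the extracted entities.
    tables = llm(f"Select tables for entities {entities} and question: {user_question}")
    # Step 3: generate the SQL query, which depends on both previous steps.
    return llm(f"Write a SQL query using tables {tables} for entities {entities}")
```

Because each call's prompt contains the previous call's output, these three LLM round-trips cannot overlap.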
SQL Validation:
After the query is generated by the first agent, it is passed through a custom SQL validation function (validate_and_execute_sql) that also involves model-based processing. This function generates important metadata, such as:
isValidQuery: Whether the query is valid.
canPassFullResults: Whether the query result is small enough (within token limits) for the full result to be passed downstream.
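One way to compute a flag like canPassFullResults without any model call is a heuristic token estimate on the serialized result. The sketch below is an assumption, not the actual validate_and_execute_sql; the threshold, the rough 4-characters-per-token rule, and the rowCount field are all illustrative:

```python
import json

def build_validation_metadata(rows, max_tokens: int = 3000) -> dict:
    """Estimate result size and build the metadata dict (illustrative sketch)."""
    # Rough heuristic: ~4 characters per token for English/JSON text.
    serialized = json.dumps(rows, default=str)
    estimated_tokens = len(serialized) // 4
    return {
        "isValidQuery": True,  # would be set False earlier if execution failed
        "canPassFullResults": estimated_tokens <= max_tokens,
        "rowCount": len(rows),
    }
```

A pure string-length check like this runs in microseconds, so replacing any model-based size check with it removes one LLM round-trip from the critical path.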
Second Agent (User-Friendly Response Generation):
Based on the metadata generated from the SQL validation step, I use a second agent to generate a user-friendly response. This agent involves tools like:
sql_reply_tool: Uses an LLM to format the SQL result into a natural language response.
user_reply_tool: Uses an LLM to generate a response to the user based on the validation metadata.
A separate agent is used here because some SQL queries could return large datasets, and using an LLM directly to handle large results might exceed token limits.
Here’s a simplified version of the workflow:
# First agent for generating the SQL query
query_generator_agent = create_react_agent(llm, query_generator_tools, query_generator_prompt)
query_generator_agent_executor = AgentExecutor(agent=query_generator_agent, tools=query_generator_tools)

# Execute the first agent
query_generator_res = query_generator_agent_executor.invoke({"input": user_prompt})

# Validate the SQL query and gather metadata
sql_generated_result = validate_and_execute_sql(data=query_generator_res, companies_id=companies_id, accounts_id=accounts_id)

# Prepare the metadata for the second agent
json_string = json.dumps(sql_generated_result)

# Second agent for generating the user-friendly response
user_reply_agent = create_react_agent(llm, user_reply_tools, user_reply_prompt)
user_reply_agent_executor = AgentExecutor(agent=user_reply_agent, tools=user_reply_tools)

# Execute the second agent on the metadata
user_reply_res = user_reply_agent_executor.invoke({"input": json_string})
Challenges:
Latency: Each tool makes its own LLM call for a specific task (e.g., entity separation, table selection, SQL generation, and response formatting), which leads to multiple sequential LLM calls across the two agents. This appears to be the main contributor to the high latency.
The SQL validation step is necessary to avoid passing large results to the second agent, but it introduces additional overhead.
Questions:
Is using two separate agents with multiple LLM-based tools the best approach, or would there be a better way to handle large SQL result sets without exceeding token limits?
How can I optimize the SQL validation step to reduce latency, considering that it involves model-based processing and must prevent large datasets from being passed to the LLM in the second agent?
Would combining the agents or optimizing the number of LLM calls (e.g., reducing the number of tools or agent executions) help in minimizing the latency?
What are the best practices for handling large SQL result sets in LangChain, especially when using multiple LLM-based tools across agents, without exceeding token limits?
Would asynchronous execution or parallel processing of the tools (where possible) reduce latency, or is this inherently a sequential process due to the dependencies between the tools?
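Regarding the last question, the data dependencies do force the entity → table → SQL chain to run sequentially, but any calls without a dependency between them can overlap. A minimal asyncio sketch of that split; the coroutine is a stand-in for an async LLM call (e.g. a runnable's `ainvoke`), and the "reply preamble" call is a hypothetical example of independent work:

```python
import asyncio

async def fake_llm_call(task: str) -> str:
    # Stand-in for an async LLM call; sleeps to simulate network latency.
    await asyncio.sleep(0.01)
    return f"result:{task}"

async def pipeline(question: str) -> str:
    # Dependent steps must stay sequential...
    entities = await fake_llm_call(f"entities for {question}")
    tables = await fake_llm_call(f"tables for {entities}")
    # ...but calls with no data dependency (here, a hypothetical reply
    # preamble drafted alongside the SQL) can run concurrently.
    sql, preamble = await asyncio.gather(
        fake_llm_call(f"sql for {tables}"),
        fake_llm_call("reply preamble"),
    )
    return sql
```

With this structure, only the truly dependent calls add to wall-clock time; the concurrent pair costs one round-trip instead of two.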
Additional Details:
Each tool in both agents makes an LLM call for tasks such as entity extraction, table selection, SQL query generation, and response generation.
The PydanticOutputParser is used to format the output from the LLM.
The custom SQL validation function is essential to prevent large datasets from overwhelming the second agent.
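For context on the parsing step: at its core, PydanticOutputParser parses the raw LLM string as JSON and validates it against a schema. A stdlib approximation of that behavior (the field names `query` and `tables` are illustrative, not the actual schema):

```python
import json
from dataclasses import dataclass

@dataclass
class SQLQueryOutput:
    query: str
    tables: list

def parse_llm_output(raw: str) -> SQLQueryOutput:
    # json.loads raises on malformed output; the field lookups raise
    # KeyError if the model omitted a required field.
    data = json.loads(raw)
    return SQLQueryOutput(query=data["query"], tables=data["tables"])
```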
Any suggestions on how I can improve performance and reduce the overall latency in this multi-agent, multi-tool setup would be greatly appreciated!
Thank you in advance for your help!