-
Yes, there is a sample 'workflow code' to implement the current pandas query pipeline in Llama-index 0.11. Here is the relevant code:

%pip install llama-index-llms-openai llama-index-experimental

from llama_index.core.query_pipeline import (
QueryPipeline as QP,
Link,
InputComponent,
)
from llama_index.experimental.query_engine.pandas import (
PandasInstructionParser,
)
from llama_index.llms.openai import OpenAI
from llama_index.core import PromptTemplate

Download Data

Here we load the Titanic CSV dataset.

!wget 'https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/docs/examples/data/csv/titanic_train.csv' -O 'titanic_train.csv'

import pandas as pd
df = pd.read_csv("./titanic_train.csv")

Define Modules

instruction_str = (
"1. Convert the query to executable Python code using Pandas.\n"
"2. The final line of code should be a Python expression that can be called with the `eval()` function.\n"
"3. The code should represent a solution to the query.\n"
"4. PRINT ONLY THE EXPRESSION.\n"
"5. Do not quote the expression.\n"
)
pandas_prompt_str = (
"You are working with a pandas dataframe in Python.\n"
"The name of the dataframe is `df`.\n"
"This is the result of `print(df.head())`:\n"
"{df_str}\n\n"
"Follow these instructions:\n"
"{instruction_str}\n"
"Query: {query_str}\n\n"
"Expression:"
)
response_synthesis_prompt_str = (
"Given an input question, synthesize a response from the query results.\n"
"Query: {query_str}\n\n"
"Pandas Instructions (optional):\n{pandas_instructions}\n\n"
"Pandas Output: {pandas_output}\n\n"
"Response: "
)
pandas_prompt = PromptTemplate(pandas_prompt_str).partial_format(
instruction_str=instruction_str, df_str=df.head(5)
)
pandas_output_parser = PandasInstructionParser(df)
response_synthesis_prompt = PromptTemplate(response_synthesis_prompt_str)
llm = OpenAI(model="gpt-3.5-turbo")

Build Query Pipeline

qp = QP(
modules={
"input": InputComponent(),
"pandas_prompt": pandas_prompt,
"llm1": llm,
"pandas_output_parser": pandas_output_parser,
"response_synthesis_prompt": response_synthesis_prompt,
"llm2": llm,
},
verbose=True,
)
qp.add_chain(["input", "pandas_prompt", "llm1", "pandas_output_parser"])
qp.add_links(
[
Link("input", "response_synthesis_prompt", dest_key="query_str"),
Link(
"llm1", "response_synthesis_prompt", dest_key="pandas_instructions"
),
Link(
"pandas_output_parser",
"response_synthesis_prompt",
dest_key="pandas_output",
),
]
)
# add link from response synthesis prompt to llm2
qp.add_link("response_synthesis_prompt", "llm2")

Run Query

response = qp.run(
    query_str="What is the correlation between survival and age?",
)
print(response.message.content)

Additionally, you can refer to the following test for the PandasQueryEngine:

def test_pandas_query_engine(mock_service_context: ServiceContext) -> None:
    """Test pandas query engine."""
    df = pd.DataFrame(
        {
            "city": ["Toronto", "Tokyo", "Berlin"],
            "population": [2930000, 13960000, 3645000],
            "description": [
                """Toronto, Canada's largest city, is a vibrant and diverse metropolis situated in the province of Ontario.
Known for its iconic skyline featuring the CN Tower, Toronto is a cultural melting pot with a rich blend of communities, languages, and cuisines.
It boasts a thriving arts scene, world-class museums, and a strong economic hub.
Visitors can explore historic neighborhoods, such as Kensington Market and Distillery District, or enjoy beautiful natural surroundings on Toronto Islands.
With its welcoming atmosphere, top-notch education, and multicultural charm, Toronto is a global destination for both tourists and professionals alike.""",
                "A city",
                "Another City",
            ],
        }
    )
    query_engine = PandasQueryEngine(
        df, service_context=mock_service_context, verbose=True
    )
    response = query_engine.query(QueryBundle("population"))
    assert str(response) == str(df["population"])
    metadata = cast(Dict[str, Any], response.metadata)
    assert metadata["pandas_instruction_str"] == ('df["population"]')

This example demonstrates how to set up and test the PandasQueryEngine with a sample DataFrame [1][2].
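
To make the wiring of the query pipeline above easier to follow, here is a minimal plain-Python sketch of the dataflow that the add_chain and add_links calls describe. It does not use LlamaIndex at all: ROWS, build_pandas_prompt, fake_llm, run_expression, build_synthesis_prompt and run_pipeline are hypothetical stand-ins invented for illustration, the LLM is a stub that returns a fixed expression, and a list of dicts replaces the DataFrame. Each function roughly corresponds to one module of the pipeline, which may help when deciding how to split the logic into separate steps.

```python
# Plain-Python sketch of the pipeline's dataflow. All names are hypothetical
# stand-ins for illustration; none of them are LlamaIndex APIs.

ROWS = [
    {"survived": 1, "age": 22},
    {"survived": 0, "age": 38},
    {"survived": 1, "age": 26},
]


def build_pandas_prompt(query_str: str) -> str:
    """Stand-in for the "pandas_prompt" module: fill the query into a template."""
    return f"The data is available as `rows`.\nQuery: {query_str}\nExpression:"


def fake_llm(prompt: str) -> str:
    """Stand-in for "llm1": a stub that always returns one fixed expression."""
    return "sum(r['survived'] for r in rows)"


def run_expression(expression: str) -> str:
    """Stand-in for "pandas_output_parser": evaluate the expression on the data.

    Calling eval() on model output is unsafe outside a sandbox; builtins are
    emptied here and only `rows` and `sum` are exposed.
    """
    return str(eval(expression, {"__builtins__": {}}, {"rows": ROWS, "sum": sum}))


def build_synthesis_prompt(
    query_str: str, pandas_instructions: str, pandas_output: str
) -> str:
    """Stand-in for "response_synthesis_prompt": it takes three named inputs."""
    return (
        f"Query: {query_str}\n"
        f"Pandas Instructions (optional):\n{pandas_instructions}\n"
        f"Pandas Output: {pandas_output}\n"
        "Response: "
    )


def run_pipeline(query_str: str) -> str:
    # add_chain: input -> pandas_prompt -> llm1 -> pandas_output_parser
    prompt = build_pandas_prompt(query_str)
    instructions = fake_llm(prompt)
    output = run_expression(instructions)
    # add_links: input, llm1 and the parser each feed one key of the
    # synthesis prompt (query_str, pandas_instructions, pandas_output).
    return build_synthesis_prompt(query_str, instructions, output)


final_prompt = run_pipeline("How many passengers survived?")
print(final_prompt)
```

In the real pipeline the synthesized prompt would then be passed to the second LLM ("llm2"), which writes the final answer; the sketch stops one step earlier so it can run without an API key.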
-
I am so thrilled to explore Llama-index 0.11.
My first objective is to move my pandas query pipeline to a workflow, since query pipelines will not be supported in new versions. Do we have any sample 'workflow code' to implement the current pandas query pipeline, like the example below? Thank you.
https://docs.llamaindex.ai/en/stable/examples/pipeline/query_pipeline_pandas/