Feature(ai service): refine ask/ask details pipelines (#64)
* allow visualization using one file

* remove default argparse option value

* resolve conflict

* resolve conflict

* resolve conflict

* update setup instructions

* resolve conflict

* resolve conflicts

* resolve conflict

* resolve conflict

* resolve conflict

* add anthropic model pricing

* allow change model for anthropic

* resolve conflict

* resolve conflict

* resolve conflict

* resolve conflict

* resolve conflict

* resolve conflict

* add container:ubuntu

* undo

* resolve conflict

* resolve conflict

* resolve conflict

* resolve conflict

* resolve conflict

* update

* resolve conflict

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* resolve conflicts

* remove unused files

* resolve conflict

* resolve conflict

* generate ddl from mdl for indexing

* resolve conflict

* resolve conflicts

* eval ask with semantic description (#45)

* update make eval-ask

* add semantic description to mdl before eval

* refine

* add model semantic generation

* simplify make command

* print timestamp

* minor update

---------

Co-authored-by: qa <[email protected]>
Co-authored-by: ChihYu Yeh <[email protected]>
Co-authored-by: Aster Sun <[email protected]>

* remove ;

* simplify prompt

* revert topk

* update prompt

* make default top_k for retriever 10

* generate multiple results for ask pipeline

* resolve conflict

* add followup_generation_pipeline

* add logging for ask api results

* fix generate_mdl issue

* fix ask_details error message

* add extra standard package for uvicorn

* refine ask

* fix postprocessor for ask details

* fix preview data

* fix test

* update prompts

* refine prompts for ask and ask details

* fix prompt

* refine prompt

* remove slash at the end of api endpoints

* refine prompt

* add custom semantic run option

* resolve conflict

* fix bugs for tests and eval and restructure eval-ask

* fix trailing slash and add logging

* remove unused code

* restructure eval ask pipeline outputs

* add comments to the eval command

* change generation component argument name

* add try/except to handle other errors that might happen

* change WREN_AI_SERVICE_VERSION to nightly

* fix eval ask_details errors

* add error message to indexing

---------

Co-authored-by: imAsterSun <[email protected]>
Co-authored-by: qa <[email protected]>
Co-authored-by: Aster Sun <[email protected]>
Co-authored-by: Pao Sheng <[email protected]>
5 people authored Apr 8, 2024
1 parent b677dd1 commit 9507838
Showing 37 changed files with 1,603 additions and 1,527 deletions.
2 changes: 1 addition & 1 deletion docker/.env.example
@@ -9,7 +9,7 @@ WREN_AI_SERVICE_PORT=5555
# version
# CHANGE THIS TO THE LATEST VERSION
WREN_ENGINE_VERSION=nightly
WREN_AI_SERVICE_VERSION=dev
WREN_AI_SERVICE_VERSION=nightly
WREN_UI_VERSION=0.1.0
WREN_BOOTSTRAP_VERSION=0.1.0

5 changes: 1 addition & 4 deletions wren-ai-service/.env.dev.example
@@ -4,11 +4,8 @@ WREN_AI_SERVICE_PORT=5555

# app related
QDRANT_HOST=localhost
OPENAI_API_KEY=
LANGFUSE_PUBLIC_KEY=
LANGFUSE_SECRET_KEY=
ENABLE_TRACE=
WREN_ENGINE_ENDPOINT=http://localhost:8080
OPENAI_API_KEY=

# evaluation related
DATASET_NAME=book_2
24 changes: 11 additions & 13 deletions wren-ai-service/Makefile
@@ -23,15 +23,6 @@ run-qdrant:
stop-qdrant:
docker stop qdrant && docker rm qdrant

# present the evaluation result on the streamlit app
# example: make streamlit pipeline=src/eval/streamlit_app.py
streamlit:
poetry run streamlit run $(pipeline)

# example: make eval pipeline=ask_details
eval:
poetry run python -m src.eval.$(pipeline) $(args)

run-wren-engine:
docker compose -f ./src/eval/wren-engine/docker-compose.yml --env-file ./src/eval/wren-engine/.env up -d

@@ -50,14 +41,21 @@ stop-all:
make stop-qdrant && \
make stop-wren-engine

eval-ask:
# present the evaluation result on the streamlit app
# example: make streamlit pipeline=ask_details
streamlit:
poetry run streamlit run src/eval/${pipeline}/streamlit_app.py

# example: make eval pipeline=ask_details
# example: make eval pipeline=ask args="--help" to check all available arguments
eval:
make run-all && \
poetry run python -m src.eval.ask --eval-from-scratch --eval-after-prediction && \
poetry run python -m src.eval.$(pipeline) $(args)
make stop-all

test:
poetry run python -m src.prepare_mdl_json --dataset_name book_2 && \
make run-qdrant && \
make run-wren-engine && \
poetry run pytest -s && \
make stop-all
poetry run pytest -s $(args) && \
make stop-all
1 change: 1 addition & 0 deletions wren-ai-service/README.md
@@ -22,6 +22,7 @@

## Pipeline Evaluation(for development)

- install `psql`
- fill in environment variables: `.env.dev` in the src folder and `config.properties` in the src/eval/wren-engine/etc folder
- start the docker service
- run qdrant and wren-engine docker containers: `make run-all`
97 changes: 46 additions & 51 deletions wren-ai-service/demo/utils.py
@@ -336,39 +336,31 @@ def _parse_table_definition(
}
)
else:
should_add_column = True
if "PRIMARY KEY" in part or "primary key" in part:
if "(" not in part and ")" not in part:
primary_key = part.strip().split(" ")[0]
part = (
part.replace("PRIMARY KEY", "")
.replace("primary key", "")
.strip()
)
else:
should_add_column = False
pattern = r'\("(.*?)"\)'
if matches := re.findall(pattern, part):
primary_key = matches[0]
primary_key = part.strip().split(" ")[0]
part = (
part.replace("PRIMARY KEY", "")
.replace("primary key", "")
.strip()
)

# Splitting the column name and type
if should_add_column:
column_def = _parse_column_definition(part.strip())
column_def = _parse_column_definition(part.strip())

columns.append(
{
"name": column_def["name"].replace('"', ""),
"type": _get_appropriat_column_type(column_def["type"]),
"notNull": column_def[
"not_null"
], # Assuming notNull is False by default as not specified in the string
"isCalculated": False, # Assuming isCalculated is False by default
"expression": column_def["name"].replace(
'"', ""
), # Assuming expression is the column name itself
"properties": {},
}
)
columns.append(
{
"name": column_def["name"].replace('"', ""),
"type": _get_appropriat_column_type(column_def["type"]),
"notNull": column_def[
"not_null"
], # Assuming notNull is False by default as not specified in the string
"isCalculated": False, # Assuming isCalculated is False by default
"expression": column_def["name"].replace(
'"', ""
), # Assuming expression is the column name itself
"properties": {},
}
)

if relationships:
for relationship in relationships:
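The simplified branch above now always derives the primary key from the first token of an inline `PRIMARY KEY` column definition, then strips the keyword before parsing the column. A minimal standalone sketch of that logic (the function name here is hypothetical, not part of the commit):

```python
def strip_primary_key(part: str) -> tuple[str, str]:
    """Return (cleaned column definition, primary key name).

    Mirrors the inline PRIMARY KEY handling in _parse_table_definition:
    the key column name is the first whitespace-separated token, and the
    keyword is removed before the column definition is parsed further.
    """
    primary_key = ""
    if "PRIMARY KEY" in part or "primary key" in part:
        primary_key = part.strip().split(" ")[0]
        part = (
            part.replace("PRIMARY KEY", "")
            .replace("primary key", "")
            .strip()
        )
    return part, primary_key

print(strip_primary_key('"id" INTEGER PRIMARY KEY'))
# ('"id" INTEGER', '"id"')
```

Note that, unlike the removed branch, this no longer special-cases table-level `PRIMARY KEY ("col")` constraint lines.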
@@ -626,22 +618,18 @@ def show_asks_details_results():
for i, step in enumerate(st.session_state["asks_details_result"]["steps"]):
st.markdown(f"#### Step {i + 1}")
st.markdown(step["summary"])
if i != len(st.session_state["asks_details_result"]["steps"]) - 1:
st.code(
body=step["sql"],
language="sql",
)
sqls_with_cte.append(
"WITH " + step["cte_name"] + " AS (" + step["sql"] + ")"
)
sqls.append(step["sql"])
else:
last_step_sql = "\n".join(sqls_with_cte) + "\n\n" + step["sql"]
sqls.append(last_step_sql)
st.code(
body=last_step_sql,
language="sql",
)

sql = ""
if sqls_with_cte:
sql += "WITH " + ",\n".join(sqls_with_cte) + "\n\n"
sql += step["sql"]
sqls.append(sql)

st.code(
body=sql,
language="sql",
)
sqls_with_cte.append(f"{step['cte_name']} AS ( {step['sql']} )")

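The rewritten loop above folds every previously shown step into a comma-joined `WITH` clause for the current step, instead of special-casing the last step. A self-contained sketch of that accumulation (the `steps` shape mirrors the `asks_details_result["steps"]` payload; the function name is an assumption):

```python
def build_step_sqls(steps: list[dict]) -> list[str]:
    """Return the full SQL shown for each step, chaining prior steps as CTEs."""
    sqls_with_cte: list[str] = []
    step_sqls: list[str] = []
    for step in steps:
        sql = ""
        if sqls_with_cte:
            # Every previously shown step becomes a CTE of the current query.
            sql += "WITH " + ",\n".join(sqls_with_cte) + "\n\n"
        sql += step["sql"]
        step_sqls.append(sql)
        sqls_with_cte.append(f"{step['cte_name']} AS ( {step['sql']} )")
    return step_sqls

steps = [
    {"cte_name": "base", "sql": "SELECT * FROM orders"},
    {"cte_name": "agg", "sql": "SELECT COUNT(*) FROM base"},
]
print(build_step_sqls(steps)[1])
# WITH base AS ( SELECT * FROM orders )
#
# SELECT COUNT(*) FROM base
```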
st.button(
label="Preview Data",
@@ -679,7 +667,7 @@ def generate_mdl_metadata(mdl_model_json: dict):

st.toast(f'Generating MDL metadata for model {mdl_model_json['name']}', icon="⏳")
generate_mdl_metadata_response = requests.post(
f"{WREN_AI_SERVICE_BASE_URL}/v1/semantics-descriptions/",
f"{WREN_AI_SERVICE_BASE_URL}/v1/semantics-descriptions",
json={
"mdl": mdl_model_json,
"model": mdl_model_json["name"],
@@ -708,7 +696,7 @@ def generate_mdl_metadata(mdl_model_json: dict):

def prepare_semantics(mdl_json: dict):
semantics_preparation_response = requests.post(
f"{WREN_AI_SERVICE_BASE_URL}/v1/semantics-preparations/",
f"{WREN_AI_SERVICE_BASE_URL}/v1/semantics-preparations",
json={
"mdl": json.dumps(mdl_json),
"id": st.session_state["deployment_id"],
@@ -750,7 +738,7 @@ def prepare_semantics(mdl_json: dict):
def ask(query: str, query_history: Optional[dict] = None):
st.session_state["query"] = query
asks_response = requests.post(
f"{WREN_AI_SERVICE_BASE_URL}/v1/asks/",
f"{WREN_AI_SERVICE_BASE_URL}/v1/asks",
json={
"query": query,
"id": st.session_state["deployment_id"],
@@ -768,7 +756,7 @@ def ask(query: str, query_history: Optional[dict] = None):
and asks_status != "stopped"
):
asks_status_response = requests.get(
f"{WREN_AI_SERVICE_BASE_URL}/v1/asks/{query_id}/result/"
f"{WREN_AI_SERVICE_BASE_URL}/v1/asks/{query_id}/result"
)
assert asks_status_response.status_code == 200
asks_status = asks_status_response.json()["status"]
@@ -786,7 +774,7 @@ def ask(query: str, query_history: Optional[dict] = None):

def ask_details():
asks_details_response = requests.post(
f"{WREN_AI_SERVICE_BASE_URL}/v1/ask-details/",
f"{WREN_AI_SERVICE_BASE_URL}/v1/ask-details",
json={
"query": st.session_state["chosen_query_result"]["query"],
"sql": st.session_state["chosen_query_result"]["sql"],
@@ -798,7 +786,9 @@ def ask_details():
query_id = asks_details_response.json()["query_id"]
asks_details_status = None

while not asks_details_status or asks_details_status != "finished":
while (
asks_details_status != "finished" and asks_details_status != "failed"
) or not asks_details_status:
asks_details_status_response = requests.get(
f"{WREN_AI_SERVICE_BASE_URL}/v1/ask-details/{query_id}/result/"
)
Expand All @@ -811,3 +801,8 @@ def ask_details():
st.session_state["asks_details_result"] = asks_details_status_response.json()[
"response"
]
elif asks_details_status == "failed":
st.error(
f'An error occurred while processing the query: {asks_details_status_response.json()['error']}',
icon="🚨",
)
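The new loop condition above keeps polling until the status is terminal (`"finished"` or `"failed"`), rather than looping until `"finished"` alone and spinning forever on a failure. A hedged sketch of that pattern, with `fetch_status` as a stand-in for the `GET /v1/ask-details/{query_id}/result` call:

```python
import time


def poll_until_terminal(fetch_status, interval: float = 0.0) -> str:
    """Poll fetch_status() until it returns a terminal status.

    fetch_status is any zero-argument callable returning the current
    status string; interval throttles the polling loop.
    """
    status = None
    while status not in ("finished", "failed"):
        status = fetch_status()
        if status not in ("finished", "failed"):
            time.sleep(interval)
    return status


statuses = iter(["understanding", "generating", "failed"])
print(poll_until_terminal(lambda: next(statuses)))  # failed
```

Handling `"failed"` as terminal is what lets the caller surface the error toast instead of hanging.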
3 changes: 1 addition & 2 deletions wren-ai-service/pyproject.toml
@@ -8,15 +8,14 @@ readme = "README.md"
[tool.poetry.dependencies]
python = "^3.12"
fastapi = "^0.109.2"
uvicorn = "^0.27.1"
uvicorn = {extras = ["standard"], version = "^0.29.0"}
python-dotenv = "^1.0.1"
haystack-ai = "^2.0.0"
openai = "^1.14.0"
qdrant-haystack = "^3.0.0"
backoff = "^2.2.1"
tqdm = "^4.66.2"
numpy = "^1.26.4"
langfuse = "^2.19.1"

[tool.poetry.group.dev.dependencies]
pytest = "^8.0.0"
24 changes: 7 additions & 17 deletions wren-ai-service/src/core/pipeline.py
@@ -14,25 +14,15 @@ def __init__(self, pipe: Pipeline):
def run(self, *args, **kwargs) -> Dict[str, Any]:
...

def save(self, with_trace: bool = False, suffix: str = None) -> Path:
def save(self, suffix: str = None) -> Path:
if suffix:
if with_trace:
file_path = Path(
f"./outputs/{self.__class__.__name__.lower()}_pipeline_with_trace_{suffix}.yaml"
)
else:
file_path = Path(
f"./outputs/{self.__class__.__name__.lower()}_pipeline_{suffix}.yaml"
)
file_path = Path(
f"./outputs/{self.__class__.__name__.lower()}_pipeline_{suffix}.yaml"
)
else:
if with_trace:
file_path = Path(
f"./outputs/{self.__class__.__name__.lower()}_pipeline_with_trace.yaml"
)
else:
file_path = Path(
f"./outputs/{self.__class__.__name__.lower()}_pipeline.yaml"
)
file_path = Path(
f"./outputs/{self.__class__.__name__.lower()}_pipeline.yaml"
)

with open(file_path, "w") as file:
self._pipe.dump(file)
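After dropping the `with_trace` branch, `save()` reduces to a two-way choice on `suffix`. A minimal sketch of the resulting naming scheme (the class name here is hypothetical; only the path format comes from the diff):

```python
from pathlib import Path


class AskPipeline:
    """Stand-in subclass to illustrate the simplified save() path logic."""

    def _save_path(self, suffix: str = None) -> Path:
        # Output file name is derived from the lowercased class name,
        # with an optional suffix segment.
        name = self.__class__.__name__.lower()
        if suffix:
            return Path(f"./outputs/{name}_pipeline_{suffix}.yaml")
        return Path(f"./outputs/{name}_pipeline.yaml")


print(AskPipeline()._save_path("eval"))  # outputs/askpipeline_pipeline_eval.yaml
```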
Empty file.
