AgentQnA helm chart deploy update #837
base: main
# AgentQnA

Helm chart for deploying AgentQnA example.

See [AgentQnA](https://github.com/opea-project/GenAIExamples/tree/main/AgentQnA) for the example details.

Note that this example demonstrates how the agent works and has been tested with prepared data and questions. Using different datasets, models, and questions may produce different results.

Agents usually require larger models to perform well. We used Llama-3.3-70B-Instruct for testing, which requires 4x Gaudi devices for local deployment.

The Helm chart also provides an option to use a smaller model (Meta-Llama-3-8B-Instruct), with reduced performance, in a Xeon CPU-only environment.

## Deploy

The deployment includes preparing tools and SQL data.

### Prerequisites

A volume is required to hold the tools configuration used by the agents and the database file used by the SQL agent.

We'll use hostPath in this README, which is convenient for a single worker node deployment. A PVC is recommended in a bigger cluster. If you want to use a PVC, comment out `toolHostPath` and replace it with `toolPVC` in values.yaml.

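As a rough sketch, the PVC variant of that edit could look like the following in values.yaml (this assumes `toolPVC` takes the name of an existing PersistentVolumeClaim; `tools-pvc` is a hypothetical name, not a value from the chart):

```
# toolHostPath: "/mnt/tools"   # comment out the hostPath default...
toolPVC: "tools-pvc"           # ...and reference an existing PersistentVolumeClaim instead
```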
Create the directory `/mnt/tools` on the worker node (the default in values.yaml). We use the same directory for all three agents for easy configuration.

```
sudo mkdir /mnt/tools
sudo chmod 777 /mnt/tools
```

Download the tools and their configuration to `/mnt/tools`:

```
# tools used by supervisor
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/supervisor_agent_tools.yaml -O /mnt/tools/supervisor_agent_tools.yaml
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/tools.py -O /mnt/tools/tools.py
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/pycragapi.py -O /mnt/tools/pycragapi.py

# tools used by rag agent
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/worker_agent_tools.yaml -O /mnt/tools/worker_agent_tools.yaml
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/worker_agent_tools.py -O /mnt/tools/worker_agent_tools.py
```

Download the SQLite data file:

```
wget https://raw.githubusercontent.com/lerocha/chinook-database/refs/heads/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite -O /mnt/tools/Chinook_Sqlite.sqlite
```

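The SQL agent answers questions by generating SQL against this database. As an illustration only, the sketch below runs the kind of query such a question maps to, against a tiny in-memory database whose `Artist`/`Album` tables mimic the Chinook schema (the columns shown are an assumption for the sketch, not a dump of the real file):

```python
import sqlite3

# Tiny in-memory stand-in for Chinook's Artist/Album tables (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    INSERT INTO Artist VALUES (1, 'Iron Maiden'), (2, 'AC/DC');
    INSERT INTO Album VALUES (1, 'Killers', 1), (2, 'Powerslave', 1), (3, 'Back in Black', 2);
""")

# The kind of SQL a question like "How many albums does Iron Maiden have?" maps to:
count = conn.execute(
    "SELECT COUNT(*) FROM Album JOIN Artist USING (ArtistId) WHERE Artist.Name = ?",
    ("Iron Maiden",),
).fetchone()[0]
print(count)  # 2 rows in this toy dataset
```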
### Deploy with Helm chart

Deploy everything on a Gaudi-enabled Kubernetes cluster:

If you want to try the latest version, use `helm pull oci://ghcr.io/opea-project/charts/agentqna --version 0-latest --untar`.

```
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
helm pull oci://ghcr.io/opea-project/charts/agentqna --untar
helm install agentqna agentqna -f agentqna/gaudi-values.yaml --set global.HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
```

## Verify

To verify the installation, run `kubectl get pod` to make sure all pods are running.

### Ingest data for RAG

Ingest the data used by RAG:

```
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/retrieval_tool/index_data.py -O /mnt/tools/index_data.py
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/example_data/test_docs_music.jsonl -O /mnt/tools/test_docs_music.jsonl
host_ip=$(kubectl get svc | grep data-prep | awk '{print $3}')
python3 index_data.py --filedir /mnt/tools --filename test_docs_music.jsonl --host_ip $host_ip
```

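`test_docs_music.jsonl` is in JSON Lines format: one JSON document per line, which is what `index_data.py` iterates over. A minimal sketch of how such a file is parsed (the `"text"` field name and the record contents are illustrative assumptions, not taken from the actual dataset):

```python
import json

# Two illustrative JSON Lines records (field names are assumptions for the sketch).
raw = (
    '{"text": "Thriller is the sixth studio album by Michael Jackson."}\n'
    '{"text": "Iron Maiden is an English heavy metal band."}\n'
)

# Each non-empty line is an independent JSON document.
docs = [json.loads(line) for line in raw.splitlines() if line.strip()]
print(len(docs))  # 2
```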
### Verify the workload through the curl command

Run the command `kubectl port-forward svc/agentqna-supervisor 9090:9090` to expose the service for access.

Open another terminal and run the following command to verify the service is working:

```
curl http://localhost:9090/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"messages": "How many albums does Iron Maiden have?"}'
```
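The same request can be issued from Python. The sketch below only builds the request object; actually sending it assumes the port-forward above is active:

```python
import json
import urllib.request

# Build the same request as the curl command above (constructed, not sent).
payload = {"messages": "How many albums does Iron Maiden have?"}
req = urllib.request.Request(
    "http://localhost:9090/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
# To actually send it (requires the running service): urllib.request.urlopen(req)
```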

---

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

vllm:
  enabled: true
  LLM_MODEL_ID: "meta-llama/Meta-Llama-3-8B-Instruct"
  extraCmdArgs: ["--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]

supervisor:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
ragagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
sqlagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"

---

tgi:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: ghcr.io/huggingface/tgi-gaudi
    tag: "2.3.1"
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  MAX_INPUT_LENGTH: 4096
  MAX_TOTAL_TOKENS: 8192
  CUDA_GRAPHS: ""
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  ENABLE_HPU_GRAPH: true
  LIMIT_HPU_GRAPH: true
  USE_FLASH_ATTENTION: true
  FLASH_ATTENTION_RECOMPUTE: true
  extraCmdArgs: ["--sharded", "true", "--num-shard", "4"]

  livenessProbe:
    initialDelaySeconds: 5
    periodSeconds: 5

---

vllm:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: opea/vllm-gaudi
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  VLLM_SKIP_WARMUP: true
  shmSize: 16Gi
  extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]

supervisor:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
ragagent:

---

# Ingest data
cd /mnt/tools
pip install requests tqdm
python3 index_data.py --filedir /mnt/tools --filename test_docs_music.jsonl --host_ip {{ include "agentqna.fullname" (index .Subcharts "data-prep") }}

# Test ragagent
max_retry=10;
for ((i=1; i<=max_retry; i++)); do
  curl http://{{ include "agentqna.fullname" (index .Subcharts "ragagent") }}:{{ .Values.ragagent.service.port }}/v1/chat/completions -sS --fail-with-body \
    -X POST \
    -d '{"messages": "Tell me something about Thriller"}' \
    -H 'Content-Type: application/json' && break;
  curlcode=$?
  if [[ $curlcode -eq 7 ]]; then sleep 10; else echo "curl failed with code $curlcode"; exit 1; fi;

---

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values

supervisor:
  # OpenAI Compatible API without Authentication
  llm_endpoint_url: "YourEndPoint"
  OPENAI_API_KEY: EMPTY
  model: "YourModel"
  # Use OpenAI KEY
  # llm_engine: openai
  # OPENAI_API_KEY: YourOpenAIKey
ragagent:
  # OpenAI Compatible API without Authentication
  llm_endpoint_url: "YourEndPoint"
  OPENAI_API_KEY: EMPTY
  model: "YourModel"
  # Use OpenAI KEY
  # llm_engine: openai
  # OPENAI_API_KEY: YourOpenAIKey
sqlagent:
  # OpenAI Compatible API without Authentication
  llm_endpoint_url: "YourEndPoint"
  OPENAI_API_KEY: EMPTY
  model: "YourModel"
  # strategy: sql_agent
  # Use OpenAI KEY
  # llm_engine: openai
  # OPENAI_API_KEY: YourOpenAIKey

---

tgi:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: ghcr.io/huggingface/tgi-gaudi
    tag: "2.3.1"
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  MAX_INPUT_LENGTH: 4096
  MAX_TOTAL_TOKENS: 8192
  CUDA_GRAPHS: ""
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  ENABLE_HPU_GRAPH: true
  LIMIT_HPU_GRAPH: true
  USE_FLASH_ATTENTION: true
  FLASH_ATTENTION_RECOMPUTE: true
  extraCmdArgs: ["--sharded", "true", "--num-shard", "4"]
  livenessProbe:
    initialDelaySeconds: 5
    periodSeconds: 5

---

vllm:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: opea/vllm-gaudi
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  VLLM_SKIP_WARMUP: true
  extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]
llm_endpoint_url: http://{{ .Release.Name }}-vllm