AgentQnA helm chart deploy update #837

Open · wants to merge 1 commit into base: main
69 changes: 66 additions & 3 deletions helm-charts/agentqna/README.md
@@ -1,15 +1,78 @@
# AgentQnA

-Helm chart for deploying AgentQnA service.
+Helm chart for deploying AgentQnA example.

See [AgentQnA](https://github.com/opea-project/GenAIExamples/tree/main/AgentQnA) for the example details.

Note that this is an example to demonstrate how agent works and tested with prepared data and questions. Using different datasets, models and questions may get different results.

Agent usually requires larger models to performance better, we used Llama-3.3-70B-Instruct for test, which requires 4x Gaudi devices for local deployment.
Contributor comment:

Typo:

Suggested change:
- Agent usually requires larger models to performance better, we used Llama-3.3-70B-Instruct for test, which requires 4x Gaudi devices for local deployment.
+ Agent usually requires larger models to perform better, we used Llama-3.3-70B-Instruct for test, which requires 4x Gaudi devices for local deployment.

I guess it could be run (slowly) also on CPU with enough memory?


With helm chart, we also provided option with smaller model(Meta-Llama-3-8B-Instruct) with compromised performance on Xeon CPU only environment for you to try.
Contributor comment:

Suggested change:
- With helm chart, we also provided option with smaller model(Meta-Llama-3-8B-Instruct) with compromised performance on Xeon CPU only environment for you to try.
+ With helm chart, we also provided option with smaller model (Meta-Llama-3-8B-Instruct) with compromised performance on Xeon CPU only environment for you to try.


## Deploy

-helm install agentqna oci://ghcr.io/opea-project/charts/agentqna --set global.HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} --set tgi.enabled=True
+The Deployment includes preparing tools and sql data.
Contributor comment:

Suggested change:
- The Deployment includes preparing tools and sql data.
+ The Deployment includes preparing tools and SQL data.


### Prerequisites

A volume is required to put tools configuration used by agent, and the database data used by sqlagent.

We'll use hostPath in this readme, which is convenient for single worker node deployment. PVC is recommended in a bigger cluster. If you want to use a PVC, comment out the `toolHostPath` and replace with `toolPVC` in the values.yaml.
Contributor comment:

Suggested change:
- We'll use hostPath in this readme, which is convenient for single worker node deployment. PVC is recommended in a bigger cluster. If you want to use a PVC, comment out the `toolHostPath` and replace with `toolPVC` in the values.yaml.
+ We'll use hostPath in this readme, which is convenient for single worker node deployment. PVC is recommended in a bigger cluster. If you want to use a PVC, comment out the `toolHostPath` and replace with `toolPVC` in the `values.yaml`.


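If you go the PVC route instead, an override could look roughly like the following. This is an editorial sketch, not part of this PR: the `toolHostPath`/`toolPVC` keys are documented in the agent subchart's README further down, but the claim name `agent-tools-pvc` is a hypothetical placeholder for a PVC you create yourself.

```
# Sketch: switch all three agents from hostPath to a pre-created PVC.
# "agent-tools-pvc" is an assumed PersistentVolumeClaim name.
supervisor:
  # toolHostPath: "/mnt/tools"
  toolPVC: "agent-tools-pvc"
ragagent:
  # toolHostPath: "/mnt/tools"
  toolPVC: "agent-tools-pvc"
sqlagent:
  # toolHostPath: "/mnt/tools"
  toolPVC: "agent-tools-pvc"
```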
Create the directory /mnt/tools in the worker node, which is the default in values.yaml. We use the same directory for all 3 agents for easy configuration.
Contributor comment:

Suggested change:
- Create the directory /mnt/tools in the worker node, which is the default in values.yaml. We use the same directory for all 3 agents for easy configuration.
+ Create the directory `/mnt/tools` in the worker node, which is the default in `values.yaml`. We use the same directory for all 3 agents for easy configuration.


```
sudo mkdir /mnt/tools
sudo chmod 777 /mnt/tools
```

Down tools and the configuration to /mnt/tools
Contributor comment:

Suggested change:
- Down tools and the configuration to /mnt/tools
+ Download tools and the configuration to `/mnt/tools`


```
# tools used by supervisor
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/supervisor_agent_tools.yaml -O /mnt/tools/supervisor_agent_tools.yaml
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/tools.py -O /mnt/tools/tools.py
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/pycragapi.py -O /mnt/tools/pycragapi.py

# tools used by rag agent
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/worker_agent_tools.yaml -O /mnt/tools/worker_agent_tools.yaml
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/worker_agent_tools.py -O /mnt/tools/worker_agent_tools.py
```

Down the sqlite data file
Contributor comment:

Suggested change:
- Down the sqlite data file
+ Download the `sqlite` database binary file


```
wget https://raw.githubusercontent.com/lerocha/chinook-database/refs/heads/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite -O /mnt/tools/Chinook_Sqlite.sqlite
```

### Deploy with helm chart
Contributor comment:

Suggested change:
- ### Deploy with helm chart
+ ### Deploy with Helm chart


Deploy everything on Gaudi enabled kubernetes cluster:
Contributor comment:

Suggested change:
- Deploy everything on Gaudi enabled kubernetes cluster:
+ Deploy everything on Gaudi enabled Kubernetes cluster:


If you want to try with latest version, use `helm pull oci://ghcr.io/opea-project/charts/agentqna --version 0-latest --untar`

```
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
helm pull oci://ghcr.io/opea-project/charts/agentqna --untar
helm install agentqna agentqna -f agentqna/gaudi-values.yaml --set global.HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
```
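The Xeon-only option mentioned earlier would presumably follow the same flow with the `cpu-values.yaml` added by this PR; a sketch, assuming the file is used like the Gaudi overrides:

```
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
helm pull oci://ghcr.io/opea-project/charts/agentqna --untar
helm install agentqna agentqna -f agentqna/cpu-values.yaml --set global.HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
```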

## Verify

To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
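For example, instead of polling manually, a generic kubectl pattern can block until all pods are Ready (the 10-minute timeout is an arbitrary assumption to allow time for model downloads):

```
kubectl get pod
# Or wait until every pod in the namespace reports Ready:
kubectl wait --for=condition=Ready pod --all --timeout=10m
```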

### Ingest data for rag

Ingest data used by rag.
Contributor comment on lines +65 to +67:

Abbreviations should be upper case:

Suggested change:
- ### Ingest data for rag
- Ingest data used by rag.
+ ### Ingest data for RAG
+ Ingest data used by RAG:


```
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/retrieval_tool/index_data.py -O /mnt/tools/index_data.py
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/example_data/test_docs_music.jsonl -O /mnt/tools/test_docs_music.jsonl
host_ip=$(kubectl get svc |grep data-prep |awk '{print $3}')
```

Contributor comment:

More exact match:

Suggested change:
- host_ip=$(kubectl get svc |grep data-prep |awk '{print $3}')
+ host_ip=$(kubectl get svc -o jsonpath="{.items[].spec.clusterIP}" --selector app.kubernetes.io/name=data-prep)

Or if host_ip is supposed to include also (just) one of the possible ports:

Suggested change:
- host_ip=$(kubectl get svc |grep data-prep |awk '{print $3}')
+ host_ip=$(kubectl get svc -o jsonpath="{.items[].spec.clusterIP}:{.items[].spec.ports[0].port}" --selector app.kubernetes.io/name=data-prep)

```

python3 index_data.py --filedir /mnt/tools --filename test_docs_music.jsonl --host_ip $host_ip
```

### Verify the workload through curl command

Run the command `kubectl port-forward svc/agentqna-supervisor 9090:9090` to expose the service for access.
@@ -20,5 +83,5 @@

Open another terminal and run the following command to verify the service if working:

```
curl http://localhost:9090/v1/chat/completions \
-X POST \
-H "Content-Type: application/json" \
-d '{"query": "Most recent album by Michael Jackson"}'
-d '{"messages": "How many albums does Iron Maiden have?"}'
```
17 changes: 17 additions & 0 deletions helm-charts/agentqna/cpu-values.yaml
@@ -0,0 +1,17 @@
# Copyright (C) 2024 Intel Corporation
Contributor comment:

Suggested change:
- # Copyright (C) 2024 Intel Corporation
+ # Copyright (C) 2025 Intel Corporation

# SPDX-License-Identifier: Apache-2.0

Contributor comment:

Just in case:

Suggested change:
+ tgi:
+   enabled: false

vllm:
  enabled: true
  LLM_MODEL_ID: "meta-llama/Meta-Llama-3-8B-Instruct"
  extraCmdArgs: ["--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]

supervisor:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
ragagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
sqlagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
Contributor comment on lines +9 to +17:

Add also `llm_engine: vllm`

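Applied to the file above, the reviewer's suggestion would pin the engine explicitly in each agent section; a sketch for one section, using only keys already present in the chart's values:

```
supervisor:
  llm_engine: vllm
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
```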
19 changes: 19 additions & 0 deletions helm-charts/agentqna/gaudi-tgi-values.yaml
@@ -6,6 +6,25 @@

Contributor comment:

Just in case:

Suggested change:
+ vllm:
+   enabled: false

tgi:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: ghcr.io/huggingface/tgi-gaudi
    tag: "2.3.1"
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  MAX_INPUT_LENGTH: 4096
  MAX_TOTAL_TOKENS: 8192
  CUDA_GRAPHS: ""
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  ENABLE_HPU_GRAPH: true
  LIMIT_HPU_GRAPH: true
  USE_FLASH_ATTENTION: true
  FLASH_ATTENTION_RECOMPUTE: true
  extraCmdArgs: ["--sharded", "true", "--num-shard", "4"]

  livenessProbe:
    initialDelaySeconds: 5
    periodSeconds: 5
11 changes: 11 additions & 0 deletions helm-charts/agentqna/gaudi-values.yaml
@@ -6,8 +6,19 @@

Contributor comment:

Just in case:

Suggested change:
+ tgi:
+   enabled: false

vllm:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: opea/vllm-gaudi
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  VLLM_SKIP_WARMUP: true
  shmSize: 16Gi
  extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]

supervisor:
Contributor comment:

Add also `llm_engine: vllm` for these.

  llm_endpoint_url: http://{{ .Release.Name }}-vllm
ragagent:
4 changes: 2 additions & 2 deletions helm-charts/agentqna/templates/tests/test-pod.yaml
@@ -20,13 +20,13 @@ spec:
# Ingest data
cd /mnt/tools
pip install requests tqdm
-./ingest_data.sh {{ include "agentqna.fullname" (index .Subcharts "data-prep") }}
+python3 index_data.py --filedir /mnt/tools --filename test_docs_music.jsonl --host_ip {{ include "agentqna.fullname" (index .Subcharts "data-prep") }}
Contributor comment:

.json*l*?

# Test ragagent
max_retry=10;
for ((i=1; i<=max_retry; i++)); do
curl http://{{ include "agentqna.fullname" (index .Subcharts "ragagent") }}:{{ .Values.ragagent.service.port }}/v1/chat/completions -sS --fail-with-body \
-X POST \
-d '{"messages": "Tell me about Michael Jackson song Thriller"}' \
-d '{"messages": "Tell me something about Thriller"}' \
-H 'Content-Type: application/json' && break;
curlcode=$?
if [[ $curlcode -eq 7 ]]; then sleep 10; else echo "curl failed with code $curlcode"; exit 1; fi;
56 changes: 15 additions & 41 deletions helm-charts/agentqna/values.yaml
@@ -58,50 +58,54 @@ docretriever:
tag: "latest"

sqlagent:
DBPath: "/mnt/tools"
toolHostPath: "/mnt/tools"
db_name: "Chinook"
db_path: "sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
db_path: "sqlite:////home/user/tools/Chinook_Sqlite.sqlite"
service:
port: 9096
strategy: sql_agent_llama
with_memory: false
use_hints: "false"
recursion_limit: "6"
llm_engine: vllm
llm_endpoint_url: ""
model: "meta-llama/Meta-Llama-3.1-70B-Instruct"
temperature: "0.01"
model: "meta-llama/Llama-3.3-70B-Instruct"
temperature: "0"
max_new_tokens: "4096"
stream: "false"
tools: ""
require_human_feedback: "false"

ragagent:
toolPath: "/mnt/tools"
toolHostPath: "/mnt/tools"
service:
port: 9095
strategy: rag_agent_llama
with_memory: false
recursion_limit: "6"
llm_engine: vllm
llm_endpoint_url: ""
model: "meta-llama/Meta-Llama-3.1-70B-Instruct"
temperature: "0.01"
model: "meta-llama/Llama-3.3-70B-Instruct"
temperature: "0"
max_new_tokens: "4096"
stream: "false"
tools: "/home/user/tools/worker_agent_tools.yaml"
require_human_feedback: "false"
RETRIEVAL_TOOL_URL: ""

supervisor:
toolPath: "/mnt/tools"
toolHostPath: "/mnt/tools"
service:
port: 9090
strategy: react_llama
with_memory: true
recursion_limit: 10
llm_engine: vllm
llm_endpoint_url: ""
model: "meta-llama/Meta-Llama-3.1-70B-Instruct"
temperature: "0.01"
model: "meta-llama/Llama-3.3-70B-Instruct"
temperature: "0"
max_new_tokens: "4096"
stream: "false"
stream: "true"
tools: /home/user/tools/supervisor_agent_tools.yaml
require_human_feedback: false
CRAG_SERVER: ""
@@ -119,39 +123,9 @@ crag:
 # Override values in specific subcharts
 tgi:
   enabled: false
-  accelDevice: "gaudi"
-  image:
-    repository: ghcr.io/huggingface/tgi-gaudi
-    tag: "2.3.1"
-  resources:
-    limits:
-      habana.ai/gaudi: 4
-  LLM_MODEL_ID: "meta-llama/Meta-Llama-3.1-70B-Instruct"
-  MAX_INPUT_LENGTH: 4096
-  MAX_TOTAL_TOKENS: 8192
-  CUDA_GRAPHS: ""
-  OMPI_MCA_btl_vader_single_copy_mechanism: none
-  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
-  ENABLE_HPU_GRAPH: true
-  LIMIT_HPU_GRAPH: true
-  USE_FLASH_ATTENTION: true
-  FLASH_ATTENTION_RECOMPUTE: true
-  extraCmdArgs: ["--sharded", "true", "--num-shard", "4"]

 vllm:
   enabled: false
-  accelDevice: "gaudi"
-  image:
-    repository: opea/vllm-gaudi
-    tag: "latest"
-  resources:
-    limits:
-      habana.ai/gaudi: 4
-  LLM_MODEL_ID: "meta-llama/Meta-Llama-3.1-70B-Instruct"
-  OMPI_MCA_btl_vader_single_copy_mechanism: none
-  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
-  VLLM_SKIP_WARMUP: true
-  extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq_len-to-capture", "16384"]

 nginx:
   service:
31 changes: 31 additions & 0 deletions helm-charts/agentqna/variant-openai-values.yaml
@@ -0,0 +1,31 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values

Contributor comment on lines +4 to +6:

Redundant comment as no accelerator devices are specified?

supervisor:
  # OpenAI Compatible API without Authentication
  llm_endpoint_url: "YourEndPoint"
  OPENAI_API_KEY: EMPTY
  model: "YourModel"
  # Use OpenAI KEY
  # llm_engine: openai
Contributor comment:

Why are these commented out?

  # OPENAI_API_KEY: YourOpenAIKey
ragagent:
  # OpenAI Compatible API without Authentication
  llm_endpoint_url: "YourEndPoint"
  OPENAI_API_KEY: EMPTY
  model: "YourModel"
  # Use OpenAI KEY
  # llm_engine: openai
  # OPENAI_API_KEY: YourOpenAIKey
Contributor comment on lines +18 to +22 (@eero-t, Feb 27, 2025):

These extra key comments are redundant for all of these 3 subcharts.

sqlagent:
  # OpenAI Compatible API without Authentication
  llm_endpoint_url: "YourEndPoint"
  OPENAI_API_KEY: EMPTY
  model: "YourModel"
  # strategy: sql_agent
  # Use OpenAI KEY
  # llm_engine: openai
  # OPENAI_API_KEY: YourOpenAIKey
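For reference, this variant would presumably be applied like the other override files; a sketch following the README's install flow, after replacing the "YourEndPoint" and "YourModel" placeholders in the file:

```
helm pull oci://ghcr.io/opea-project/charts/agentqna --untar
# Edit agentqna/variant-openai-values.yaml: set llm_endpoint_url and model for each agent.
helm install agentqna agentqna -f agentqna/variant-openai-values.yaml
```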
16 changes: 9 additions & 7 deletions helm-charts/common/agent/README.md
@@ -37,10 +37,12 @@ curl http://localhost:9090/v1/chat/completions \

For global options, see Global Options.

-| Key                             | Type   | Default        | Description                     |
-| ------------------------------- | ------ | -------------- | ------------------------------- |
-| global.HUGGINGFACEHUB_API_TOKEN | string | `""`           | Your own Hugging Face API token |
-| image.repository                | string | `"opea/agent"` |                                 |
-| service.port                    | string | `"9090"`       |                                 |
-| llm_endpoint_url                | string | `""`           | LLM endpoint                    |
-| global.monitoring               | bool   | false          | Service usage metrics           |
+| Key                             | Type   | Default        | Description                                                                              |
+| ------------------------------- | ------ | -------------- | ---------------------------------------------------------------------------------------- |
+| global.HUGGINGFACEHUB_API_TOKEN | string | `""`           | Your own Hugging Face API token                                                          |
+| image.repository                | string | `"opea/agent"` |                                                                                          |
+| service.port                    | string | `"9090"`       |                                                                                          |
+| llm_endpoint_url                | string | `""`           | LLM endpoint                                                                             |
+| toolHostPath                    | string | `""`           | hostPath to be mounted to agent's /home/user/tools, used for passing files for tools     |
+| toolPVC                         | string | `""`           | Same as toolHostPath, but use PVC. You can only specify one of toolHostPath and toolPVC  |
+| global.monitoring               | bool   | false          | Service usage metrics                                                                    |
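To illustrate the last two rows, the two mount options for an agent are mutually exclusive; a sketch, with a hypothetical PVC name:

```
# Option 1: hostPath, convenient on a single worker node
toolHostPath: "/mnt/tools"

# Option 2: a pre-created PVC, preferred on larger clusters.
# Set only one of toolHostPath / toolPVC.
# toolPVC: "agent-tools-pvc"
```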
18 changes: 18 additions & 0 deletions helm-charts/common/agent/gaudi-tgi-values.yaml
@@ -6,6 +6,24 @@

Contributor comment:

Just in case:

Suggested change:
+ vllm:
+   enabled: false

tgi:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: ghcr.io/huggingface/tgi-gaudi
    tag: "2.3.1"
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  MAX_INPUT_LENGTH: 4096
  MAX_TOTAL_TOKENS: 8192
  CUDA_GRAPHS: ""
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  ENABLE_HPU_GRAPH: true
  LIMIT_HPU_GRAPH: true
  USE_FLASH_ATTENTION: true
  FLASH_ATTENTION_RECOMPUTE: true
  extraCmdArgs: ["--sharded", "true", "--num-shard", "4"]
  livenessProbe:
    initialDelaySeconds: 5
    periodSeconds: 5
9 changes: 9 additions & 0 deletions helm-charts/common/agent/gaudi-values.yaml
@@ -6,6 +6,15 @@

Contributor comment:

Just in case:

Suggested change:
+ tgi:
+   enabled: false

vllm:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: opea/vllm-gaudi
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  VLLM_SKIP_WARMUP: true
  extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]
llm_endpoint_url: http://{{ .Release.Name }}-vllm
1 change: 1 addition & 0 deletions helm-charts/common/agent/templates/configmap.yaml
@@ -58,6 +58,7 @@ data:
   recursion_limit: {{ .Values.recursion_limit | quote }}
   llm_engine: {{ .Values.llm_engine | quote }}
   strategy: {{ .Values.strategy | quote }}
+  with_memory: {{ .Values.with_memory |quote }}
   use_hints: {{ .Values.use_hints | quote }}
   max_new_tokens: {{ .Values.max_new_tokens | quote }}
   {{- if .Values.OPENAI_API_KEY }}