AgentQnA helm chart deploy update #837

Open · wants to merge 1 commit into base: main
69 changes: 66 additions & 3 deletions helm-charts/agentqna/README.md
@@ -1,15 +1,78 @@
# AgentQnA

-Helm chart for deploying AgentQnA service.
+Helm chart for deploying AgentQnA example.

See [AgentQnA](https://github.com/opea-project/GenAIExamples/tree/main/AgentQnA) for the example details.

Note that this is an example to demonstrate how agent works and tested with prepared data and questions. Using different datasets, models and questions may get different results.

Agent usually requires larger models to performance better, we used Llama-3.3-70B-Instruct for test, which requires 4x Gaudi devices for local deployment.
Contributor comment:

Typo:

Suggested change:
- Agent usually requires larger models to performance better, we used Llama-3.3-70B-Instruct for test, which requires 4x Gaudi devices for local deployment.
+ Agent usually requires larger models to perform better, we used Llama-3.3-70B-Instruct for test, which requires 4x Gaudi devices for local deployment.

I guess it could be run (slowly) also on CPU with enough memory?


With helm chart, we also provided option with smaller model(Meta-Llama-3-8B-Instruct) with compromised performance on Xeon CPU only environment for you to try.
Contributor comment:

Suggested change:
- With helm chart, we also provided option with smaller model(Meta-Llama-3-8B-Instruct) with compromised performance on Xeon CPU only environment for you to try.
+ With helm chart, we also provided option with smaller model (Meta-Llama-3-8B-Instruct) with compromised performance on Xeon CPU only environment for you to try.


## Deploy

-helm install agentqna oci://ghcr.io/opea-project/charts/agentqna --set global.HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} --set tgi.enabled=True
+The Deployment includes preparing tools and sql data.
Contributor comment:

Suggested change:
- The Deployment includes preparing tools and sql data.
+ The Deployment includes preparing tools and SQL data.


### Prerequisites

A volume is required to put tools configuration used by agent, and the database data used by sqlagent.

We'll use hostPath in this readme, which is convenient for single worker node deployment. PVC is recommended in a bigger cluster. If you want to use a PVC, comment out the `toolHostPath` and replace with `toolPVC` in the values.yaml.
Contributor comment:

Suggested change:
- We'll use hostPath in this readme, which is convenient for single worker node deployment. PVC is recommended in a bigger cluster. If you want to use a PVC, comment out the `toolHostPath` and replace with `toolPVC` in the values.yaml.
+ We'll use hostPath in this readme, which is convenient for single worker node deployment. PVC is recommended in a bigger cluster. If you want to use a PVC, comment out the `toolHostPath` and replace with `toolPVC` in the `values.yaml`.


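If you go the PVC route instead, an override could look roughly like the following. This is an editorial sketch, not part of this PR: the `toolHostPath`/`toolPVC` keys are documented in the agent subchart's README further down, but the claim name `agent-tools-pvc` is a hypothetical placeholder for a PVC you create yourself.

```
# Sketch: switch all three agents from hostPath to a pre-created PVC.
# "agent-tools-pvc" is an assumed PersistentVolumeClaim name.
supervisor:
  # toolHostPath: "/mnt/tools"
  toolPVC: "agent-tools-pvc"
ragagent:
  # toolHostPath: "/mnt/tools"
  toolPVC: "agent-tools-pvc"
sqlagent:
  # toolHostPath: "/mnt/tools"
  toolPVC: "agent-tools-pvc"
```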
Create the directory /mnt/tools in the worker node, which is the default in values.yaml. We use the same directory for all 3 agents for easy configuration.
Contributor comment:

Suggested change:
- Create the directory /mnt/tools in the worker node, which is the default in values.yaml. We use the same directory for all 3 agents for easy configuration.
+ Create the directory `/mnt/tools` in the worker node, which is the default in `values.yaml`. We use the same directory for all 3 agents for easy configuration.


```
sudo mkdir /mnt/tools
sudo chmod 777 /mnt/tools
```

Down tools and the configuration to /mnt/tools
Contributor comment:

Suggested change:
- Down tools and the configuration to /mnt/tools
+ Download tools and the configuration to `/mnt/tools`


```
# tools used by supervisor
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/supervisor_agent_tools.yaml -O /mnt/tools/supervisor_agent_tools.yaml
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/tools.py -O /mnt/tools/tools.py
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/pycragapi.py -O /mnt/tools/pycragapi.py

# tools used by rag agent
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/worker_agent_tools.yaml -O /mnt/tools/worker_agent_tools.yaml
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/tools/worker_agent_tools.py -O /mnt/tools/worker_agent_tools.py
```

Down the sqlite data file
Contributor comment:

Suggested change:
- Down the sqlite data file
+ Download the `sqlite` database binary file


```
wget https://raw.githubusercontent.com/lerocha/chinook-database/refs/heads/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite -O /mnt/tools/Chinook_Sqlite.sqlite
```

### Deploy with helm chart
Contributor comment:

Suggested change:
- ### Deploy with helm chart
+ ### Deploy with Helm chart


Deploy everything on Gaudi enabled kubernetes cluster:
Contributor comment:

Suggested change:
- Deploy everything on Gaudi enabled kubernetes cluster:
+ Deploy everything on Gaudi enabled Kubernetes cluster:


If you want to try with latest version, use `helm pull oci://ghcr.io/opea-project/charts/agentqna --version 0-latest --untar`

```
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
helm pull oci://ghcr.io/opea-project/charts/agentqna --untar
helm install agentqna agentqna -f agentqna/gaudi-values.yaml --set global.HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
```
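The Xeon-only option mentioned earlier would presumably follow the same flow with the `cpu-values.yaml` added by this PR; a sketch, assuming the file is used like the Gaudi overrides:

```
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
helm pull oci://ghcr.io/opea-project/charts/agentqna --untar
helm install agentqna agentqna -f agentqna/cpu-values.yaml --set global.HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
```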

## Verify

To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
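For example, instead of polling manually, a generic kubectl pattern can block until all pods are Ready (the 10-minute timeout is an arbitrary assumption to allow time for model downloads):

```
kubectl get pod
# Or wait until every pod in the namespace reports Ready:
kubectl wait --for=condition=Ready pod --all --timeout=10m
```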

### Ingest data for rag

Ingest data used by rag.
Contributor comment on lines +65 to +67:

Abbreviations should be upper case:

Suggested change:
- ### Ingest data for rag
- Ingest data used by rag.
+ ### Ingest data for RAG
+ Ingest data used by RAG:


```
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/retrieval_tool/index_data.py -O /mnt/tools/index_data.py
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/AgentQnA/example_data/test_docs_music.jsonl -O /mnt/tools/test_docs_music.jsonl
host_ip=$(kubectl get svc |grep data-prep |awk '{print $3}')
```

Contributor comment:

More exact match:

Suggested change:
- host_ip=$(kubectl get svc |grep data-prep |awk '{print $3}')
+ host_ip=$(kubectl get svc -o jsonpath="{.items[].spec.clusterIP}" --selector app.kubernetes.io/name=data-prep)

Or if host_ip is supposed to include also (just) one of the possible ports:

Suggested change:
- host_ip=$(kubectl get svc |grep data-prep |awk '{print $3}')
+ host_ip=$(kubectl get svc -o jsonpath="{.items[].spec.clusterIP}:{.items[].spec.ports[0].port}" --selector app.kubernetes.io/name=data-prep)

```

python3 index_data.py --filedir /mnt/tools --filename test_docs_music.jsonl --host_ip $host_ip
```

### Verify the workload through curl command

Run the command `kubectl port-forward svc/agentqna-supervisor 9090:9090` to expose the service for access.
@@ -20,5 +83,5 @@

Open another terminal and run the following command to verify the service if working:

```
curl http://localhost:9090/v1/chat/completions \
-X POST \
-H "Content-Type: application/json" \
-d '{"query": "Most recent album by Michael Jackson"}'
-d '{"messages": "How many albums does Iron Maiden have?"}'
```
17 changes: 17 additions & 0 deletions helm-charts/agentqna/cpu-values.yaml
@@ -0,0 +1,17 @@
# Copyright (C) 2024 Intel Corporation
Contributor comment:

Suggested change:
- # Copyright (C) 2024 Intel Corporation
+ # Copyright (C) 2025 Intel Corporation

# SPDX-License-Identifier: Apache-2.0

Contributor comment:

Just in case:

Suggested change:
+ tgi:
+   enabled: false

vllm:
  enabled: true
  LLM_MODEL_ID: "meta-llama/Meta-Llama-3-8B-Instruct"
  extraCmdArgs: ["--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]

supervisor:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
ragagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
sqlagent:
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
Contributor comment on lines +9 to +17:

Add also `llm_engine: vllm`

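Applied to the file above, the reviewer's suggestion would pin the engine explicitly in each agent section; a sketch for one section, using only keys already present in the chart's values:

```
supervisor:
  llm_engine: vllm
  llm_endpoint_url: http://{{ .Release.Name }}-vllm
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
```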
19 changes: 19 additions & 0 deletions helm-charts/agentqna/gaudi-tgi-values.yaml
@@ -6,6 +6,25 @@

Contributor comment:

Just in case:

Suggested change:
+ vllm:
+   enabled: false

tgi:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: ghcr.io/huggingface/tgi-gaudi
    tag: "2.3.1"
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  MAX_INPUT_LENGTH: 4096
  MAX_TOTAL_TOKENS: 8192
  CUDA_GRAPHS: ""
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  ENABLE_HPU_GRAPH: true
  LIMIT_HPU_GRAPH: true
  USE_FLASH_ATTENTION: true
  FLASH_ATTENTION_RECOMPUTE: true
  extraCmdArgs: ["--sharded", "true", "--num-shard", "4"]

  livenessProbe:
    initialDelaySeconds: 5
    periodSeconds: 5
11 changes: 11 additions & 0 deletions helm-charts/agentqna/gaudi-values.yaml
@@ -6,8 +6,19 @@

Contributor comment:

Just in case:

Suggested change:
+ tgi:
+   enabled: false

vllm:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: opea/vllm-gaudi
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  VLLM_SKIP_WARMUP: true
  shmSize: 16Gi
  extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]

supervisor:
Contributor comment:

Add also `llm_engine: vllm` for these.

  llm_endpoint_url: http://{{ .Release.Name }}-vllm
ragagent:
4 changes: 2 additions & 2 deletions helm-charts/agentqna/templates/tests/test-pod.yaml
@@ -20,13 +20,13 @@ spec:
# Ingest data
cd /mnt/tools
pip install requests tqdm
-./ingest_data.sh {{ include "agentqna.fullname" (index .Subcharts "data-prep") }}
+python3 index_data.py --filedir /mnt/tools --filename test_docs_music.jsonl --host_ip {{ include "agentqna.fullname" (index .Subcharts "data-prep") }}
Contributor comment:

.json*l*?

# Test ragagent
max_retry=10;
for ((i=1; i<=max_retry; i++)); do
curl http://{{ include "agentqna.fullname" (index .Subcharts "ragagent") }}:{{ .Values.ragagent.service.port }}/v1/chat/completions -sS --fail-with-body \
-X POST \
-d '{"messages": "Tell me about Michael Jackson song Thriller"}' \
-d '{"messages": "Tell me something about Thriller"}' \
-H 'Content-Type: application/json' && break;
curlcode=$?
if [[ $curlcode -eq 7 ]]; then sleep 10; else echo "curl failed with code $curlcode"; exit 1; fi;
56 changes: 15 additions & 41 deletions helm-charts/agentqna/values.yaml
@@ -58,50 +58,54 @@ docretriever:
tag: "latest"

sqlagent:
DBPath: "/mnt/tools"
toolHostPath: "/mnt/tools"
db_name: "Chinook"
db_path: "sqlite:////home/user/chinook-db/Chinook_Sqlite.sqlite"
db_path: "sqlite:////home/user/tools/Chinook_Sqlite.sqlite"
service:
port: 9096
strategy: sql_agent_llama
with_memory: false
use_hints: "false"
recursion_limit: "6"
llm_engine: vllm
llm_endpoint_url: ""
model: "meta-llama/Meta-Llama-3.1-70B-Instruct"
temperature: "0.01"
model: "meta-llama/Llama-3.3-70B-Instruct"
temperature: "0"
max_new_tokens: "4096"
stream: "false"
tools: ""
require_human_feedback: "false"

ragagent:
toolPath: "/mnt/tools"
toolHostPath: "/mnt/tools"
service:
port: 9095
strategy: rag_agent_llama
with_memory: false
recursion_limit: "6"
llm_engine: vllm
llm_endpoint_url: ""
model: "meta-llama/Meta-Llama-3.1-70B-Instruct"
temperature: "0.01"
model: "meta-llama/Llama-3.3-70B-Instruct"
temperature: "0"
max_new_tokens: "4096"
stream: "false"
tools: "/home/user/tools/worker_agent_tools.yaml"
require_human_feedback: "false"
RETRIEVAL_TOOL_URL: ""

supervisor:
toolPath: "/mnt/tools"
toolHostPath: "/mnt/tools"
service:
port: 9090
strategy: react_llama
with_memory: true
recursion_limit: 10
llm_engine: vllm
llm_endpoint_url: ""
model: "meta-llama/Meta-Llama-3.1-70B-Instruct"
temperature: "0.01"
model: "meta-llama/Llama-3.3-70B-Instruct"
temperature: "0"
max_new_tokens: "4096"
stream: "false"
stream: "true"
tools: /home/user/tools/supervisor_agent_tools.yaml
require_human_feedback: false
CRAG_SERVER: ""
@@ -119,39 +123,9 @@ crag:
 # Override values in specific subcharts
 tgi:
   enabled: false
-  accelDevice: "gaudi"
-  image:
-    repository: ghcr.io/huggingface/tgi-gaudi
-    tag: "2.3.1"
-  resources:
-    limits:
-      habana.ai/gaudi: 4
-  LLM_MODEL_ID: "meta-llama/Meta-Llama-3.1-70B-Instruct"
-  MAX_INPUT_LENGTH: 4096
-  MAX_TOTAL_TOKENS: 8192
-  CUDA_GRAPHS: ""
-  OMPI_MCA_btl_vader_single_copy_mechanism: none
-  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
-  ENABLE_HPU_GRAPH: true
-  LIMIT_HPU_GRAPH: true
-  USE_FLASH_ATTENTION: true
-  FLASH_ATTENTION_RECOMPUTE: true
-  extraCmdArgs: ["--sharded", "true", "--num-shard", "4"]

 vllm:
   enabled: false
-  accelDevice: "gaudi"
-  image:
-    repository: opea/vllm-gaudi
-    tag: "latest"
-  resources:
-    limits:
-      habana.ai/gaudi: 4
-  LLM_MODEL_ID: "meta-llama/Meta-Llama-3.1-70B-Instruct"
-  OMPI_MCA_btl_vader_single_copy_mechanism: none
-  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
-  VLLM_SKIP_WARMUP: true
-  extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq_len-to-capture", "16384"]

 nginx:
   service:
31 changes: 31 additions & 0 deletions helm-charts/agentqna/variant-openai-values.yaml
@@ -0,0 +1,31 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Accelerate inferencing in heaviest components to improve performance
# by overriding their subchart values

Contributor comment on lines +4 to +6:

Redundant comment as no accelerator devices are specified?

supervisor:
  # OpenAI Compatible API without Authentication
  llm_endpoint_url: "YourEndPoint"
  OPENAI_API_KEY: EMPTY
  model: "YourModel"
  # Use OpenAI KEY
  # llm_engine: openai
Contributor comment:

Why are these commented out?

  # OPENAI_API_KEY: YourOpenAIKey
ragagent:
  # OpenAI Compatible API without Authentication
  llm_endpoint_url: "YourEndPoint"
  OPENAI_API_KEY: EMPTY
  model: "YourModel"
  # Use OpenAI KEY
  # llm_engine: openai
  # OPENAI_API_KEY: YourOpenAIKey
Contributor comment on lines +18 to +22 (@eero-t, Feb 27, 2025):

These extra key comments are redundant for all of these 3 subcharts.

sqlagent:
  # OpenAI Compatible API without Authentication
  llm_endpoint_url: "YourEndPoint"
  OPENAI_API_KEY: EMPTY
  model: "YourModel"
  # strategy: sql_agent
  # Use OpenAI KEY
  # llm_engine: openai
  # OPENAI_API_KEY: YourOpenAIKey
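For reference, this variant would presumably be applied like the other override files; a sketch following the README's install flow, after replacing the "YourEndPoint" and "YourModel" placeholders in the file:

```
helm pull oci://ghcr.io/opea-project/charts/agentqna --untar
# Edit agentqna/variant-openai-values.yaml: set llm_endpoint_url and model for each agent.
helm install agentqna agentqna -f agentqna/variant-openai-values.yaml
```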
16 changes: 9 additions & 7 deletions helm-charts/common/agent/README.md
@@ -37,10 +37,12 @@ curl http://localhost:9090/v1/chat/completions \

For global options, see Global Options.

-| Key                             | Type   | Default        | Description                     |
-| ------------------------------- | ------ | -------------- | ------------------------------- |
-| global.HUGGINGFACEHUB_API_TOKEN | string | `""`           | Your own Hugging Face API token |
-| image.repository                | string | `"opea/agent"` |                                 |
-| service.port                    | string | `"9090"`       |                                 |
-| llm_endpoint_url                | string | `""`           | LLM endpoint                    |
-| global.monitoring               | bool   | false          | Service usage metrics           |
+| Key                             | Type   | Default        | Description                                                                              |
+| ------------------------------- | ------ | -------------- | ---------------------------------------------------------------------------------------- |
+| global.HUGGINGFACEHUB_API_TOKEN | string | `""`           | Your own Hugging Face API token                                                          |
+| image.repository                | string | `"opea/agent"` |                                                                                          |
+| service.port                    | string | `"9090"`       |                                                                                          |
+| llm_endpoint_url                | string | `""`           | LLM endpoint                                                                             |
+| toolHostPath                    | string | `""`           | hostPath to be mounted to agent's /home/user/tools, used for passing files for tools     |
+| toolPVC                         | string | `""`           | Same as toolHostPath, but use PVC. You can only specify one of toolHostPath and toolPVC  |
+| global.monitoring               | bool   | false          | Service usage metrics                                                                    |
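To illustrate the last two rows, the two mount options for an agent are mutually exclusive; a sketch, with a hypothetical PVC name:

```
# Option 1: hostPath, convenient on a single worker node
toolHostPath: "/mnt/tools"

# Option 2: a pre-created PVC, preferred on larger clusters.
# Set only one of toolHostPath / toolPVC.
# toolPVC: "agent-tools-pvc"
```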
18 changes: 18 additions & 0 deletions helm-charts/common/agent/gaudi-tgi-values.yaml
@@ -6,6 +6,24 @@

Contributor comment:

Just in case:

Suggested change:
+ vllm:
+   enabled: false

tgi:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: ghcr.io/huggingface/tgi-gaudi
    tag: "2.3.1"
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  MAX_INPUT_LENGTH: 4096
  MAX_TOTAL_TOKENS: 8192
  CUDA_GRAPHS: ""
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  ENABLE_HPU_GRAPH: true
  LIMIT_HPU_GRAPH: true
  USE_FLASH_ATTENTION: true
  FLASH_ATTENTION_RECOMPUTE: true
  extraCmdArgs: ["--sharded", "true", "--num-shard", "4"]
  livenessProbe:
    initialDelaySeconds: 5
    periodSeconds: 5
9 changes: 9 additions & 0 deletions helm-charts/common/agent/gaudi-values.yaml
@@ -6,6 +6,15 @@

Contributor comment:

Just in case:

Suggested change:
+ tgi:
+   enabled: false

vllm:
  enabled: true
  accelDevice: "gaudi"
  image:
    repository: opea/vllm-gaudi
  resources:
    limits:
      habana.ai/gaudi: 4
  LLM_MODEL_ID: "meta-llama/Llama-3.3-70B-Instruct"
  OMPI_MCA_btl_vader_single_copy_mechanism: none
  PT_HPU_ENABLE_LAZY_COLLECTIVES: true
  VLLM_SKIP_WARMUP: true
  extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]
llm_endpoint_url: http://{{ .Release.Name }}-vllm
1 change: 1 addition & 0 deletions helm-charts/common/agent/templates/configmap.yaml
@@ -58,6 +58,7 @@ data:
   recursion_limit: {{ .Values.recursion_limit | quote }}
   llm_engine: {{ .Values.llm_engine | quote }}
   strategy: {{ .Values.strategy | quote }}
+  with_memory: {{ .Values.with_memory |quote }}
   use_hints: {{ .Values.use_hints | quote }}
   max_new_tokens: {{ .Values.max_new_tokens | quote }}
   {{- if .Values.OPENAI_API_KEY }}