SearchQnA App - Adding files to deploy SearchQnA application on AMD GPU #1193

Open

Wants to merge 70 commits into main from feature/GenAIExample_SearchQnA_deploy_on_AMD

Changes from 53 commits (70 commits total)

Commits
a2c3be2
Updated the Pinecone readme to reflect the new structure (#1222)
pallavijaini0525 Dec 5, 2024
480c258
Update audioQnA compose (#1227)
WenjiaoYue Dec 5, 2024
08938a7
move examples gateway (#992)
lkk12014402 Dec 6, 2024
fc335fb
Update tests for issue 1229 (#1231)
MSCetin37 Dec 7, 2024
0eb6367
Added compose example for VisualQnA deployment on AMD ROCm systems (#…
artem-astafev Dec 7, 2024
1e61593
[ChatQNA] Fixes Embedding Endpoint (#1230)
theBeginner86 Dec 9, 2024
423430c
Remove deprecated docker compose files (#1238)
lianhao Dec 10, 2024
50062eb
[ChatQnA] Remove enforce-eager to enable HPU graphs for better vLLM p…
wangkl2 Dec 10, 2024
013a6bf
Changed Default UI to Gradio (#1246)
okhleif-IL Dec 11, 2024
63b86b8
[DocIndexRetriever] enable the without-rerank flavor (#1223)
gavinlichn Dec 12, 2024
3550547
Adds audio querying to MultimodalQ&A Example (#1225)
mhbuehler Dec 12, 2024
2c5432f
remove examples gateway. (#1243)
lkk12014402 Dec 13, 2024
f65b4bb
remove examples gateway. (#1250)
lkk12014402 Dec 14, 2024
956139b
Change to pull_request_target for dependency review workflow (#1256)
XuehaoSun Dec 17, 2024
636b404
Update Multimodal Docker File Path (#1252)
letonghan Dec 17, 2024
379ff0b
Added docker compose example for AgentQnA deployment on AMD ROCm (#1…
artem-astafev Dec 18, 2024
6e037f7
DocSum - Solving the problem of running DocSum on ROCm (#1268)
chyundunovDatamonsters Dec 18, 2024
158c9d0
Added compose example for MultimodalQnA deployment on AMD ROCm system…
artem-astafev Dec 18, 2024
c9f8d00
Update CODEOWNERS list for PR review (#1262)
chensuyue Dec 19, 2024
0f12be9
Adding URL summary option to DocSum Gradio-UI (#1248)
MSCetin37 Dec 19, 2024
b769bbe
Chatqna/benchmark: Remove the deprecated directory (#1261)
bjzhjing Dec 19, 2024
62763f2
Minor fix DocIndexRetriever test (#1266)
Spycsh Dec 19, 2024
2020c37
Align DocIndexRetriever Xeon tests with Gaudi (#1272)
Spycsh Dec 20, 2024
704af4e
FaqGen param fix (#1277)
XinyaoWa Dec 20, 2024
9f61ed0
SearchQnA - add Docker compose file and set envs script for deploy Se…
Nov 26, 2024
08db335
SearchQnA - fix Docker compose file and set envs script for deploy Se…
Nov 26, 2024
83d9616
SearchQnA - fix Docker compose file and set envs script for deploy Se…
Nov 26, 2024
0e1bb95
SearchQnA - fix Docker compose file and set envs script for deploy Se…
Nov 26, 2024
d1bdeaf
SearchQnA - fix Docker compose file and set envs script for deploy Se…
Nov 26, 2024
34286f7
SearchQnA - add tests script for Translation App
Nov 27, 2024
c01a933
SearchQnA - fix tests script for Translation App
Nov 27, 2024
c07ec3e
SearchQnA - add README.md file
Nov 27, 2024
c244123
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 27, 2024
fcdabfb
SearchQnA - fix PR problems
Dec 18, 2024
5ab7f33
SearchQnA - fix PR problems
Dec 18, 2024
216006d
SearchQnA - fix PR problems
Dec 18, 2024
72e30ba
SearchQnA - fix deploy on AMD
Dec 20, 2024
64c341a
SearchQnA - add Docker compose file and set envs script for deploy Se…
Nov 26, 2024
2dc67e8
SearchQnA - fix Docker compose file and set envs script for deploy Se…
Nov 26, 2024
b8be21c
SearchQnA - fix Docker compose file and set envs script for deploy Se…
Nov 26, 2024
48956e9
SearchQnA - fix Docker compose file and set envs script for deploy Se…
Nov 26, 2024
955b414
SearchQnA - fix Docker compose file and set envs script for deploy Se…
Nov 26, 2024
6164cea
SearchQnA - fix deploy on AMD
Dec 20, 2024
d3f732f
SearchQnA - fix tests script for Translation App
Nov 27, 2024
858c5b8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 27, 2024
02239c4
SearchQnA - fix PR problems
Dec 18, 2024
730e236
SearchQnA - fix PR problems
Dec 18, 2024
a8966e5
SearchQnA - fix PR problems
Dec 18, 2024
3495ecc
SearchQnA - fix deploy on AMD
Dec 20, 2024
2dc233e
SearchQnA - fix deploy on AMD
Dec 20, 2024
b19c5a9
SearchQnA - fix deploy on AMD
Dec 20, 2024
7d04fbc
Merge branch 'main' into feature/GenAIExample_SearchQnA_deploy_on_AMD
chyundunovDatamonsters Dec 23, 2024
04159d3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 23, 2024
82a23b1
Merge branch 'main' into feature/GenAIExample_SearchQnA_deploy_on_AMD
xiguiw Dec 26, 2024
78b5ef1
Merge branch 'main' into feature/GenAIExample_SearchQnA_deploy_on_AMD
chyundunovDatamonsters Jan 9, 2025
9550d83
SearchQnA - fix README
Jan 15, 2025
085f389
Merge remote-tracking branch 'origin/feature/GenAIExample_SearchQnA_d…
Jan 15, 2025
4dc17e2
Merge remote-tracking branch 'opea-origin/main' into feature/GenAIExa…
Jan 15, 2025
0d51555
SearchQnA - fix README
Jan 15, 2025
1e8f5e5
Merge branch 'main' into feature/GenAIExample_SearchQnA_deploy_on_AMD
chyundunovDatamonsters Jan 16, 2025
3380006
Merge remote-tracking branch 'opea-origin/main' into feature/GenAIExa…
Jan 16, 2025
a9234d1
Merge remote-tracking branch 'origin/feature/GenAIExample_SearchQnA_d…
Jan 16, 2025
82d2595
SearchQnA - fix PR
Jan 16, 2025
3d7efa0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 16, 2025
1fe85fa
SearchQnA - fix compose file and tests scripts
Jan 16, 2025
20299b2
Merge remote-tracking branch 'origin/feature/GenAIExample_SearchQnA_d…
Jan 16, 2025
5af8532
ChatQnA - fix deploy on AMD
Jan 17, 2025
b537e02
ChatQnA - fix deploy on AMD
Jan 17, 2025
dfe8b3d
ChatQnA - fix deploy on AMD
Jan 17, 2025
82ec725
ChatQnA - fix deploy on AMD
Jan 17, 2025
15 changes: 13 additions & 2 deletions DocSum/docker_compose/amd/gpu/rocm/compose.yaml
@@ -27,8 +27,7 @@ services:
security_opt:
- seccomp:unconfined
ipc: host
command: --model-id ${DOCSUM_LLM_MODEL_ID} --max-input-length ${MAX_INPUT_TOKENS} --max-total-tokens ${MAX_TOTAL_TOKENS}

command: --model-id ${DOCSUM_LLM_MODEL_ID}
docsum-llm-server:
image: ${REGISTRY:-opea}/llm-docsum-tgi:${TAG:-latest}
container_name: docsum-llm-server
@@ -70,6 +69,18 @@ services:
https_proxy: ${https_proxy}
restart: unless-stopped

whisper:
image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
container_name: whisper-service
ports:
- "7066:7066"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
restart: unless-stopped

dataprep-audio2text:
image: ${REGISTRY:-opea}/dataprep-audio2text:${TAG:-latest}
container_name: dataprep-audio2text-service
4 changes: 4 additions & 0 deletions DocSum/docsum.py
@@ -108,9 +108,11 @@ async def handle_request(self, request: Request, files: List[UploadFile] = File(
if "application/json" in request.headers.get("content-type"):
data = await request.json()
stream_opt = data.get("stream", True)

summary_type = data.get("summary_type", "auto")
chunk_size = data.get("chunk_size", -1)
chunk_overlap = data.get("chunk_overlap", -1)

chat_request = ChatCompletionRequest.model_validate(data)
prompt = handle_message(chat_request.messages)

@@ -119,9 +121,11 @@ async def handle_request(self, request: Request, files: List[UploadFile] = File(
elif "multipart/form-data" in request.headers.get("content-type"):
data = await request.form()
stream_opt = data.get("stream", True)

summary_type = data.get("summary_type", "auto")
chunk_size = data.get("chunk_size", -1)
chunk_overlap = data.get("chunk_overlap", -1)

chat_request = ChatCompletionRequest.model_validate(data)

data_type = data.get("type")
3 changes: 3 additions & 0 deletions DocSum/tests/test_compose_on_xeon.sh
@@ -210,14 +210,17 @@ function validate_microservices() {
"${host_ip}:7079/v1/multimedia2text" \
"well" \
"dataprep-multimedia2text" \

"dataprep-multimedia2text" \
"{\"video\": \"$(input_data_for_test "video")\"}"

# Docsum Data service - audio
validate_services_json \
"${host_ip}:7079/v1/multimedia2text" \
"well" \

"dataprep-multimedia2text" \

"dataprep-multimedia2text" \
"{\"audio\": \"$(input_data_for_test "audio")\"}"

179 changes: 179 additions & 0 deletions SearchQnA/docker_compose/amd/gpu/rocm/README.md
@@ -0,0 +1,179 @@
# Build and deploy SearchQnA Application on AMD GPU (ROCm)

## Build images

### Build Embedding Image

```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build --no-cache -t opea/embedding-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/tei/langchain/Dockerfile .
```

### Build Retriever Image

```bash
docker build --no-cache -t opea/web-retriever-chroma:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/web_retrievers/chroma/langchain/Dockerfile .
```

### Build Rerank Image

```bash
docker build --no-cache -t opea/reranking-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile .
```

### Build the LLM Docker Image

```bash
docker build -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```

### Build the MegaService Docker Image

```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/SearchQnA
docker build --no-cache -t opea/searchqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```

### Build the UI Docker Image

```bash
cd GenAIExamples/SearchQnA/ui
docker build --no-cache -t opea/searchqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```

## Deploy SearchQnA Application

### Features of Docker compose for AMD GPUs

1. Forwarding of the GPU devices to the TGI service container is added with the following directives:

```yaml
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/:/dev/dri/
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
```

In this case, all GPUs on the host are passed through to the container. To pass through a specific GPU only, use its specific device names cardN and renderDN.

For example:

```yaml
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/card0:/dev/dri/card0
  - /dev/dri/renderD128:/dev/dri/renderD128
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
```

To find out which cardN and renderDN device names correspond to the same physical GPU, use the AMD GPU driver utilities (for example, `rocm-smi`).
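
As a rough sketch (assuming the ROCm tools are installed on the host; the exact output format depends on the driver version), the device nodes and the GPUs they belong to can be inspected like this:

```bash
# List the card/render device nodes present on the host
ls -l /dev/dri/

# Show the GPUs known to the ROCm driver together with their PCI bus IDs,
# which can be matched to the /dev/dri nodes via /sys/class/drm
rocm-smi --showbus
```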

### Go to the directory with the Docker compose file

```bash
cd GenAIExamples/SearchQnA/docker_compose/amd/gpu/rocm
```

### Set environments

In the file "GenAIExamples/SearchQnA/docker_compose/amd/gpu/rocm/set_env.sh " it is necessary to set the required values. Parameter assignments are specified in the comments for each variable setting command

```bash
chmod +x set_env.sh
. set_env.sh
```
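
For orientation, the script exports variables along the following lines (the names and values below are illustrative assumptions; treat `set_env.sh` itself as the authoritative list):

```bash
# Illustrative sketch only -- consult set_env.sh for the real variable names and defaults
export SEARCH_HOST_IP="192.168.1.1"                     # IP address of the host running the services
export SEARCH_HUGGINGFACEHUB_API_TOKEN="your-hf-token"  # token for pulling gated models from Hugging Face
export SEARCH_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"  # model served by the TGI service
export GOOGLE_API_KEY="your-google-api-key"             # key for the web search backend
export GOOGLE_CSE_ID="your-cse-id"                      # search engine ID used by the web retriever
```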

### Run services

```bash
docker compose up -d
```
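
To check that the containers have started and that the TGI service has finished loading the model, you can inspect the compose status and logs (the container name below is an assumption; take the real name from the `docker compose ps` output):

```bash
# Show the status of all services defined in the compose file
docker compose ps

# Follow the TGI server log and wait until it reports that the model
# has been loaded and the HTTP endpoint is listening
docker logs -f search-tgi-service
```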

## Validate the MicroServices and MegaService

### Validate TEI service

```bash
curl http://${SEARCH_HOST_IP}:3001/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
```

### Validate Embedding service

```bash
curl http://${SEARCH_HOST_IP}:3002/v1/embeddings\
-X POST \
-d '{"text":"hello"}' \
-H 'Content-Type: application/json'
```

### Validate Web Retriever service

```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${SEARCH_HOST_IP}:3003/v1/web_retrieval \
-X POST \
-d "{\"text\":\"What is the 2024 holiday schedule?\",\"embedding\":${your_embedding}}" \
-H 'Content-Type: application/json'
```

### Validate TEI Reranking service

```bash
curl http://${SEARCH_HOST_IP}:3004/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
```

### Validate Reranking service

```bash
curl http://${SEARCH_HOST_IP}:3005/v1/reranking\
-X POST \
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
-H 'Content-Type: application/json'
```

### Validate TGI service

```bash
curl http://${SEARCH_HOST_IP}:3006/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
```

### Validate LLM service

```bash
curl http://${SEARCH_HOST_IP}:3007/v1/chat/completions\
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```

### Validate MegaService

```bash
curl http://${SEARCH_HOST_IP}:3008/v1/searchqna -H "Content-Type: application/json" -d '{
"messages": "What is the latest news? Give me also the source link.",
"stream": "True"
}'
```