This document outlines the deployment process for a MultimodalQnA application that uses the GenAIComps microservice pipeline on an Intel Gaudi server. The steps cover Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as multimodal_embedding (which uses the BridgeTower model as its embedding model), multimodal_retriever, lvm, and multimodal-data-prep. We will publish the Docker images to Docker Hub soon, which will simplify the deployment process for this service.
Since compose.yaml consumes several environment variables, you need to set them up in advance as shown below.
Export the public IP address of your Gaudi server to the host_ip environment variable
Replace External_Public_IP below with the actual IPv4 value
export host_ip="External_Public_IP"
Append the value of the public IP address to the no_proxy list
export your_no_proxy=${your_no_proxy},"External_Public_IP"
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
export MEGA_SERVICE_HOST_IP=${host_ip}
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
export REDIS_HOST=${host_ip}
export INDEX_NAME="mm-rag-redis"
export WHISPER_PORT=7066
export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_PORT}/v1/asr"
export MAX_IMAGES=1
export WHISPER_MODEL="base"
export DATAPREP_MMR_PORT=6007
export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/ingest"
export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/generate_transcripts"
export DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/generate_captions"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/get"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/dataprep/delete"
export EMM_BRIDGETOWER_PORT=6006
export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc"
export BRIDGE_TOWER_EMBEDDING=true
export MMEI_EMBEDDING_ENDPOINT="http://${host_ip}:$EMM_BRIDGETOWER_PORT"
export MM_EMBEDDING_PORT_MICROSERVICE=6000
export REDIS_RETRIEVER_PORT=7000
export LVM_PORT=9399
export LLAVA_SERVER_PORT=8399
export TGI_GAUDI_PORT="${LLAVA_SERVER_PORT}:80"
export LVM_MODEL_ID="llava-hf/llava-v1.6-vicuna-13b-hf"
export LVM_ENDPOINT="http://${host_ip}:${LLAVA_SERVER_PORT}"
export MEGA_SERVICE_PORT=8888
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna"
export UI_PORT=5173
Note: Replace host_ip with your external IP address; do not use localhost.
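Before starting the containers, it can help to confirm that the variables above are actually exported and that host_ip is not a loopback address. The following is a minimal sketch, not part of the project: the helper names check_required_vars and is_external_ip are hypothetical, and the list of variables you pass in is up to you.

```shell
# Print the name of every listed variable that is unset or empty in the
# exported environment. Returns non-zero if at least one is missing.
check_required_vars() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, i.e. the ones that child
    # processes such as docker compose will inherit.
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Reject empty, loopback, and placeholder values for host_ip.
is_external_ip() {
  case "$1" in
    ""|localhost|127.*|External_Public_IP) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `check_required_vars host_ip REDIS_URL LVM_ENDPOINT BACKEND_SERVICE_ENDPOINT && is_external_ip "$host_ip"` succeeds only when all four variables are exported and host_ip looks like an external address.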
Note: The MAX_IMAGES environment variable specifies the maximum number of images that will be sent from the LVM service to the LLaVA server. If an image list longer than MAX_IMAGES is sent to the LVM server, a shortened image list will be sent to the LLaVA service. When the list must be shortened, the most recent images (the ones at the end of the list) are kept. Some LLaVA models have not been trained with multiple images, so sending several may lead to inaccurate results. If MAX_IMAGES is not set, it defaults to 1.
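The truncation rule in the MAX_IMAGES note can be illustrated with a small shell function. This is only a sketch of the documented behavior (keep the items at the end of the list), not the LVM service's actual code; keep_last_images and the img_* names are hypothetical.

```shell
# Keep only the last $1 items of the remaining arguments, mirroring the
# "most recent images are kept" rule. Prints one item per line.
keep_last_images() {
  max="$1"
  shift
  # Drop items from the front until at most $max remain.
  if [ "$#" -gt "$max" ]; then
    shift "$(( $# - max ))"
  fi
  for img in "$@"; do
    printf '%s\n' "$img"
  done
}

# MAX_IMAGES falls back to 1 when unset, matching the documented default.
keep_last_images "${MAX_IMAGES:-1}" img_a img_b img_c
```

With MAX_IMAGES unset, the final call keeps only img_c, the last item in the list.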
First, build the Docker images locally and install the accompanying Python package.
Build embedding-multimodal-bridgetower docker image
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build --no-cache -t opea/embedding-multimodal-bridgetower:latest --build-arg EMBEDDER_PORT=$EMM_BRIDGETOWER_PORT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/bridgetower/src/Dockerfile .
Build embedding and retriever microservice images
docker build --no-cache -t opea/embedding:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/embeddings/src/Dockerfile .
docker build --no-cache -t opea/retriever:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/src/Dockerfile .
Pull TGI Gaudi image
docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
Build lvm and dataprep microservice images
docker build --no-cache -t opea/lvm:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/lvms/src/Dockerfile .
docker build --no-cache -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
Build whisper server image
docker build --no-cache -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/src/integrations/dependency/whisper/Dockerfile .
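Most of the build commands above differ only in the image name and the Dockerfile path. As a convenience, they could be wrapped in a small helper like the sketch below. build_image and DRY_RUN are hypothetical names introduced here; the helper covers only the builds that take the two proxy build arguments, so the embedding-multimodal-bridgetower build (which also passes EMBEDDER_PORT) still needs its full command.

```shell
# Compose the docker build command for one image.
# $1 = image name without the opea/ prefix or tag
# $2 = Dockerfile path relative to the repository root
# With DRY_RUN=1 the command is printed instead of executed.
build_image() {
  # "$1" and "$2" are expanded before set replaces the positional parameters.
  set -- docker build --no-cache -t "opea/$1:latest" \
    --build-arg "https_proxy=${https_proxy:-}" \
    --build-arg "http_proxy=${http_proxy:-}" \
    -f "$2" .
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Run from the GenAIComps directory, `build_image whisper comps/asr/src/integrations/dependency/whisper/Dockerfile` would issue the same whisper build as above; setting DRY_RUN=1 first lets you inspect the command without running it.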
To construct the MegaService, we use the GenAIComps microservice pipeline within the multimodalqna.py Python script. Build the MegaService Docker image with the command below:
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/MultimodalQnA
docker build --no-cache -t opea/multimodalqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
Build the frontend Docker image with the command below:
cd GenAIExamples/MultimodalQnA/ui/
docker build --no-cache -t opea/multimodalqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
Then run the command docker images. You should see the following 10 Docker images:
opea/dataprep:latest
opea/lvm:latest
ghcr.io/huggingface/tgi-gaudi:2.0.6
opea/retriever:latest
opea/whisper:latest
opea/redis-vector-db
opea/embedding:latest
opea/embedding-multimodal-bridgetower:latest
opea/multimodalqna:latest
opea/multimodalqna-ui:latest
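Rather than comparing the list by eye, the check can be scripted. The sketch below reads image names on standard input and reports any expected image that is absent; missing_images is a hypothetical helper, and it uses plain substring matching, so an entry without a tag (such as opea/redis-vector-db) matches any tag.

```shell
# Read "repository:tag" lines on stdin and print every expected image
# (given as arguments) that does not appear. Returns non-zero if any is missing.
missing_images() {
  available=$(cat)
  status=0
  for image in "$@"; do
    # -F: fixed-string match, -q: quiet. This is a substring match.
    if ! printf '%s\n' "$available" | grep -qF -- "$image"; then
      echo "$image"
      status=1
    fi
  done
  return "$status"
}
```

A typical invocation would pipe in the local image list, for example `docker images --format '{{.Repository}}:{{.Tag}}' | missing_images opea/dataprep:latest opea/lvm:latest opea/whisper:latest`, and print nothing when all of them are present.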
By default, the multimodal-embedding and LVM models are set to the values listed below:
Service | Model |
---|---|
embedding | BridgeTower/bridgetower-large-itm-mlm-gaudi |
LVM | llava-hf/llava-v1.6-vicuna-13b-hf |
Before running the docker compose command, change to the folder that contains the Docker Compose YAML file:
cd GenAIExamples/MultimodalQnA/docker_compose/intel/hpu/gaudi/
docker compose -f compose.yaml up -d
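The containers may need some time after startup before they answer requests, for example while models are being downloaded and loaded. If the validation requests fail at first, a retry loop can help. The sketch below is a generic helper, not part of the project; retry and its variable names are hypothetical.

```shell
# Run a command until it succeeds.
# $1 = maximum number of attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
retry() {
  retry_attempts="$1"
  retry_delay="$2"
  shift 2
  retry_n=1
  while ! "$@"; do
    if [ "$retry_n" -ge "$retry_attempts" ]; then
      return 1
    fi
    retry_n=$(( retry_n + 1 ))
    sleep "$retry_delay"
  done
  return 0
}
```

For instance, `retry 30 10 curl --silent --fail --output /dev/null -X POST -H "Content-Type:application/json" -d '{"text":"This is example"}' "http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode"` would poll the BridgeTower endpoint for up to roughly five minutes; the attempt count and delay are arbitrary choices.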
- embedding-multimodal-bridgetower
curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \
-X POST \
-H "Content-Type:application/json" \
-d '{"text":"This is example"}'
curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \
-X POST \
-H "Content-Type:application/json" \
-d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}'
- embedding
curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \
-X POST \
-H "Content-Type: application/json" \
-d '{"text" : "This is some sample text."}'
curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \
-X POST \
-H "Content-Type: application/json" \
-d '{"text": {"text" : "This is some sample text."}, "image" : {"url": "https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true"}}'
- retriever-multimodal-redis
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)")
curl http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/multimodal_retrieval \
-X POST \
-H "Content-Type: application/json" \
-d "{\"text\":\"test\",\"embedding\":${your_embedding}}"
- whisper
curl ${WHISPER_SERVER_ENDPOINT} \
-X POST \
-H "Content-Type: application/json" \
-d '{"audio" : "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}'
- TGI LLaVA Gaudi Server
curl http://${host_ip}:${LLAVA_SERVER_PORT}/generate \
-X POST \
-d '{"inputs":"![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png)What is this a picture of?\n\n","parameters":{"max_new_tokens":16, "seed": 42}}' \
-H 'Content-Type: application/json'
- lvm
curl http://${host_ip}:${LVM_PORT}/v1/lvm \
-X POST \
-H 'Content-Type: application/json' \
-d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [{"b64_img_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "transcript_for_inference": "yellow image", "video_id": "8c7461df-b373-4a00-8696-9a2234359fe0", "time_of_frame_ms":"37000000", "source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4"}], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'
curl http://${host_ip}:${LVM_PORT}/v1/lvm \
-X POST \
-H 'Content-Type: application/json' \
-d '{"image": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC", "prompt":"What is this?"}'
Also, validate LVM TGI Gaudi Server with empty retrieval results
curl http://${host_ip}:${LVM_PORT}/v1/lvm \
-X POST \
-H 'Content-Type: application/json' \
-d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}'
- Multimodal Dataprep Microservice
Download a sample video, image, PDF, and audio file and create a caption
export video_fn="WeAreGoingOnBullrun.mp4"
wget http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/WeAreGoingOnBullrun.mp4 -O ${video_fn}
export image_fn="apple.png"
wget "https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true" -O ${image_fn}
export pdf_fn="nke-10k-2023.pdf"
wget https://raw.githubusercontent.com/opea-project/GenAIComps/v1.1/comps/retrievers/redis/data/nke-10k-2023.pdf -O ${pdf_fn}
export caption_fn="apple.txt"
echo "This is an apple." > ${caption_fn}
export audio_fn="AudioSample.wav"
wget https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav -O ${audio_fn}
Test the dataprep microservice by generating transcripts. This command updates the knowledge base by uploading a local .mp4 video file and a .wav audio file.
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
${DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT} \
-H 'Content-Type: multipart/form-data' \
-X POST \
-F "files=@./${video_fn}" \
-F "files=@./${audio_fn}"
Also, test the dataprep microservice by generating an image caption using lvm:
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
${DATAPREP_GEN_CAPTION_SERVICE_ENDPOINT} \
-H 'Content-Type: multipart/form-data' \
-X POST -F "files=@./${image_fn}"
Now, test the microservice by posting a custom caption along with an image and a PDF containing images and text:
curl --silent --write-out "HTTPSTATUS:%{http_code}" \
${DATAPREP_INGEST_SERVICE_ENDPOINT} \
-H 'Content-Type: multipart/form-data' \
-X POST -F "files=@./${image_fn}" -F "files=@./${caption_fn}" \
-F "files=@./${pdf_fn}"
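The dataprep commands above append HTTPSTATUS:<code> to the response via --write-out. When scripting these checks, the status code and the body can be separated with plain parameter expansion, as in the sketch below. The helper names http_status and http_body are hypothetical, and the response body in the usage note is made up for illustration.

```shell
# Extract the numeric status code from a response captured with
# --write-out "HTTPSTATUS:%{http_code}".
http_status() {
  # Remove everything up to and including the last "HTTPSTATUS:" marker.
  printf '%s' "${1##*HTTPSTATUS:}"
}

# Extract the response body from the same captured string.
http_body() {
  # Remove the last "HTTPSTATUS:" marker and everything after it.
  printf '%s' "${1%HTTPSTATUS:*}"
}
```

After capturing the output with `response=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" ...)`, a script could test `[ "$(http_status "$response")" = "200" ]` and print `http_body "$response"` on failure.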
You can also get the list of all files that you uploaded:
curl -X POST \
-H "Content-Type: application/json" \
-d '{"file_path": "all"}' \
${DATAPREP_GET_FILE_ENDPOINT}
The response is a Python-style list like the one below. Note that the name of each uploaded file, e.g. videoname.mp4, becomes videoname_uuid.mp4, where uuid is a unique ID for each uploaded file. If the same file is uploaded twice, each copy gets a different uuid.
[
"WeAreGoingOnBullrun_7ac553a1-116c-40a2-9fc5-deccbb89b507.mp4",
"WeAreGoingOnBullrun_6d13cf26-8ba2-4026-a3a9-ab2e5eb73a29.mp4",
"apple_fcade6e6-11a5-44a2-833a-3e534cbe4419.png",
"nke-10k-2023_28000757-5533-4b1b-89fe-7c0a1b7e2cd0.pdf",
"AudioSample_976a85a6-dc3e-43ab-966c-9d81beef780c.wav"
]
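If you need to map a stored name back to the file you uploaded, the uuid suffix can be stripped with a regular expression. This is a sketch that assumes the suffix always follows the 8-4-4-4-12 hexadecimal layout seen in the sample response; original_name is a hypothetical helper.

```shell
# Recover the original file name from a stored name of the form
# name_<uuid>.ext. Names without a uuid suffix are printed unchanged.
original_name() {
  printf '%s\n' "$1" | sed -E \
    's/_[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(\.[^.]+)$/\1/'
}

original_name "WeAreGoingOnBullrun_7ac553a1-116c-40a2-9fc5-deccbb89b507.mp4"
```

The call above prints WeAreGoingOnBullrun.mp4.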
To delete all uploaded files along with the data indexed under $INDEX_NAME in Redis:
curl -X POST \
-H "Content-Type: application/json" \
${DATAPREP_DELETE_FILE_ENDPOINT}
- MegaService
Test the MegaService with a text query:
curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
-H "Content-Type: application/json" \
-X POST \
-d '{"messages": "What is the revenue of Nike in 2023?"}'
Test the MegaService with an audio query:
curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'
Test the MegaService with a text and image query:
curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": [{"type": "text", "text": "Green bananas in a tree"}, {"type": "image_url", "image_url": {"url": "http://images.cocodataset.org/test-stuff2017/000000004248.jpg"}}]}]}'
Test the MegaService with a back-and-forth conversation between the user and the assistant:
curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": [{"type": "text", "text": "hello, "}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": "chao, "}], "max_tokens": 10}'
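Several requests above carry base64-encoded data inline (the img_b64_str, b64_img_str, image, and audio fields). To send your own file instead of the embedded samples, you can encode it first. The sketch below shows one way to do that and to build the audio query payload; encode_b64 and audio_query_payload are hypothetical helper names, and the JSON layout is copied from the audio query example above.

```shell
# Base64-encode a file onto a single line so it can be used as a JSON string.
# tr removes the line wraps that some base64 implementations insert.
encode_b64() {
  base64 < "$1" | tr -d '\n'
}

# Build the JSON body of a MegaService audio query from an audio file.
# Base64 output contains no characters that need JSON escaping.
audio_query_payload() {
  printf '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "%s"}]}]}' \
    "$(encode_b64 "$1")"
}
```

The payload can then be passed to curl with `-d "$(audio_query_payload "${audio_fn}")"` in place of the inline sample string.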