Update readme for LLM comps and related third parties (#1234)
* Update readme for LLM comps and related third parties

Update readmes for doc-summarization, faq-generation, text-generation, tgi and vllm

Signed-off-by: Xinyao Wang <[email protected]>
XinyaoWa authored Jan 24, 2025
1 parent 85b732d commit f6d4601
Showing 10 changed files with 241 additions and 434 deletions.
30 changes: 13 additions & 17 deletions comps/llms/src/doc-summarization/README.md
@@ -17,7 +17,6 @@ export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
export LLM_MODEL_ID=${your_hf_llm_model}
export MAX_INPUT_TOKENS=2048
export MAX_TOTAL_TOKENS=4096
export DocSum_COMPONENT_NAME="OpeaDocSumTgi" # or "OpeaDocSumvLLM"
```

Please make sure MAX_TOTAL_TOKENS is larger than (MAX_INPUT_TOKENS + max_new_tokens + 50); 50 tokens are reserved for the prompt.
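For example, with the defaults above and an assumed request setting of `max_new_tokens=1024` (a per-request value, not a fixed default), the budget can be sanity-checked as follows:

```bash
# Assumed example value; adjust to whatever max_new_tokens you plan to request.
max_new_tokens=1024
required=$((MAX_INPUT_TOKENS + max_new_tokens + 50))   # 2048 + 1024 + 50 = 3122
if [ "$MAX_TOTAL_TOKENS" -gt "$required" ]; then
  echo "Token budget OK: MAX_TOTAL_TOKENS=$MAX_TOTAL_TOKENS > $required"
else
  echo "Increase MAX_TOTAL_TOKENS or reduce MAX_INPUT_TOKENS/max_new_tokens"
fi
```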
@@ -26,15 +25,15 @@ Please make sure MAX_TOTAL_TOKENS should be larger than (MAX_INPUT_TOKENS + max_

Step 1: Prepare backend LLM docker image.

If you want to use vLLM backend, refer to [vLLM](../../../third_parties/vllm/src) to build vLLM docker images first.
If you want to use vLLM backend, refer to [vLLM](../../../third_parties/vllm/) to build vLLM docker images first.

No extra image build is needed for TGI.

Step 2: Build FaqGen docker image.
Step 2: Build DocSum docker image.

```bash
cd ../../../../
docker build -t opea/llm-docsum:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/summarization/Dockerfile .
docker build -t opea/llm-docsum:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/doc-summarization/Dockerfile .
```

### 1.3 Run Docker
@@ -49,11 +48,12 @@ You can choose one as needed.
### 1.3.1 Run Docker with CLI (Option A)

Step 1: Start the backend LLM service
Please refer to [TGI](../../../third_parties/tgi/deployment/docker_compose/) or [vLLM](../../../third_parties/vllm/deployment/docker_compose/) guideline to start a backend LLM service.
Please refer to the [TGI](../../../third_parties/tgi) or [vLLM](../../../third_parties/vllm) guidelines to start a backend LLM service.

Step 2: Start the DocSum microservice

```bash
export DocSum_COMPONENT_NAME="OpeaDocSumTgi" # or "OpeaDocSumvLLM"
docker run -d \
--name="llm-docsum-server" \
-p 9000:9000 \
@@ -71,20 +71,16 @@

### 1.3.2 Run Docker with Docker Compose (Option B)

```bash
cd ../../deployment/docker_compose/

# Backend is TGI on xeon
docker compose -f doc-summarization_tgi.yaml up -d

# Backend is TGI on gaudi
# docker compose -f doc-summarization_tgi_on_intel_hpu.yaml up -d

# Backend is vLLM on xeon
# docker compose -f doc-summarization_vllm.yaml up -d

# Backend is vLLM on gaudi
# docker compose -f doc-summarization_vllm_on_intel_hpu.yaml up -d
```

Set `service_name` to match the backend service.

```bash
export service_name="docsum-tgi"
# export service_name="docsum-tgi-gaudi"
# export service_name="docsum-vllm"
# export service_name="docsum-vllm-gaudi"

cd ../../deployment/docker_compose/
docker compose -f compose_doc-summarization.yaml up ${service_name} -d
```
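Once the selected service is up, a quick health probe confirms the microservice is reachable (a sketch: the `/v1/health_check` route mirrors the one documented for the text-generation component, and port 9000 assumes the default mapping shown in Option A above):

```bash
curl http://${host_ip}:9000/v1/health_check \
  -X GET \
  -H 'Content-Type: application/json'
```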

## 🚀3. Consume LLM Service
26 changes: 11 additions & 15 deletions comps/llms/src/faq-generation/README.md
@@ -15,14 +15,13 @@ export FAQ_PORT=9000
export HF_TOKEN=${your_hf_api_token}
export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
export LLM_MODEL_ID=${your_hf_llm_model}
export FAQGen_COMPONENT_NAME="OpeaFaqGenTgi" # or "vllm"
```

### 1.2 Build Docker Image

Step 1: Prepare backend LLM docker image.

If you want to use vLLM backend, refer to [vLLM](../../../third_parties/vllm/src) to build vLLM docker images first.
If you want to use vLLM backend, refer to [vLLM](../../../third_parties/vllm) to build vLLM docker images first.

No extra image build is needed for TGI.

@@ -45,11 +44,12 @@ You can choose one as needed.
#### 1.3.1 Run Docker with CLI (Option A)

Step 1: Start the backend LLM service
Please refer to [TGI](../../../third_parties/tgi/deployment/docker_compose/) or [vLLM](../../../third_parties/vllm/deployment/docker_compose/) guideline to start a backend LLM service.
Please refer to the [TGI](../../../third_parties/tgi) or [vLLM](../../../third_parties/vllm) guidelines to start a backend LLM service.

Step 2: Start the FaqGen microservice

```bash
export FAQGen_COMPONENT_NAME="OpeaFaqGenTgi" # or "OpeaFaqGenvLLM"
docker run -d \
--name="llm-faqgen-server" \
-p 9000:9000 \
@@ -65,20 +65,16 @@

#### 1.3.2 Run Docker with Docker Compose (Option B)

```bash
cd ../../deployment/docker_compose/

# Backend is TGI on xeon
docker compose -f faq-generation_tgi.yaml up -d

# Backend is TGI on gaudi
# docker compose -f faq-generation_tgi_on_intel_hpu.yaml up -d

# Backend is vLLM on xeon
# docker compose -f faq-generation_vllm.yaml up -d

# Backend is vLLM on gaudi
# docker compose -f faq-generation_vllm_on_intel_hpu.yaml up -d
```

Set `service_name` to match the backend service.

```bash
export service_name="faqgen-tgi"
# export service_name="faqgen-tgi-gaudi"
# export service_name="faqgen-vllm"
# export service_name="faqgen-vllm-gaudi"

cd ../../deployment/docker_compose/
docker compose -f compose_faq-generation.yaml up ${service_name} -d
```
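As an optional sanity check, standard Docker Compose commands can confirm that the selected service started and is logging normally (nothing here is specific to FaqGen):

```bash
# List the containers for this compose file and follow the logs of the chosen service.
docker compose -f compose_faq-generation.yaml ps
docker compose -f compose_faq-generation.yaml logs -f ${service_name}
```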

## 🚀2. Consume LLM Service
155 changes: 108 additions & 47 deletions comps/llms/src/text-generation/README.md
@@ -1,113 +1,174 @@
# TGI LLM Microservice
# LLM Text Generation Microservice

[Text Generation Inference](https://github.com/huggingface/text-generation-inference) (TGI) is a toolkit for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation for the most popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and more.
This microservice, designed for large language model (LLM) inference, processes input consisting of a query string and associated reranked documents. It constructs a prompt from the query and the documents, which is then used to perform inference with a large language model. The service delivers the inference results as output.

## 🚀1. Start Microservice with Python (Option 1)
A prerequisite for using this microservice is that users must have an LLM text generation service (e.g., TGI, vLLM) already running. Users need to set the LLM service's endpoint in an environment variable. The microservice uses this endpoint to create an LLM object, enabling it to communicate with the LLM service for executing language model operations.

To start the LLM microservice, you need to install python packages first.
Overall, this microservice offers a streamlined way to integrate large language model inference into applications, requiring minimal setup from the user beyond initiating a TGI/vLLM service and configuring the necessary environment variables. This allows for the seamless processing of queries and documents to generate intelligent, context-aware responses.

### 1.1 Install Requirements
## Validated LLM Models

```bash
pip install -r requirements.txt
```
| Model                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi |
| --------------------------- | --------- | -------- | ---------- |
| [Intel/neural-chat-7b-v3-3] | ✓         | ✓        | ✓          |
| [Llama-2-7b-chat-hf]        | ✓         | ✓        | ✓          |
| [Llama-2-70b-chat-hf]       | ✓         | -        | ✓          |
| [Meta-Llama-3-8B-Instruct]  | ✓         | ✓        | ✓          |
| [Meta-Llama-3-70B-Instruct] | ✓         | -        | ✓          |
| [Phi-3]                     | x         | Limit 4K | Limit 4K   |

## Supported Integrations

### 1.2 Start 3rd-party TGI Service
This microservice supports the following backend LLM services as integrations. TGI, vLLM, and Ollama are covered in this README; for the others, please refer to the corresponding READMEs.

Please refer to [3rd-party TGI](../../../third_parties/tgi/deployment/docker_compose/) to start a LLM endpoint and verify.
- TGI
- vLLM
- Ollama
- [Bedrock](./README_bedrock.md)
- [Native](./README_native.md), based on Optimum Habana
- [Predictionguard](./README_predictionguard.md)

### 1.3 Start LLM Service with Python Script
## Clone OPEA GenAIComps

Clone this repository at your desired location and set an environment variable for easy setup and usage throughout the instructions.

```bash
export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
python llm.py
git clone https://github.com/opea-project/GenAIComps.git

export OPEA_GENAICOMPS_ROOT=$(pwd)/GenAIComps
```

## 🚀2. Start Microservice with Docker (Option 2)
## Prerequisites

If you start an LLM microservice with docker, the `docker_compose_llm.yaml` file will automatically start a TGI/vLLM service with docker.
For TGI/vLLM, you must create a user account with [HuggingFace] and obtain permission to use gated LLM models by adhering to the guidelines provided on the respective model's webpage. The environment variable `LLM_MODEL_ID` is the HuggingFace model ID, and `HF_TOKEN` is your HuggingFace account's "User Access Token".
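If you want to confirm gated-model access before starting any services, a quick check with the `huggingface_hub` CLI looks like this (a sketch; it assumes `huggingface_hub` is installed locally and that `LLM_MODEL_ID` is set as in section 2.1 below):

```bash
pip install -U huggingface_hub                        # provides the huggingface-cli tool
huggingface-cli login --token ${HF_TOKEN}
huggingface-cli whoami                                # should print your HuggingFace username
huggingface-cli download ${LLM_MODEL_ID} config.json  # fails if access to a gated model was not granted
```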

### 2.1 Setup Environment Variables
## 🚀Start Microservice with Docker

In order to start TGI and LLM services, you need to setup the following environment variables first.
In order to start the microservice with Docker, you need to build its Docker image first.

```bash
export HF_TOKEN=${your_hf_api_token}
export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
export LLM_MODEL_ID=${your_hf_llm_model}
```
### 1. Build Docker Image

#### 1.1 Prepare backend LLM docker image.

If you want to use vLLM backend, refer to [vLLM](../../../third_parties/vllm/) to build vLLM docker images first.

### 2.2 Build Docker Image
No extra image build is needed for TGI or Ollama.

#### 1.2 Prepare TextGen docker image.

```bash
cd ../../../../
docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
# Build the microservice docker
cd ${OPEA_GENAICOMPS_ROOT}

docker build \
--build-arg https_proxy=$https_proxy \
--build-arg http_proxy=$http_proxy \
-t opea/llm-textgen:latest \
-f comps/llms/src/text-generation/Dockerfile .
```
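After the build finishes, you can quickly confirm the image is available locally (a simple check, not specific to this repo):

```bash
docker images opea/llm-textgen:latest
```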

### 2. Start LLM Service with the built image

To start a docker container, you have two options:

- A. Run Docker with CLI
- B. Run Docker with Docker Compose

You can choose one as needed.
You can choose one as needed. If you start the LLM microservice with Docker Compose, the `compose_text-generation.yaml` file will automatically start both the backend endpoint and the microservice containers.

### 2.3 Run Docker with CLI (Option A)
#### 2.1 Setup Environment Variables

In order to start the services, you need to set up the following environment variables first.

```bash
export LLM_ENDPOINT_PORT=8008
export TEXTGEN_PORT=9000
export host_ip=${host_ip}
export HF_TOKEN=${HF_TOKEN}
export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
```

#### 2.2 Run Docker with CLI (Option A)

Step 1: Start the backend LLM service

Please refer to the [TGI](../../../third_parties/tgi/), [vLLM](../../../third_parties/vllm/), or [Ollama](../../../third_parties/ollama/) guidelines to start a backend LLM service.

Step 2: Start the TextGen microservice

```bash
docker run -d --name="llm-tgi-server" -p 9000:9000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TGI_LLM_ENDPOINT=$TGI_LLM_ENDPOINT -e HF_TOKEN=$HF_TOKEN opea/llm-textgen:latest
export LLM_COMPONENT_NAME="OpeaTextGenService"
docker run \
--name="llm-textgen-server" \
-p $TEXTGEN_PORT:9000 \
--ipc=host \
-e http_proxy=$http_proxy \
-e https_proxy=$https_proxy \
-e no_proxy=${no_proxy} \
-e LLM_ENDPOINT=$LLM_ENDPOINT \
-e HF_TOKEN=$HF_TOKEN \
-e LLM_MODEL_ID=$LLM_MODEL_ID \
-e LLM_COMPONENT_NAME=$LLM_COMPONENT_NAME \
opea/llm-textgen:latest
```

### 2.4 Run Docker with Docker Compose (Option B)
#### 2.3 Run Docker with Docker Compose (Option B)

Set `service_name` to match the backend service.

```bash
cd comps/llms/deployment/docker_compose/
docker compose -f text-generation_tgi.yaml up -d
export service_name="textgen-service-tgi"
# export service_name="textgen-service-tgi-gaudi"
# export service_name="textgen-service-vllm"
# export service_name="textgen-service-vllm-gaudi"
# export service_name="textgen-service-ollama"

cd ../../deployment/docker_compose/
docker compose -f compose_text-generation.yaml up ${service_name} -d
```

## 🚀3. Consume LLM Service

### 3.1 Check Service Status

```bash
curl http://${your_ip}:9000/v1/health_check\
curl http://${host_ip}:${TEXTGEN_PORT}/v1/health_check \
-X GET \
-H 'Content-Type: application/json'
```

### 3.2 Consume LLM Service
### 3.2 Verify Microservice

You can set the following model parameters according to your needs, such as `max_tokens` and `stream`.

The `stream` parameter determines the format of the data returned by the API: with `stream=false` the API returns a complete text string, while with `stream=true` it returns a stream of text chunks.

```bash
# stream mode
curl http://${your_ip}:9000/v1/chat/completions \
curl http://${host_ip}:${TEXTGEN_PORT}/v1/chat/completions \
-X POST \
-d '{"model": "${LLM_MODEL_ID}", "messages": "What is Deep Learning?", "max_tokens":17}' \
-H 'Content-Type: application/json'

curl http://${your_ip}:9000/v1/chat/completions \
curl http://${host_ip}:${TEXTGEN_PORT}/v1/chat/completions \
-X POST \
-d '{"model": "${LLM_MODEL_ID}", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
-H 'Content-Type: application/json'

# Non-stream mode
curl http://${your_ip}:9000/v1/chat/completions \
curl http://${host_ip}:${TEXTGEN_PORT}/v1/chat/completions \
-X POST \
-d '{"model": "${LLM_MODEL_ID}", "messages": "What is Deep Learning?", "max_tokens":17, "stream":false}' \
-H 'Content-Type: application/json'
```

For parameters in Chat mode, please refer to [OpenAI API](https://platform.openai.com/docs/api-reference/chat/create)
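For instance, a request that sets a few of the common OpenAI-style sampling parameters might look like the sketch below; which parameters are honored ultimately depends on the backend (TGI, vLLM, Ollama, etc.) you chose:

```bash
curl http://${host_ip}:${TEXTGEN_PORT}/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d "{\"model\": \"${LLM_MODEL_ID}\", \"messages\": [{\"role\": \"user\", \"content\": \"What is Deep Learning?\"}], \"max_tokens\": 128, \"temperature\": 0.7, \"top_p\": 0.95, \"stream\": true}"
```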

### 4. Validated Model
<!-- Below are links used in this document. They are not rendered: -->

| Model                     | TGI |
| ------------------------- | --- |
| Intel/neural-chat-7b-v3-3 | ✓   |
| Llama-2-7b-chat-hf        | ✓   |
| Llama-2-70b-chat-hf       | ✓   |
| Meta-Llama-3-8B-Instruct  | ✓   |
| Meta-Llama-3-70B-Instruct | ✓   |
| Phi-3                     | ✓   |
[Intel/neural-chat-7b-v3-3]: https://huggingface.co/Intel/neural-chat-7b-v3-3
[Llama-2-7b-chat-hf]: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
[Llama-2-70b-chat-hf]: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
[Meta-Llama-3-8B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
[Meta-Llama-3-70B-Instruct]: https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
[Phi-3]: https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3
[HuggingFace]: https://huggingface.co/
16 changes: 9 additions & 7 deletions comps/llms/src/text-generation/README_native.md
@@ -4,24 +4,24 @@ LLM Native microservice uses [optimum-habana](https://github.com/huggingface/opt

## 🚀1. Start Microservice

If you start an LLM microservice with docker, the `docker_compose_llm.yaml` file will automatically start a Native LLM service with docker.

### 1.1 Setup Environment Variables

In order to start the Native LLM service, you need to set up the following environment variables first.

For LLM model, both `Qwen` and `Falcon3` models are supported. Users can set different models by changing the `LLM_NATIVE_MODEL` below.
For the LLM model, both `Qwen` and `Falcon3` models are supported. Users can set different models by changing `LLM_MODEL_ID` below.

```bash
export LLM_NATIVE_MODEL="Qwen/Qwen2-7B-Instruct"
export LLM_MODEL_ID="Qwen/Qwen2-7B-Instruct"
export HF_TOKEN="your_huggingface_token"
export TEXTGEN_PORT=10512
export host_ip=${host_ip}
```

### 1.2 Build Docker Image

```bash
cd ../../../../../
docker build -t opea/llm-native:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
docker build -t opea/llm-textgen-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile.intel_hpu .
```

To start a docker container, you have two options:
@@ -34,13 +34,15 @@ You can choose one as needed.
### 1.3 Run Docker with CLI (Option A)

```bash
docker run -d --runtime=habana --name="llm-native-server" -p 9000:9000 -e https_proxy=$https_proxy -e http_proxy=$http_proxy -e TOKENIZERS_PARALLELISM=false -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e LLM_NATIVE_MODEL=${LLM_NATIVE_MODEL} opea/llm-native:latest
docker run -d --runtime=habana --name="llm-native-server" -p 9000:9000 -e https_proxy=$https_proxy -e http_proxy=$http_proxy -e TOKENIZERS_PARALLELISM=false -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e LLM_MODEL_ID=${LLM_MODEL_ID} opea/llm-textgen-gaudi:latest
```

### 1.4 Run Docker with Docker Compose (Option B)

```bash
docker compose -f docker_compose_llm.yaml up -d
export service_name="textgen-native-gaudi"
cd comps/llms/deployment/docker_compose
docker compose -f compose_text-generation.yaml up ${service_name} -d
```
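Before consuming the service, you can confirm the model loaded correctly by tailing the container logs (a sketch; `llm-native-server` is the name used in the CLI example above, while Docker Compose assigns its own container name):

```bash
docker logs -f llm-native-server
```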

## 🚀2. Consume LLM Service