---
sidebar_position: 0
---

# Storage Aware Scheduling with Docker Compose

## Pre-requisites

We will use Docker Compose to run a ServerlessLLM cluster in this example. Therefore, please make sure you have read the [Docker Quickstart Guide](../getting_started/docker_quickstart.md) before proceeding.

## Usage

Start a local Docker-based Ray cluster using Docker Compose.

### Step 1: Clone the ServerlessLLM Repository

If you haven't already, clone the ServerlessLLM repository:

```bash
git clone https://github.com/ServerlessLLM/ServerlessLLM.git
cd ServerlessLLM/examples/storage_aware_scheduling
```
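
The example directory should contain the Docker Compose file and the model placement specs used below (file names as referenced in this guide; verify with a quick listing):

```bash
# Expected contents (non-exhaustive): docker-compose.yml,
# config-opt-2.7b.json, config-opt-1.3b.json
ls
```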

### Step 2: Configuration

Set the model directory: create a directory on your host machine where models will be stored, and set the `MODEL_FOLDER` environment variable to point to it:

```bash
export MODEL_FOLDER=/path/to/your/models
```

Replace `/path/to/your/models` with the actual path where you want to store the models.
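
If the directory does not exist yet, create it before starting the services:

```bash
# Create the model directory that will be bind-mounted into the containers.
mkdir -p "$MODEL_FOLDER"
```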

### Step 3: Enable Storage Aware Scheduling in Docker Compose

The Docker Compose configuration is already located in the `examples/storage_aware_scheduling` directory. To activate storage-aware scheduling, make sure the `docker-compose.yml` file includes the necessary configuration: the `sllm_head` service should pass the `--enable_storage_aware` flag in its start command.

:::tip
We recommend adjusting the number of GPUs and `mem_pool_size` based on the resources available on your machine.
:::
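
For orientation, the relevant fragment of the compose file looks roughly like the sketch below. It is illustrative only: the image name, ports, and service layout are assumptions based on this guide, and the essential detail is the `--enable_storage_aware` flag on the head's start command:

```yaml
# Illustrative sketch, not the shipped file -- see
# examples/storage_aware_scheduling/docker-compose.yml for the real thing.
services:
  sllm_head:
    image: serverlessllm/sllm-serve   # image name as used elsewhere in these docs
    command: ["sllm-serve", "start", "--enable_storage_aware"]
    ports:
      - "8343:8343"                   # API port queried in the steps below
```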


### Step 4: Start the Services

Start the ServerlessLLM services using Docker Compose:

```bash
docker compose up -d --build
```

This command will start the Ray head node and two worker nodes defined in the `docker-compose.yml` file.

:::tip
Use the following command to monitor the logs of the head node:

```bash
docker logs -f sllm_head
```
:::
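
You can also confirm that the head and worker containers are up (standard Docker Compose, run from the example directory):

```bash
# List the services defined in docker-compose.yml and their current status.
docker compose ps
```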

### Step 5: Deploy Models with Placement Spec

In the `examples/storage_aware_scheduling` directory, example configuration files (`config-opt-2.7b.json` and `config-opt-1.3b.json`) are already provided; each pins its model to a particular server through the `placement_config.target_nodes` field, as sketched after the note below.

> Note: Storage aware scheduling currently only supports "transformers" backend. Support for other backends will come soon.
> Note: Storage aware scheduling currently only supports the "transformers" backend. Support for other backends will come soon.
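
For reference, a placement spec for "facebook/opt-2.7b" has roughly the following shape, pinning the model to server 0 via `placement_config.target_nodes` (the shipped `config-opt-2.7b.json` is authoritative; values here are representative):

```json
{
  "model": "facebook/opt-2.7b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 10
  },
  "placement_config": {
    "target_nodes": ["0"]
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-2.7b",
    "device_map": "auto",
    "torch_dtype": "float16"
  }
}
```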
Deploy the models with the placement spec files:

```bash
sllm-cli deploy --config config-opt-2.7b.json
sllm-cli deploy --config config-opt-1.3b.json
```

Query the models to verify the deployment:

```bash
curl http://127.0.0.1:8343/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "opt-2.7b",
"model": "facebook/opt-2.7b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is your name?"}
    ]
  }'
```

```bash
curl http://127.0.0.1:8343/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "opt-1.3b",
"model": "facebook/opt-1.3b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is your name?"}
]
}'
```
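
If the deployments succeeded, each request returns an OpenAI-style chat completion. The generated text will differ from run to run, and the exact fields depend on the server version, but the response has roughly this shape (all values illustrative):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "facebook/opt-2.7b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ]
}
```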

As shown in the log message, the model "opt-2.7b" is scheduled on server 0, while the model "opt-1.3b" is scheduled on server 1.
```plaintext
...
(StorageAwareScheduler pid=1584) INFO 07-30 12:08:40 storage_aware_scheduler.py:138] Sorted scheduling options: [('0', 0.9877967834472656)]
(StorageAwareScheduler pid=1584) INFO 07-30 12:08:40 storage_aware_scheduler.py:145] Allocated node 0 for model opt-2.7b
...
(StorageAwareScheduler pid=1584) INFO 07-30 12:08:51 storage_aware_scheduler.py:138] Sorted scheduling options: [('1', 0.4901580810546875)]
(StorageAwareScheduler pid=1584) INFO 07-30 12:08:51 storage_aware_scheduler.py:145] Allocated node 1 for model opt-1.3b
...
As shown in the log message, the model "facebook/opt-2.7b" is scheduled on server 0, while the model "facebook/opt-1.3b" is scheduled on server 1.

```log
(StorageAwareScheduler pid=1543) INFO 11-12 23:48:27 storage_aware_scheduler.py:137] Sorted scheduling options: [('0', 4.583079601378258)]
(StorageAwareScheduler pid=1543) INFO 11-12 23:48:27 storage_aware_scheduler.py:144] Allocated node 0 for model facebook/opt-2.7b
(StorageAwareScheduler pid=1543) INFO 11-12 23:48:38 storage_aware_scheduler.py:137] Sorted scheduling options: [('1', 2.266678696047572)]
(StorageAwareScheduler pid=1543) INFO 11-12 23:48:38 storage_aware_scheduler.py:144] Allocated node 1 for model facebook/opt-1.3b
```

### Step 6: Clean Up

Delete the model deployment by running the following command:

```bash
sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b
```
If you need to stop and remove the containers, you can use the following commands:

```bash
docker exec ray_head sh -c "ray stop"
docker exec ray_worker_0 sh -c "ray stop"
docker exec ray_worker_1 sh -c "ray stop"

docker stop ray_head ray_worker_0 ray_worker_1
docker rm ray_head ray_worker_0 ray_worker_1
docker network rm sllm
```
docker compose down
```
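
Models downloaded during this example remain in your `MODEL_FOLDER`; if you want the disk space back, remove them manually (destructive, so double-check the path first):

```bash
# Careful: permanently deletes every downloaded checkpoint under MODEL_FOLDER.
rm -rf "$MODEL_FOLDER"/*
```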
