Release v0.3.6.post2 #2214

Merged 6 commits on Nov 27, 2024
2 changes: 1 addition & 1 deletion docker/Dockerfile.rocm
@@ -1,5 +1,5 @@
 # Usage (to build SGLang ROCm docker image):
-# docker build --build-arg SGL_BRANCH=v0.3.6.post1 -t testImage -f Dockerfile.rocm .
+# docker build --build-arg SGL_BRANCH=v0.3.6.post2 -t testImage -f Dockerfile.rocm .
 
 # default base image
 ARG BASE_IMAGE="rocm/vllm-dev:20241022"
4 changes: 2 additions & 2 deletions docs/developer/setup_github_runner.md
@@ -11,9 +11,9 @@ docker pull nvidia/cuda:12.1.1-devel-ubuntu22.04
 # Nvidia
 docker run --shm-size 128g -it -v /tmp/huggingface:/hf_home --gpus all nvidia/cuda:12.1.1-devel-ubuntu22.04 /bin/bash
 # AMD
-docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 128g -it -v /tmp/huggingface:/hf_home lmsysorg/sglang:v0.3.6.post1-rocm620 /bin/bash
+docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 128g -it -v /tmp/huggingface:/hf_home lmsysorg/sglang:v0.3.6.post2-rocm620 /bin/bash
 # AMD just the last 2 GPUs
-docker run --rm --device=/dev/kfd --device=/dev/dri/renderD176 --device=/dev/dri/renderD184 --group-add video --shm-size 128g -it -v /tmp/huggingface:/hf_home lmsysorg/sglang:v0.3.6.post1-rocm620 /bin/bash
+docker run --rm --device=/dev/kfd --device=/dev/dri/renderD176 --device=/dev/dri/renderD184 --group-add video --shm-size 128g -it -v /tmp/huggingface:/hf_home lmsysorg/sglang:v0.3.6.post2-rocm620 /bin/bash
 ```
 
 ### Step 2: Configure the runner by `config.sh`
8 changes: 4 additions & 4 deletions docs/start/install.md
@@ -16,7 +16,7 @@ Note: Please check the [FlashInfer installation doc](https://docs.flashinfer.ai/
 ## Method 2: From source
 ```
 # Use the last release branch
-git clone -b v0.3.6.post1 https://github.com/sgl-project/sglang.git
+git clone -b v0.3.6.post2 https://github.com/sgl-project/sglang.git
 cd sglang
 
 pip install --upgrade pip
@@ -46,7 +46,7 @@ docker run --gpus all \
 Note: To AMD ROCm system with Instinct/MI GPUs, it is recommended to use `docker/Dockerfile.rocm` to build images, example and usage as below:
 
 ```bash
-docker build --build-arg SGL_BRANCH=v0.3.6.post1 -t v0.3.6.post1-rocm620 -f Dockerfile.rocm .
+docker build --build-arg SGL_BRANCH=v0.3.6.post2 -t v0.3.6.post2-rocm620 -f Dockerfile.rocm .
 
 alias drun='docker run -it --rm --network=host --device=/dev/kfd --device=/dev/dri --ipc=host \
     --shm-size 16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
@@ -55,11 +55,11 @@ alias drun='docker run -it --rm --network=host --device=/dev/kfd --device=/dev/d
 drun -p 30000:30000 \
     -v ~/.cache/huggingface:/root/.cache/huggingface \
     --env "HF_TOKEN=<secret>" \
-    v0.3.6.post1-rocm620 \
+    v0.3.6.post2-rocm620 \
     python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
 
 # Till flashinfer backend available, --attention-backend triton --sampling-backend pytorch are set by default
-drun v0.3.6.post1-rocm620 python3 -m sglang.bench_one_batch --batch-size 32 --input 1024 --output 128 --model amd/Meta-Llama-3.1-8B-Instruct-FP8-KV --tp 8 --quantization fp8
+drun v0.3.6.post2-rocm620 python3 -m sglang.bench_one_batch --batch-size 32 --input 1024 --output 128 --model amd/Meta-Llama-3.1-8B-Instruct-FP8-KV --tp 8 --quantization fp8
 ```
 
 ## Method 4: Using docker compose
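
Once `launch_server` is up inside the container, it can be smoke-tested from the host. A minimal sketch below, assuming the server is reachable at localhost:30000 and using sglang's OpenAI-compatible route; adjust host, port, and model name to your setup:

```python
# Smoke test for the server launched above. Assumes it is reachable at
# localhost:30000; /v1/chat/completions is sglang's OpenAI-compatible
# endpoint.
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 8,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```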
2 changes: 1 addition & 1 deletion python/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "sglang"
-version = "0.3.6.post1"
+version = "0.3.6.post2"
 description = "SGLang is yet another fast serving framework for large language models and vision language models."
 readme = "README.md"
 requires-python = ">=3.8"
3 changes: 2 additions & 1 deletion python/sglang/bench_one_batch.py
@@ -469,4 +469,5 @@ def main(server_args, bench_args):
     except Exception as e:
         raise e
     finally:
-        kill_child_process()
+        if server_args.tp_size != 1:
+            kill_child_process()
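
The guard above skips cleanup for single-process runs: with `tp_size == 1` no worker children are spawned, so there is nothing to kill. A minimal sketch of the pattern; the psutil-based `kill_child_process` body is a simplified stand-in for sglang's actual helper, not a copy of it:

```python
# Sketch of the guarded-cleanup pattern from the diff above.
# kill_child_process here is a simplified, assumed implementation;
# sglang's real helper lives in its utils module.
import os
import signal
from typing import Optional

import psutil


def kill_child_process(pid: Optional[int] = None) -> None:
    """Send SIGKILL to every descendant of pid (default: this process)."""
    parent = psutil.Process(pid or os.getpid())
    for child in parent.children(recursive=True):
        try:
            child.send_signal(signal.SIGKILL)
        except psutil.NoSuchProcess:
            pass  # the child already exited on its own


def run(server_args, bench_args):
    try:
        ...  # launch the model and run the benchmark batches
    finally:
        # tp_size == 1 keeps everything in-process: no children exist,
        # and skipping the kill avoids spurious teardown warnings.
        if server_args.tp_size != 1:
            kill_child_process()
```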
5 changes: 5 additions & 0 deletions python/sglang/srt/utils.py
@@ -517,6 +517,11 @@ def monkey_patch_vllm_p2p_access_check(gpu_id: int):
 
     setattr(tgt, "gpu_p2p_access_check", lambda *arg, **kwargs: True)
 
+    # Suppress the warnings from this delete function when using sglang.bench_one_batch
+    from vllm.distributed.device_communicators.custom_all_reduce import CustomAllreduce
+
+    setattr(CustomAllreduce, "__del__", lambda *args, **kwargs: None)
+
 
 vllm_all_gather_backup = None
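
Overriding `__del__` with a no-op lambda is a blunt but effective way to silence a finalizer that misbehaves during interpreter shutdown. A self-contained sketch of the same technique, where `Noisy` is a hypothetical stand-in for vllm's `CustomAllreduce`:

```python
# Generic sketch of the __del__ monkey-patch above. Noisy is a
# hypothetical class whose finalizer warns or fails when it runs
# late, e.g. after the resources it touches are already torn down.
class Noisy:
    def __del__(self):
        print("cleanup that can fail at interpreter exit")


# Replace the finalizer with a no-op; instances are now collected silently.
setattr(Noisy, "__del__", lambda *args, **kwargs: None)

obj = Noisy()
del obj  # prints nothing: the patched no-op __del__ runs instead
```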
2 changes: 1 addition & 1 deletion python/sglang/version.py
@@ -1 +1 @@
-__version__ = "0.3.6.post1"
+__version__ = "0.3.6.post2"
3 changes: 0 additions & 3 deletions test/srt/test_nightly_human_eval.py
@@ -3,7 +3,6 @@
 import signal
 import subprocess
 import unittest
-from types import SimpleNamespace
 
 from test_nightly_gsm8k_eval import launch_server, parse_models
 
@@ -13,9 +12,7 @@
     DEFAULT_MODEL_NAME_FOR_NIGHTLY_EVAL_FP8_TP2,
     DEFAULT_MODEL_NAME_FOR_NIGHTLY_EVAL_TP1,
     DEFAULT_MODEL_NAME_FOR_NIGHTLY_EVAL_TP2,
-    DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
     DEFAULT_URL_FOR_TEST,
-    popen_launch_server,
 )