Add Baseline for SGLang Benchmark Test #602

stbaione · 2024-11-25T14:45:08Z

Description

The SGLang Benchmark Test has been running for a while, but it only benchmarks the shortfin server itself. To get a baseline metric and enable long-term convergence in terms of performance, we need to be able to track metrics of the SGLang server using the same benchmark method.

This adds an sglang_benchmark_test to complement the shortfin_benchmark_test. Also restructures app_tests/benchmark_tests/llm -> app_tests/benchmark_tests/llm/sglang_benchmarks. This keeps the benchmark tests organized and allows for the folder to be extended with other types of benchmarks in the future.

Why are we using docker to start the SGLang server?

Currently, the pyproject.toml file inside of SGLang requires vllm==0.6.3.dev13 to run on ROCm. I looked into building vLLM from source for this test, but couldn't find a branch, tag, or release that matched that version. From their own comments inside of pyproject.toml, it appears to only be available inside of a ROCm base image:

# HIP (Heterogeneous-computing Interface for Portability) for AMD
# => base docker rocm/vllm-dev:20241022, not from public vllm whl
srt_hip = ["sglang[runtime_common]", "torch", "vllm==0.6.3.dev13"]

Their instructions (https://sgl-project.github.io/start/install.html#method-3-using-docker) on installing SGLang and running it on ROCm also appear to suggest the docker method:

Instructions from their docs for running with ROCm

docker build --build-arg SGL_BRANCH=v0.3.5.post2 -t v0.3.5.post2-rocm620 -f Dockerfile.rocm .

alias drun='docker run -it --rm --network=host --device=/dev/kfd --device=/dev/dri --ipc=host \
    --shm-size 16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    -v $HOME/dockerx:/dockerx -v /data:/data'

drun -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    v0.3.5.post2-rocm620 \
    python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
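
If a test harness needed to launch the same container from Python rather than through a shell alias, the command above could be assembled as an argument list. This is only a sketch: `build_docker_cmd`, the `sglang-server` container name, and the use of `-d` (detached) in place of `-it` are assumptions for illustration, not what the workflow file does. The `$HOME/dockerx` and `/data` mounts are left out for brevity, and `-p 30000:30000` is dropped because Docker ignores published ports when `--network=host` is set.

```python
import os


def build_docker_cmd(image, model, hf_token, port=30000,
                     container_name="sglang-server"):
    """Build an argv list mirroring the `drun` invocation above.

    `container_name` and the detached `-d` flag are hypothetical choices;
    every other flag is copied from the documented command.
    """
    hf_cache = os.path.expanduser("~/.cache/huggingface")
    return [
        "docker", "run", "-d", "--rm",
        "--name", container_name,
        "--network=host",
        "--device=/dev/kfd", "--device=/dev/dri",
        "--ipc=host", "--shm-size", "16G",
        "--group-add", "video",
        "--cap-add=SYS_PTRACE",
        "--security-opt", "seccomp=unconfined",
        # Reuse the host Hugging Face cache so weights are not re-downloaded.
        "-v", f"{hf_cache}:/root/.cache/huggingface",
        "--env", f"HF_TOKEN={hf_token}",
        image,
        "python3", "-m", "sglang.launch_server",
        "--model-path", model,
        "--host", "0.0.0.0",
        "--port", str(port),
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`, and a matching `docker stop <container_name>` in a cleanup step removes the container because of `--rm`.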

The workflow file handles starting the container and cleaning up once the workflow is done. I set the timeout for waiting for the server to start to 10 minutes to give the SGLang server enough time to load the necessary model weights and start up.
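
The 10-minute startup wait can be pictured as a simple polling loop. The sketch below is a minimal illustration, not the code in this PR: `http_probe`, `wait_for_server`, the 5-second polling interval, and the `/health` path used in the usage note are all assumptions. The clock and sleep functions are injectable so the loop can be exercised without a real server.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: not ready yet.
        return False


def wait_for_server(probe, timeout_s=600.0, interval_s=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call `probe()` until it returns True or `timeout_s` seconds elapse.

    Returns True if the server became ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller might use it as `wait_for_server(lambda: http_probe("http://localhost:30000/health"))`, where port 30000 comes from the command above and the health path is assumed. The workflow can then fail fast when the function returns False.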

renxida · 2024-12-02T21:17:32Z

Link to successful run:

With SRT (Sglang RunTime):
https://github.com/nod-ai/shark-ai/actions/runs/12126822859/job/33812266791

With Shortfin:
https://github.com/nod-ai/shark-ai/actions/runs/12126822859/job/33812266454

renxida

Looks good to me! Would be nice to get @ScottTodd's look too if he's got time.

.github/workflows/ci-sglang-benchmark.yml

stbaione and others added 20 commits November 22, 2024 01:12

Add benchmark using sglang server,

3c21be0

Add sgl server benchmark to workflow file, Restructure `app_tests/benchmark_tests`

Fix import path in shortfin_benchmark_test,

31398a5

Temporarily comment out shortfin job to verify sglang benchmark job

Merge branch 'main' into sgl-benchmark-add-baseline

4d0323f

Change ci-sglang-benchmark/integration to use mi300x-4,

fc78284

Update benchmark tests to download model on demand

Fix github runner label

0909e8f

Add installation steps, since test does require some functionality fr…

d7cc539

…om shortfin/sharktank

Fix typo in model names

cf16e54

Add container name,

86058b8

Add disable-cuda-graph option to allow server to properly run

Temporarily remove --rm to try and obtain container logs after failure

acbedb0

Remove quotes around HF_TOKEN

34c8410

Try using env var for HF_SECRET

0d5574d

Move secrets.HF_TOKEN back to command

c9f4d33

Add temporary command to see if HF_TOKEN is being set properly

4fa094c

Add back command to rm container once stopped

c33ef75

Merge branch 'main' of https://github.com/nod-ai/shark-ai into users/…

6986765

…stbaione/sgl-benchmark-add-baseline

Allow for full e2e verification

fea2655

Update hash for pip cache in benchmark and integration tests

3641445

Remove version pinning for iree-base-compiler and iree-base-runtime

d82d9df

Add --pre to iree installations in SGLang tests

e843281

Merge branch 'main' into users/stbaione/sgl-benchmark-add-baseline

7fe76d2

stbaione mentioned this pull request Nov 26, 2024

[sharktank] Update shark-ai CIs with latest install #609

Closed

stbaione and others added 4 commits December 2, 2024 09:55

Merge branch 'main' into users/stbaione/sgl-benchmark-add-baseline

ea65936

Slightly lower threshold in integration tests, to allow still valid, …

01da13c

…but differing answers to be accepted

Fix publish_dir in Deploy to Github Pages step

ed37ef1

Merge branch 'main' into users/stbaione/sgl-benchmark-add-baseline

d1e434f

stbaione marked this pull request as ready for review December 2, 2024 21:00

stbaione requested review from renxida and ScottTodd December 2, 2024 21:00

renxida approved these changes Dec 2, 2024


stbaione added 7 commits December 2, 2024 23:07

Remove temporary disablements,

09e0fb6

Add back step to clean up docker image

Remove Get Current Date step in shortfin benchmark job

422729f

Add README description to top of CI file

969b608

Add job to merge html reports from both benchmark jobs and upload to …

f67e399

…gh-pages

Fix upload/download paths

6909edc

Split download into two steps

aa35176

Ensure all html files are in same dir

9578acc

stbaione requested a review from ScottTodd December 3, 2024 16:56

Remove PR trigger

b9b9ea5

ScottTodd reviewed Dec 3, 2024


stbaione added 4 commits December 3, 2024 19:56

Remove sharktank installation from SGLang benchmark,

526194f

Always use python3.11 for merging reports, Make merging reports one step, Temporarily enable PR trigger for validation

Use hf to download tokenizer in sglang_benchmark_test

57babdf

Small cleanup of sglang ci deps section

8f7f0fb

Make shortfin/sglang benchmark such that they still run sequentially,…

305c4b0

… but are dependent on each other's success

stbaione requested a review from ScottTodd December 3, 2024 21:56

stbaione and others added 8 commits December 3, 2024 22:19

Remove dep on shortfin benchmark in sgl benchmark,

d78ab73

Make `merge_and_upload_reports` run conditionally on either succeeding

Make sure merge_and_upload_reports waits for prior jobs to finish

29a8221

Merge branch 'main' into users/stbaione/sgl-benchmark-add-baseline

35960e0

Move code checkout to first step in benchmark_sglang

7efecb8

Merge branch 'users/stbaione/sgl-benchmark-add-baseline' of https://g…

d28bf01

…ithub.com/nod-ai/shark-ai into users/stbaione/sgl-benchmark-add-baseline

Remove PR trigger

1e90573

Merge branch 'main' into users/stbaione/sgl-benchmark-add-baseline

3f6564f

Repin iree-base-compiler and iree-base-runtime due to abort issue

b1ec485

stbaione mentioned this pull request Dec 4, 2024

[shortfin] Bump IREE dep to iree-3.1.0rc20241204. #635

Merged

stbaione added 2 commits December 4, 2024 10:46

Merge branch 'main' into users/stbaione/sgl-benchmark-add-baseline

56d3a5c

Merge branch 'main' into users/stbaione/sgl-benchmark-add-baseline

5d406c8

ScottTodd approved these changes Dec 4, 2024


stbaione merged commit fc22312 into main Dec 4, 2024
8 checks passed

stbaione deleted the users/stbaione/sgl-benchmark-add-baseline branch December 4, 2024 17:51
