[benchmark] Add fused_moe_triton benchmark and tuning tools #2225
Conversation
fix chunked prefill size default value on GTX 4090
@merrymercy The PR needs review, thanks.
@BBuf
Thanks for your great work!
Can you also refer to https://github.com/sgl-project/sglang/blob/main/3rdparty/amd/tuning/benchmark_moe_rocm.py?
One goal is to come up with a unified tuning script.
But if you can't do that in this PR due to access limits, that's probably fine; we can try to unify later too.
Alright, looking at the diffs between the two tuning scripts, there seem to be many differences. If you want to unify them, you can work on that. Thanks!
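For readers of this thread, the kind of search such tuning scripts perform looks roughly like the sketch below. This is a minimal illustration, not the code in this PR: `run_fused_moe` is a hypothetical placeholder for a single kernel launch, and the search-space values are typical Triton block/warp/stage knobs rather than the exact ones used by the scripts discussed here.

```python
# Minimal sketch of a fused-MoE tuning sweep (not the script in this PR).
# Assumptions: `run_fused_moe` is a hypothetical placeholder, and the search
# space lists are illustrative Triton knobs, not the PR's exact values.
import itertools
import json

import triton.testing


def run_fused_moe(num_tokens: int, config: dict) -> None:
    """Placeholder: launch the fused MoE kernel once with `config`."""
    raise NotImplementedError  # supplied by the real tuning script


def tune(num_tokens_list, save_path="fused_moe_config.json"):
    search_space = {
        "BLOCK_SIZE_M": [16, 32, 64, 128],
        "BLOCK_SIZE_N": [32, 64, 128, 256],
        "BLOCK_SIZE_K": [64, 128, 256],
        "num_warps": [4, 8],
        "num_stages": [2, 3, 4],
    }
    keys, values = zip(*search_space.items())
    best = {}
    for num_tokens in num_tokens_list:
        best_ms, best_cfg = float("inf"), None
        for combo in itertools.product(*values):
            cfg = dict(zip(keys, combo))
            try:
                # Time one candidate config; configs that fail to compile/run are skipped.
                ms = triton.testing.do_bench(lambda: run_fused_moe(num_tokens, cfg))
            except Exception:
                continue
            if ms < best_ms:
                best_ms, best_cfg = ms, cfg
        best[num_tokens] = best_cfg
    # Persist the per-batch-size best configs, similar in spirit to the kernel config JSONs.
    with open(save_path, "w") as f:
        json.dump(best, f, indent=2)
    return best
```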
@HaiShaw I have addressed your comments, please have another look when you are free, thanks.
@BBuf Thanks for your PR!
@merrymercy please have a look too.
The tuning script is part of this PR, and it successfully produced efficient
fused_moe_triton
kernel configs for both Qwen2-57B and Mixtral 8x7B under the TP4 fp8_w8a8 setting. On a GTX 4090, I ran the benchmark script against the
Qwen/Qwen2-57B-A14B-Instruct-FP8
model's fused_moe_triton
kernel; the result:
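(For context, the comparison times the two fused MoE implementations across batch sizes. The sketch below shows the general shape of such a benchmark; `vllm_fused_moe` and `sglang_fused_moe` are hypothetical placeholders, and the hidden size and token counts are illustrative, not taken from this PR.)

```python
# Minimal sketch of a vllm-vs-sglang fused MoE latency comparison (not the
# benchmark script itself). The two wrappers are hypothetical placeholders;
# hidden size and token counts are illustrative.
import torch
import triton.testing


def vllm_fused_moe(x: torch.Tensor) -> torch.Tensor:
    return x  # placeholder: the real script calls vllm's fused_moe kernel here


def sglang_fused_moe(x: torch.Tensor) -> torch.Tensor:
    return x  # placeholder: the real script calls sglang's fused_moe_triton kernel here


def compare(hidden_size: int = 4096, dtype=torch.float16, device: str = "cuda"):
    # Sweep the number of input tokens and report latency for each implementation.
    for num_tokens in (1, 8, 64, 256, 1024, 4096):
        x = torch.randn(num_tokens, hidden_size, dtype=dtype, device=device)
        vllm_ms = triton.testing.do_bench(lambda: vllm_fused_moe(x))
        sglang_ms = triton.testing.do_bench(lambda: sglang_fused_moe(x))
        print(f"tokens={num_tokens:5d}  vllm={vllm_ms:.3f} ms  sglang={sglang_ms:.3f} ms")


if __name__ == "__main__":
    compare()
```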