
[benchmark] Add fused_moe_triton benchmark and tuning tools #2225

Merged: 37 commits into sgl-project:main on Nov 29, 2024

Conversation

@BBuf (Contributor) commented Nov 27, 2024

The tuning scripts are related to this PR, and they successfully produced efficient fused_moe_triton kernel configs for Qwen2-57B and Mixtral 8x7B, both under TP4 with fp8_w8a8.
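For context, tuning tools in this family typically write their results out as a JSON file keyed by batch size, with one Triton launch config per key. The sketch below is illustrative only: the field names follow the usual fused_moe config convention, but the numbers and the filename are placeholders, not the tuned output from this PR.

```python
import json

# Illustrative shape of a tuned fused_moe_triton config file.
# Keys are batch sizes; values are Triton launch parameters.
# The numbers here are made up, not actual tuned values.
example_config = {
    "1": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 32, "BLOCK_SIZE_K": 64,
          "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 2},
    "64": {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 64,
           "GROUP_SIZE_M": 8, "num_warps": 4, "num_stages": 3},
}

# Hypothetical filename, following the "E=...,N=...,dtype=..." convention.
with open("E=64,N=2560,dtype=fp8_w8a8.json", "w") as f:
    json.dump(example_config, f, indent=4)
```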

On an RTX 4090, I ran the benchmark script against the fused_moe_triton kernel of the Qwen/Qwen2-57B-A14B-Instruct-FP8 model. The results:

python benchmark/kernels/fused_moe_triton/benchmark_vllm_vs_sglang_fused_moe_triton.py --model /mnt/bbuf/Qwen2-57B-A14B-Instruct-FP8 --tp-size 4

[image: benchmark results]

python benchmark/kernels/fused_moe_triton/benchmark_vllm_vs_sglang_fused_moe_triton.py --model /mnt/bbuf/Qwen2-57B-A14B-Instruct-FP8 --tp-size 4 --use-fp8

[image: benchmark results with --use-fp8]
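For anyone reproducing this comparison without the full script, here is a minimal timing sketch. It assumes hypothetical fused_moe_vllm and fused_moe_sglang callables with their inputs already bound; the actual benchmark script in this PR builds the inputs and handles the fp8 path itself.

```python
import triton

def measure_ms(kernel_fn) -> float:
    # triton.testing.do_bench warms up the kernel, then returns a
    # latency estimate in milliseconds.
    return triton.testing.do_bench(kernel_fn)

# Hypothetical callables standing in for the two implementations:
# vllm_ms = measure_ms(lambda: fused_moe_vllm(hidden_states, w1, w2, gating, topk))
# sglang_ms = measure_ms(lambda: fused_moe_sglang(hidden_states, w1, w2, gating, topk))
# print(f"vLLM: {vllm_ms:.3f} ms, SGLang: {sglang_ms:.3f} ms")
```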

@BBuf (Contributor, Author) commented Nov 29, 2024

@merrymercy This PR needs review, thanks.

@HaiShaw (Collaborator) left a comment


@BBuf
Thanks for your great work!
Can you also refer to https://github.com/sgl-project/sglang/blob/main/3rdparty/amd/tuning/benchmark_moe_rocm.py?
One goal is to come up with a unified tuning script.
But if you can't do that in this PR due to access limits, that's probably fine; we can try to unify later too.

Review comment on python/sglang/srt/server_args.py (outdated, resolved)
@BBuf (Contributor, Author) commented Nov 29, 2024

> @BBuf Thanks for your great work! Can you also refer to https://github.com/sgl-project/sglang/blob/main/3rdparty/amd/tuning/benchmark_moe_rocm.py? One goal is to come up with a unified tuning script. But if you can't do that in this PR due to access limits, that's probably fine; we can try to unify later too.

Alright, looking at the diffs between the two tuning scripts, there seem to be many differences. If you want to unify them, feel free to work on that. Thanks!

@BBuf (Contributor, Author) commented Nov 29, 2024

@HaiShaw I have addressed your comment; please have another look when you are free, thanks.

@HaiShaw (Collaborator) commented Nov 29, 2024

@BBuf Thanks for your PR!

@HaiShaw (Collaborator) commented Nov 29, 2024

@merrymercy please have a look too.

HaiShaw merged commit 262e370 into sgl-project:main on Nov 29, 2024. 14 checks passed.