[benchmark] Add fused_moe_triton benchmark and tuning tools #2225
Conversation
fix chunked prefill size default value on GTX 4090
@merrymercy The PR needs review, thanks.
@BBuf
Thanks for your great work!
Can you also refer to https://github.com/sgl-project/sglang/blob/main/3rdparty/amd/tuning/benchmark_moe_rocm.py?
One goal is to come up with a unified tuning script.
But if you can't do that in this PR due to access limits, that's probably fine; we can try to unify later too.
Alright, looking at the diffs between the two tuning scripts, there seem to be many differences. If you want to unify them, you can work on that. Thanks!
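For readers of this thread, the kind of search such tuning scripts perform looks roughly like the sketch below. This is a minimal illustration, not the code in this PR: `run_fused_moe` is a hypothetical placeholder for a single kernel launch, and the search-space values are typical Triton block/warp/stage knobs rather than the exact ones used by the scripts discussed here.

```python
# Minimal sketch of a fused-MoE tuning sweep (not the script in this PR).
# Assumptions: `run_fused_moe` is a hypothetical placeholder, and the search
# space lists are illustrative Triton knobs, not the PR's exact values.
import itertools
import json

import triton.testing


def run_fused_moe(num_tokens: int, config: dict) -> None:
    """Placeholder: launch the fused MoE kernel once with `config`."""
    raise NotImplementedError  # supplied by the real tuning script


def tune(num_tokens_list, save_path="fused_moe_config.json"):
    search_space = {
        "BLOCK_SIZE_M": [16, 32, 64, 128],
        "BLOCK_SIZE_N": [32, 64, 128, 256],
        "BLOCK_SIZE_K": [64, 128, 256],
        "num_warps": [4, 8],
        "num_stages": [2, 3, 4],
    }
    keys, values = zip(*search_space.items())
    best = {}
    for num_tokens in num_tokens_list:
        best_ms, best_cfg = float("inf"), None
        for combo in itertools.product(*values):
            cfg = dict(zip(keys, combo))
            try:
                # Time one candidate config; configs that fail to compile/run are skipped.
                ms = triton.testing.do_bench(lambda: run_fused_moe(num_tokens, cfg))
            except Exception:
                continue
            if ms < best_ms:
                best_ms, best_cfg = ms, cfg
        best[num_tokens] = best_cfg
    # Persist the per-batch-size best configs, similar in spirit to the kernel config JSONs.
    with open(save_path, "w") as f:
        json.dump(best, f, indent=2)
    return best
```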
@HaiShaw I have addressed your comments, please have another look when you are free, thanks.
@BBuf Thanks for your PR!
@merrymercy please have a look too.
The tuning script is part of this PR, and it successfully produced efficient
fused_moe_triton
kernel configs for both Qwen2-57B and Mixtral 8x7B under the TP4 fp8_w8a8 setting. On a GTX 4090, I ran the benchmark script against the
Qwen/Qwen2-57B-A14B-Instruct-FP8
model's fused_moe_triton
kernel; the result:
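(For context, the comparison times the two fused MoE implementations across batch sizes. The sketch below shows the general shape of such a benchmark; `vllm_fused_moe` and `sglang_fused_moe` are hypothetical placeholders, and the hidden size and token counts are illustrative, not taken from this PR.)

```python
# Minimal sketch of a vllm-vs-sglang fused MoE latency comparison (not the
# benchmark script itself). The two wrappers are hypothetical placeholders;
# hidden size and token counts are illustrative.
import torch
import triton.testing


def vllm_fused_moe(x: torch.Tensor) -> torch.Tensor:
    return x  # placeholder: the real script calls vllm's fused_moe kernel here


def sglang_fused_moe(x: torch.Tensor) -> torch.Tensor:
    return x  # placeholder: the real script calls sglang's fused_moe_triton kernel here


def compare(hidden_size: int = 4096, dtype=torch.float16, device: str = "cuda"):
    # Sweep the number of input tokens and report latency for each implementation.
    for num_tokens in (1, 8, 64, 256, 1024, 4096):
        x = torch.randn(num_tokens, hidden_size, dtype=dtype, device=device)
        vllm_ms = triton.testing.do_bench(lambda: vllm_fused_moe(x))
        sglang_ms = triton.testing.do_bench(lambda: sglang_fused_moe(x))
        print(f"tokens={num_tokens:5d}  vllm={vllm_ms:.3f} ms  sglang={sglang_ms:.3f} ms")


if __name__ == "__main__":
    compare()
```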