make both TP overlap splitting by batch and seq work together #630

Merged


xuechendi

@xuechendi xuechendi commented Dec 14, 2024

VLLM_TP_SPLIT_SIZE_BY_SEQ=2
VLLM_TP_SPLIT_SIZE_BY_BATCH=2
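The two variables above configure the TP overlap split along the sequence and batch dimensions. The sketch below shows one way such settings could be read. The helper name `read_tp_split_sizes` is hypothetical, and treating an unset variable or a value of 1 as "no split" is an assumption, not necessarily how this PR parses them.

```python
import os


def read_tp_split_sizes(env=None):
    """Hypothetical helper: read the TP overlap split sizes from the environment.

    Assumption: an unset variable defaults to 1, meaning "no split".
    Returns (split_by_seq, split_by_batch).
    """
    env = os.environ if env is None else env
    by_seq = int(env.get("VLLM_TP_SPLIT_SIZE_BY_SEQ", "1"))
    by_batch = int(env.get("VLLM_TP_SPLIT_SIZE_BY_BATCH", "1"))
    if by_seq < 1 or by_batch < 1:
        raise ValueError("split sizes must be >= 1")
    return by_seq, by_batch


# Same settings as in the PR description.
print(read_tp_split_sizes({"VLLM_TP_SPLIT_SIZE_BY_SEQ": "2",
                           "VLLM_TP_SPLIT_SIZE_BY_BATCH": "2"}))  # (2, 2)
```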

@xuechendi
Author

@jikunshang , please take a review.
When both are set to 2, only one strategy is applied: I found that using both at once hurts performance.
Split by batch is tried first, since it gives the larger performance improvement.
If the batch split is performed, a hint is set to disable split by sequence; otherwise the input is split by sequence.
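The "pick one strategy" rule above can be sketched as a small selection function. This is an illustration, not the PR's code: the name `choose_split` is hypothetical, and the feasibility test (a split of size n applies only when n > 1 and the dimension is divisible by n) is an assumption.

```python
def choose_split(batch_size, seq_len, split_by_batch, split_by_seq):
    """Hypothetical sketch: choose a single TP overlap split strategy.

    Returns ("batch", n), ("seq", n) or ("none", 1).
    Assumption: a split of size n is feasible only if n > 1 and the
    corresponding dimension is divisible by n.
    """
    # Try splitting by batch first, since it gave the larger gain.
    if split_by_batch > 1 and batch_size % split_by_batch == 0:
        # Batch split taken: splitting by sequence is disabled.
        return ("batch", split_by_batch)
    # Otherwise fall back to splitting along the sequence dimension.
    if split_by_seq > 1 and seq_len % split_by_seq == 0:
        return ("seq", split_by_seq)
    # Neither split is feasible: run without splitting.
    return ("none", 1)


print(choose_split(4, 128, 2, 2))  # ('batch', 2)
print(choose_split(1, 128, 2, 2))  # ('seq', 2)
```

Returning a single strategy, rather than applying both splits, mirrors the observation above that combining them hurt performance.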

@xuechendi xuechendi marked this pull request as draft December 14, 2024 01:41
@xuechendi
Author

One remaining issue: this PR reverts Prepare_cos_sin, which shows a slight performance regression. That needs to be fixed before merging to the mlperf branch.

@jikunshang

LGTM.

@xuechendi xuechendi force-pushed the mlperf_features branch 2 times, most recently from 4169588 to d6bdc90, December 16, 2024 17:31
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Chendi Xue <[email protected]>
@xuechendi xuechendi marked this pull request as ready for review December 16, 2024 18:48
@xuechendi xuechendi force-pushed the tp_parallelism_2_rebased branch from f16a4fb to 5eef6d9, December 16, 2024 18:49
@xuechendi xuechendi merged commit a23e1a1 into HabanaAI:mlperf_features Dec 16, 2024
2 participants