make both TP overlap splitting by batch and seq work together #630

xuechendi · 2024-12-14T01:38:53Z

VLLM_TP_SPLIT_SIZE_BY_SEQ=2
VLLM_TP_SPLIT_SIZE_BY_BATCH=2

xuechendi · 2024-12-14T01:41:04Z

@jikunshang, please take a look.
When both are set to 2, only one strategy is applied, because I found that using both at once hurts performance.
split_by_batch is tried first, since it gives the larger performance improvement.
If split_by_batch is applied, split_by_seq is disabled; otherwise, the tensor is split by sequence.
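The selection rule above can be sketched in Python roughly as follows. This is a hypothetical illustration, not the code merged in this PR. Only the two environment variable names come from this PR. The helper names (`read_split_sizes`, `choose_split`, `split_hidden`) are made up for illustration, and so are two other details: unset variables default to 1 (no split), and a dimension is split only when the split size divides it evenly. The toy `[batch][seq]` grid leaves out the hidden dimension.

```python
import os
from typing import List, Tuple


def read_split_sizes() -> Tuple[int, int]:
    # Env var names are from the PR; the default of "1" (no split) is an assumption.
    by_batch = int(os.environ.get("VLLM_TP_SPLIT_SIZE_BY_BATCH", "1"))
    by_seq = int(os.environ.get("VLLM_TP_SPLIT_SIZE_BY_SEQ", "1"))
    return by_batch, by_seq


def choose_split(batch_size: int, seq_len: int, by_batch: int, by_seq: int) -> Tuple[str, int]:
    """Pick at most one split strategy, trying batch before sequence.

    Hypothetical helper: requiring even divisibility is an illustrative assumption.
    """
    if by_batch > 1 and batch_size >= by_batch and batch_size % by_batch == 0:
        return "batch", by_batch  # batch split applied, so the sequence split is skipped
    if by_seq > 1 and seq_len >= by_seq and seq_len % by_seq == 0:
        return "seq", by_seq  # fall back to splitting along the sequence dimension
    return "none", 1  # neither split applies; process the tensor whole


def split_hidden(hidden: List[List[float]], strategy: str, parts: int) -> List[List[List[float]]]:
    """Split a toy [batch][seq] grid into `parts` equal chunks along the chosen dim."""
    if strategy == "batch":
        step = len(hidden) // parts
        return [hidden[i * step:(i + 1) * step] for i in range(parts)]
    if strategy == "seq":
        step = len(hidden[0]) // parts
        return [[row[i * step:(i + 1) * step] for row in hidden] for i in range(parts)]
    return [hidden]
```

For example, with both variables set to 2, `choose_split(4, 128, 2, 2)` picks the batch split. A single-sequence batch, `choose_split(1, 128, 2, 2)`, falls back to splitting by sequence.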

xuechendi · 2024-12-14T01:42:01Z

One remaining issue: this PR reverts Prepare_cos_sin, which causes a slight performance regression. This needs to be fixed before merging into the mlperf branch.

jikunshang · 2024-12-16T10:37:20Z

LGTM.


xuechendi requested review from kzawora-intel, madamczykhabana, michalkuligowski and mgawarkiewicz as code owners December 14, 2024 01:38

xuechendi marked this pull request as draft December 14, 2024 01:41

xuechendi force-pushed the mlperf_features branch 2 times, most recently from 4169588 to d6bdc90, December 16, 2024 17:31

Support tensor split in llama decoderlayer

5eef6d9

Signed-off-by: Kunshang Ji <[email protected]> Signed-off-by: Chendi Xue <[email protected]>

xuechendi marked this pull request as ready for review December 16, 2024 18:48

xuechendi force-pushed the tp_parallelism_2_rebased branch from f16a4fb to 5eef6d9, December 16, 2024 18:49

xuechendi merged commit a23e1a1 into HabanaAI:mlperf_features Dec 16, 2024
