
Resolved ALIBI bias regression due to porting flat PA #34

Merged

Conversation

@tannervoas742 tannervoas742 commented Nov 15, 2024

Required for the associated changes on the vllm-fork PR.

See the associated ticket above for further details.

@tannervoas742 tannervoas742 changed the title ALIBI-Ext: Works in lazy and eager mode Resolved ALIBI bias regression due to porting flat PA Nov 15, 2024
@tannervoas742 tannervoas742 force-pushed the restore_alibi_for_flat_pa_final branch 3 times, most recently from 873fc19 to 1a652d3 Compare November 18, 2024 10:34
@xuguangxin

@madamczykhabana, @libinta, please help review. Thank you.

@tannervoas742 tannervoas742 force-pushed the restore_alibi_for_flat_pa_final branch 4 times, most recently from 67b3396 to c0fb257 Compare November 27, 2024 03:10
@tannervoas742 tannervoas742 force-pushed the restore_alibi_for_flat_pa_final branch 3 times, most recently from d97b251 to 288b7c7 Compare December 9, 2024 17:03
Changes:
- Added ALiBI biases back to the decode stage.
- Optimized ALiBI memory usage.
  - Added environment variable "VLLM_PROMPT_ALIBI_MAX_SEQ_LEN" to allow
    large models to run with restricted prompt lengths.
  - Prompt biases are instantiated once in __init__ rather than on each
    forward pass.
  - Prompt and decode biases are shared across encoder/decoder layers.
- Added environment variable "VLLM_ALIBI_USE_FLOAT32_BIASES" to resolve
  accuracy issue on long sequences.
- Updated jais, mpt, falcon, baichuan, and bloom to work with ALiBI.
  - Due to bloom's 176B parameter size, I was unable to test that model;
    its changes are the simplest, though.
- Works in lazy and eager mode.
- ALiBI requires "VLLM_PROMPT_USE_FUSEDSDPA=false" and
  "VLLM_CONTIGUOUS_PA=true" (see the configuration sketch below).
- Added position offsets to improve quality at BS > 1 with sequences of
  varying length.
- BS > 1 may have accuracy issues on FW < 1.19.0 due to a limitation in
  softmax; resolved on FW >= 1.19.0.
- NTT patch for GQA

Co-authored-by: Tanner Voas <[email protected]>
Co-authored-by: Haihao Xiang <[email protected]>
Signed-off-by: Tanner Voas <[email protected]>
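
The restrictions and environment variables in the change list above translate into a runtime configuration. Below is a minimal sketch, assuming a Gaudi build of vLLM with this PR applied; the prompt-length cap and the model name are placeholder values, not taken from the PR:

```python
import os

# ALiBI here requires fused SDPA disabled and contiguous PA enabled.
os.environ["VLLM_PROMPT_USE_FUSEDSDPA"] = "false"
os.environ["VLLM_CONTIGUOUS_PA"] = "true"

# Cap the prompt length for which ALiBI biases are materialized so that
# large models fit in device memory (4096 is a placeholder value).
os.environ["VLLM_PROMPT_ALIBI_MAX_SEQ_LEN"] = "4096"

# Use float32 biases to avoid accuracy loss on long sequences.
os.environ["VLLM_ALIBI_USE_FLOAT32_BIASES"] = "true"

# Then construct the engine as usual with an ALiBI model
# (e.g., mpt, bloom, jais, falcon, or baichuan):
# from vllm import LLM
# llm = LLM(model="mosaicml/mpt-7b")
```

The point about instantiating prompt biases once in __init__ can be illustrated with a generic ALiBI bias construction (the standard slope schedule from the ALiBI paper, not this PR's code):

```python
import math
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Standard ALiBI per-head slope schedule (a geometric sequence)."""
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-(2.0 ** -(math.log2(closest_pow2) - 3)))
    slopes = [base ** (i + 1) for i in range(closest_pow2)]
    if closest_pow2 != num_heads:
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest_pow2) - 3)))
        slopes += [extra_base ** (i + 1)
                   for i in range(2 * (num_heads - closest_pow2))][0::2]
    return torch.tensor(slopes)

# Precompute the prompt bias once (e.g., in __init__), in float32, instead of
# rebuilding it on every forward pass. 1024 and 32 heads are placeholder sizes.
max_len = 1024
distance = torch.arange(max_len)[None, :] - torch.arange(max_len)[:, None]
prompt_bias = distance.to(torch.float32) * alibi_slopes(32)[:, None, None]  # (heads, len, len)
```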
@tannervoas742 tannervoas742 force-pushed the restore_alibi_for_flat_pa_final branch from 288b7c7 to de937c2 Compare December 10, 2024 16:16
@michalkuligowski michalkuligowski merged commit 0766759 into HabanaAI:main Dec 12, 2024
michalkuligowski added a commit that referenced this pull request Dec 13, 2024
michalkuligowski added a commit that referenced this pull request Dec 13, 2024