
Resolved ALIBI bias regression due to porting flat PA #34

Merged

Conversation

@tannervoas742 tannervoas742 commented Nov 15, 2024

Required for the associated changes on the vllm-fork PR.

See the associated ticket above for further details.

@tannervoas742 tannervoas742 changed the title ALIBI-Ext: Works in lazy and eager mode Resolved ALIBI bias regression due to porting flat PA Nov 15, 2024
@tannervoas742 tannervoas742 force-pushed the restore_alibi_for_flat_pa_final branch 3 times, most recently from 873fc19 to 1a652d3 Compare November 18, 2024 10:34
@xuguangxin

@madamczykhabana, @libinta, please help review. Thank you.

@tannervoas742 tannervoas742 force-pushed the restore_alibi_for_flat_pa_final branch 4 times, most recently from 67b3396 to c0fb257 Compare November 27, 2024 03:10
@tannervoas742 tannervoas742 force-pushed the restore_alibi_for_flat_pa_final branch 3 times, most recently from d97b251 to 288b7c7 Compare December 9, 2024 17:03
Changes:
- Added ALiBI biases back to the decode stage.
- Optimized ALiBI memory usage.
  - Added environment variable "VLLM_PROMPT_ALIBI_MAX_SEQ_LEN" to allow
    large models to run with restricted prompt lengths.
  - Prompt biases are instantiated once in __init__ rather than on each
    forward pass.
  - Prompt and decode biases are shared across encoder/decoder layers.
- Added environment variable "VLLM_ALIBI_USE_FLOAT32_BIASES" to resolve
  accuracy issue on long sequences.
- Updated jais, mpt, falcon, baichuan, and bloom to work with ALiBI.
  - Due to bloom's 176B parameter size, I was unable to test that model;
    its changes are the simplest, though.
- Works in lazy and eager mode.
- ALiBI requires "VLLM_PROMPT_USE_FUSEDSDPA=false" and
  "VLLM_CONTIGUOUS_PA=true" (see the configuration sketch below).
- Added position offsets to improve quality at BS > 1 with sequences of
  varying length.
- BS > 1 may have accuracy issues on FW < 1.19.0 due to a limitation in
  softmax; resolved on FW >= 1.19.0.
- NTT patch for GQA

Co-authored-by: Tanner Voas <[email protected]>
Co-authored-by: Haihao Xiang <[email protected]>
Signed-off-by: Tanner Voas <[email protected]>
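
The restrictions and environment variables in the change list above translate into a runtime configuration. Below is a minimal sketch, assuming a Gaudi build of vLLM with this PR applied; the prompt-length cap and the model name are placeholder values, not taken from the PR:

```python
import os

# ALiBI here requires fused SDPA disabled and contiguous PA enabled.
os.environ["VLLM_PROMPT_USE_FUSEDSDPA"] = "false"
os.environ["VLLM_CONTIGUOUS_PA"] = "true"

# Cap the prompt length for which ALiBI biases are materialized so that
# large models fit in device memory (4096 is a placeholder value).
os.environ["VLLM_PROMPT_ALIBI_MAX_SEQ_LEN"] = "4096"

# Use float32 biases to avoid accuracy loss on long sequences.
os.environ["VLLM_ALIBI_USE_FLOAT32_BIASES"] = "true"

# Then construct the engine as usual with an ALiBI model
# (e.g., mpt, bloom, jais, falcon, or baichuan):
# from vllm import LLM
# llm = LLM(model="mosaicml/mpt-7b")
```

The point about instantiating prompt biases once in __init__ can be illustrated with a generic ALiBI bias construction (the standard slope schedule from the ALiBI paper, not this PR's code):

```python
import math
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Standard ALiBI per-head slope schedule (a geometric sequence)."""
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-(2.0 ** -(math.log2(closest_pow2) - 3)))
    slopes = [base ** (i + 1) for i in range(closest_pow2)]
    if closest_pow2 != num_heads:
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest_pow2) - 3)))
        slopes += [extra_base ** (i + 1)
                   for i in range(2 * (num_heads - closest_pow2))][0::2]
    return torch.tensor(slopes)

# Precompute the prompt bias once (e.g., in __init__), in float32, instead of
# rebuilding it on every forward pass. 1024 and 32 heads are placeholder sizes.
max_len = 1024
distance = torch.arange(max_len)[None, :] - torch.arange(max_len)[:, None]
prompt_bias = distance.to(torch.float32) * alibi_slopes(32)[:, None, None]  # (heads, len, len)
```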
@tannervoas742 tannervoas742 force-pushed the restore_alibi_for_flat_pa_final branch from 288b7c7 to de937c2 Compare December 10, 2024 16:16
@michalkuligowski michalkuligowski merged commit 0766759 into HabanaAI:main Dec 12, 2024
michalkuligowski added a commit that referenced this pull request Dec 13, 2024
michalkuligowski added a commit that referenced this pull request Dec 13, 2024