forked from vllm-project/vllm
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Config hidden layer number to run in 1 lazy graph (#451)
Some models are hardcoded to run each hidden layer in its own computation graph in lazy mode when TP = 1. For use cases limited by TPOT (time per output token), where a higher batch size cannot be used, we want to run more hidden layers per graph for more efficient computation. Use VLLM_CONFIG_HIDDEN_LAYER to configure the number of hidden layers to run in one lazy graph. It defaults to 1.
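A minimal sketch of how such a knob could behave. It assumes the lazy graph is cut by a `mark_step()`-style callback after every N decoder layers. The helpers `get_hidden_layer_config` and `run_layers` are hypothetical and not taken from the changed files. Only the variable name `VLLM_CONFIG_HIDDEN_LAYER` and its default of 1 come from the description above.

```python
import os
from typing import Any, Callable, Sequence


def get_hidden_layer_config() -> int:
    """Read VLLM_CONFIG_HIDDEN_LAYER (hypothetical helper); default is 1 layer per graph."""
    value = int(os.environ.get("VLLM_CONFIG_HIDDEN_LAYER", "1"))
    if value < 1:
        raise ValueError("VLLM_CONFIG_HIDDEN_LAYER must be >= 1")
    return value


def run_layers(
    hidden: Any,
    layers: Sequence[Callable[[Any], Any]],
    mark_step: Callable[[], None],
    layers_per_graph: int,
) -> Any:
    """Apply each layer in order, cutting the lazy graph every `layers_per_graph` layers.

    `mark_step` stands in for whatever call flushes the accumulated lazy graph;
    it is an assumption of this sketch, not the fork's actual code path.
    """
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        # With layers_per_graph == 1 this cuts after every layer (the old hardcoded behavior).
        if (i + 1) % layers_per_graph == 0:
            mark_step()
    return hidden
```

With `layers_per_graph=3` and 8 layers, the callback fires after layers 3 and 6, leaving the last two layers to be flushed by whatever step follows the loop. Larger values produce fewer, larger graphs per forward pass.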
Showing 4 changed files with 14 additions and 3 deletions.