
Enabled and optimized GLM-4v-9b on Gaudi #691

Open

gyou2021 wants to merge 5 commits into habana_main

Conversation

@gyou2021 commented Jan 16, 2025

  1. Enabled the multimodal model GLM-4v-9b on Gaudi.
  2. Optimized the model:
    1. Removed graph recompilation caused by image inputs and varying batch sizes (a sketch of the idea follows the example below);
    2. Reduced tokenization to a single pass.

Example:
python examples/offline_inference_vision_language.py -m glm4v
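For context on point 2.1: HPU graphs are compiled per tensor shape, so padding batches to a small set of fixed bucket shapes keeps the compiled graph reusable across requests. The following is a minimal sketch of that general technique, not this PR's actual code; `pad_to_bucket`, `BATCH_BUCKETS`, and `SEQ_BUCKET` are illustrative names with assumed values.

```python
# Minimal sketch (not from this PR): pad inputs to fixed bucket shapes so
# the compiled HPU graph sees the same shapes for every batch.
import torch

BATCH_BUCKETS = [1, 2, 4, 8]   # assumed batch-size buckets
SEQ_BUCKET = 1600              # assumed fixed length for image-token sequences

def pad_to_bucket(x: torch.Tensor) -> torch.Tensor:
    """Pad a (batch, seq, hidden) tensor up to the nearest bucket shape."""
    b, s, h = x.shape
    target_b = next(n for n in BATCH_BUCKETS if n >= b)
    padded = x.new_zeros(target_b, SEQ_BUCKET, h)
    padded[:b, :s] = x
    return padded
```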

output, _ = self.dense(out)

if is_hpu:
    q = q.reshape(B, L, self.num_heads_per_rank,


Define an HPUMultiHeadAttention to process this.
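For reference, a minimal sketch of what such a wrapper could look like (illustrative only, not this PR's implementation; class layout, names, and shapes are assumptions):

```python
# Illustrative sketch: an HPU-specific attention module that owns the
# reshape logic, so model code avoids inline `if is_hpu` branches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HPUMultiHeadAttention(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads

    def forward(self, q, k, v):
        B, L, _ = q.shape
        # (B, L, H*D) -> (B, H, L, D) for the fused attention kernel
        q, k, v = (t.view(B, L, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        # (B, H, L, D) -> (B, L, H*D)
        return out.transpose(1, 2).reshape(B, L, -1)
```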

@gyou2021 (Author)

Done. Thank you for your comments.

@@ -1,10 +1,10 @@
# Adapted from


Please revert the file mode to 100644; same for the other files.
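For reference, the mode bit can be reverted with a standard git command (the file path below is illustrative):

```sh
# Drop the executable bit from a tracked file, restoring mode 100644
git update-index --chmod=-x vllm/model_executor/models/chatglm.py
```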

@@ -963,7 +961,18 @@ def _prepare_prompt(
                pad=0,
                dtype=torch.long,
                device='cpu')

        if len(multi_modal_kwargs_list
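The diff excerpt is truncated above. A hypothetical continuation of this guard might look like the following; the helper name and batching logic are assumptions for illustration, not the PR's code:

```python
# Hypothetical sketch: batch per-prompt multimodal tensors only when any
# were collected, so text-only batches skip the extra work.
import torch

def batch_multimodal(kwargs_list):
    """Stack matching tensors from each prompt's kwargs (illustrative helper)."""
    return {k: torch.stack([d[k] for d in kwargs_list]) for k in kwargs_list[0]}

# Toy usage: two prompts, each carrying one image tensor.
multi_modal_kwargs_list = [{"pixel_values": torch.zeros(3, 224, 224)}
                           for _ in range(2)]
if len(multi_modal_kwargs_list) > 0:
    multi_modal_kwargs = batch_multimodal(multi_modal_kwargs_list)
else:
    multi_modal_kwargs = None
```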


@yma11 Please take a look
