multi-image support for llama3.2 #705
base: habana_main
Conversation
Force-pushed from 8477d17 to 08541ff
@kdamaszk @michalkuligowski @kzawora-intel, can you help review this PR? It has already been tested on the 11B and 90B models.
sampling_params = SamplingParams(top_p=0.99, top_k=self.vocab_size - 1)
max_num_batched_tokens = self.scheduler_config.max_num_batched_tokens
# Workaround to avoid unexpected OOM failure during profile run
max_num_seqs = int(self.scheduler_config.max_num_seqs / 2)
I don't like this. Max prefill batch size is already limited by `self.max_num_prefill_seqs`. I believe the role of `profile_run` is to generate with the maximum possible batch size. If that is too big, then the user should limit it. Such hacks will not prevent actual OOM in the middle of a benchmark.
I understand your point. This is a temporary workaround. For multi-modal models, this configuration affects not only the generated prefill prompts but also the dummy encoder data, which occupies a lot of memory before it is actually computed together with prefill, causing OOM. The current encoder data processing has some flaws, and improving it will be our next step. Also, I have tested different configurations on benchmarks and have not observed actual OOM.
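To illustrate the author's point, a hedged sketch of why dummy encoder data can inflate profiling memory for a multi-modal model. The shapes and the sequence count are assumptions for the sake of arithmetic, not values taken from this PR:

```python
import torch

# For a Llama 3.2 Vision style model, every dummy sequence in the
# profile batch may carry full-resolution image features. With N
# sequences, the encoder inputs alone can dominate device memory
# before any prefill compute runs. All shapes here are illustrative.
num_seqs = 128          # example scheduler max_num_seqs
num_tiles, patches, hidden = 4, 1601, 1280
dummy_pixel_feats = [
    torch.empty(num_tiles, patches, hidden) for _ in range(num_seqs)
]
# ~128 * 4 * 1601 * 1280 * 4 bytes ≈ 4.2 GB held at once, which is
# why halving max_num_seqs was tried as a stopgap.
```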
@kdamaszk I reverted to the previous logic but added some fixes for image tokens. Can you help merge this PR if there are no other big issues? QA needs this PR merged to launch testing. I can address any minor comments in a later PR.
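The "fixes for image tokens" themselves are not shown in this excerpt. As a rough illustration of the kind of accounting involved (all names below are hypothetical, not the PR's actual code), the dummy prefill prompt has to reserve room for image placeholder tokens so text and image tokens together stay within the batch budget:

```python
# Hypothetical sketch: reserve room for multimodal placeholder tokens
# when sizing each dummy prefill prompt, so text + image tokens stay
# within max_num_batched_tokens.
def dummy_seq_len(max_batched_tokens: int, num_seqs: int,
                  image_tokens_per_seq: int) -> int:
    per_seq = max_batched_tokens // num_seqs
    return max(per_seq - image_tokens_per_seq, 1)
```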
Force-pushed from c9bfe70 to 7223fb8
Signed-off-by: yan ma <[email protected]>