
multi-image support for llama3.2 #705

Open

yma11 wants to merge 6 commits into base: habana_main from the multi-image branch

Conversation

@yma11 commented Jan 20, 2025

No description provided.

@yma11 (Author) commented Jan 24, 2025

@kdamaszk @michalkuligowski @kzawora-intel, can you help review this PR? It has already been tested on the 11B and 90B models.

sampling_params = SamplingParams(top_p=0.99, top_k=self.vocab_size - 1)
max_num_batched_tokens = self.scheduler_config.max_num_batched_tokens
# Workaround to avoid unexpected OOM failure during the profile run
max_num_seqs = int(self.scheduler_config.max_num_seqs / 2)

Review comment on the snippet above:

I don't like this. The maximum prefill batch size is limited by self.max_num_prefill_seqs. I believe the role of profile_run is to generate with the maximum possible batch size; if it is too big, then the user should limit it. Such hacks will not prevent an actual OOM in the middle of a benchmark.
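For illustration only, here is a minimal sketch of what this comment suggests: let the profile run use the largest batch the configuration allows, capped by the prefill limit, and have the user lower max_num_seqs explicitly if that is too large, rather than halving it behind the scenes. The SchedulerConfig fields and the profile_batch_size helper below are hypothetical stand-ins, not the actual habana_main code.

from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    # Hypothetical stand-in for the real scheduler config; only the fields
    # used in this sketch are included.
    max_num_seqs: int
    max_num_batched_tokens: int


def profile_batch_size(cfg: SchedulerConfig, max_num_prefill_seqs: int) -> int:
    # Profile with the largest batch the configuration permits, capped by the
    # prefill limit. If this is too large for the device, the user should
    # lower max_num_seqs (or max_num_prefill_seqs) instead of relying on an
    # implicit divide-by-two workaround.
    return min(cfg.max_num_seqs, max_num_prefill_seqs)


if __name__ == "__main__":
    cfg = SchedulerConfig(max_num_seqs=256, max_num_batched_tokens=8192)
    print(profile_batch_size(cfg, max_num_prefill_seqs=64))  # prints 64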

@yma11 (Author) replied on Jan 24, 2025:

I understand your point. This is a temporary workaround. For multi-modal models, these configurations affect not only the generated prefill prompts but also the encoder dummy data, which occupies a lot of memory during processing before it is actually computed together with the prefill, and that can cause an OOM. The current encoder-data processing has some flaws, and improving it will be our next step. I have also tested different configurations in benchmarks and have not observed an actual OOM.
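To make the concern concrete, here is a rough back-of-the-envelope sketch of why the encoder dummy data grows with max_num_seqs during a multi-modal profile run: every dummy sequence carries a full-size dummy image, so the encoder outputs are held in memory in proportion to the batch size before prefill consumes them. All shapes and sizes below are assumptions for illustration, not Llama 3.2 vision internals.

def dummy_encoder_bytes(max_num_seqs: int,
                        images_per_seq: int = 1,
                        image_tokens: int = 1600,    # assumed vision tokens per image
                        hidden_size: int = 4096,     # assumed hidden dimension
                        bytes_per_elem: int = 2) -> int:  # assumed bf16
    # Memory held by dummy encoder outputs for the whole profile batch,
    # before prefill actually consumes them.
    per_image = image_tokens * hidden_size * bytes_per_elem
    return max_num_seqs * images_per_seq * per_image


if __name__ == "__main__":
    for seqs in (64, 128, 256):
        gib = dummy_encoder_bytes(seqs) / 2**30
        print(f"max_num_seqs={seqs}: ~{gib:.2f} GiB of dummy encoder data")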

@yma11 (Author) commented:

@kdamaszk I have reverted to the previous logic but added some fixes for the image tokens. Can you help merge this PR if there are no other major issues? QA needs this PR merged to launch testing; I can address any minor comments in a later PR.

yma11 force-pushed the multi-image branch 3 times, most recently from c9bfe70 to 7223fb8 on January 26, 2025 at 07:38

2 participants