multi-image support for llama3.2 #705
base: habana_main
Conversation
Force-pushed from 8477d17 to 08541ff
@kdamaszk @michalkuligowski @kzawora-intel, can you help review this PR? It has already been tested on the 11B and 90B models.
sampling_params = SamplingParams(top_p=0.99, top_k=self.vocab_size - 1)
max_num_batched_tokens = self.scheduler_config.max_num_batched_tokens
# Workaround to avoid unexpected OOM failure during profile run
max_num_seqs = int(self.scheduler_config.max_num_seqs / 2)
I don't like this. Max prefill batch size is already limited by `self.max_num_prefill_seqs`. I believe the role of `profile_run` is to generate with the maximum possible batch size. If that is too big, then the user should limit it. Such hacks will not prevent actual OOM in the middle of a benchmark.
I understand your point. This is a temporary workaround. For multi-modal models, this configuration affects not only the generated prefill prompts but also the dummy encoder data, which occupies a lot of memory before it is actually computed together with prefill, causing OOM. The current encoder data processing has some flaws, and improving it will be our next step. Also, I have tested different configurations on benchmarks and have not observed actual OOM.
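To illustrate the author's point, a hedged sketch of why dummy encoder data can inflate profiling memory for a multi-modal model. The shapes and the sequence count are assumptions for the sake of arithmetic, not values taken from this PR:

```python
import torch

# For a Llama 3.2 Vision style model, every dummy sequence in the
# profile batch may carry full-resolution image features. With N
# sequences, the encoder inputs alone can dominate device memory
# before any prefill compute runs. All shapes here are illustrative.
num_seqs = 128          # example scheduler max_num_seqs
num_tiles, patches, hidden = 4, 1601, 1280
dummy_pixel_feats = [
    torch.empty(num_tiles, patches, hidden) for _ in range(num_seqs)
]
# ~128 * 4 * 1601 * 1280 * 4 bytes ≈ 4.2 GB held at once, which is
# why halving max_num_seqs was tried as a stopgap.
```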
@kdamaszk I reverted to the previous logic but added some fixes for image tokens. Can you help merge this PR if there are no other big issues? QA needs this PR merged to launch testing. I can address any minor comments in a later PR.
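The "fixes for image tokens" themselves are not shown in this excerpt. As a rough illustration of the kind of accounting involved (all names below are hypothetical, not the PR's actual code), the dummy prefill prompt has to reserve room for image placeholder tokens so text and image tokens together stay within the batch budget:

```python
# Hypothetical sketch: reserve room for multimodal placeholder tokens
# when sizing each dummy prefill prompt, so text + image tokens stay
# within max_num_batched_tokens.
def dummy_seq_len(max_batched_tokens: int, num_seqs: int,
                  image_tokens_per_seq: int) -> int:
    per_seq = max_batched_tokens // num_seqs
    return max(per_seq - image_tokens_per_seq, 1)
```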
Force-pushed from c9bfe70 to 7223fb8
Signed-off-by: yan ma <[email protected]>