-
Notifications
You must be signed in to change notification settings - Fork 27.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
smol improvements to support more flexible usage #34857
smol improvements to support more flexible usage #34857
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
thanks @andimarafioti LGTM, I can load the dataset nicely now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andimarafioti! Thanks for the update, just a note regarding converting to RGB
# Extra channel dimension for grayscale images | ||
if input_data_format in [ChannelDimension.LAST, None]: | ||
images_list = [ | ||
[np.expand_dims(img, axis=-1) if img.ndim == 2 else img for img in images] for images in images_list | ||
] | ||
elif input_data_format == ChannelDimension.FIRST: | ||
images_list = [ | ||
[np.expand_dims(img, axis=0) if img.ndim == 2 else img for img in images] for images in images_list | ||
] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a method that converts PIL images to RGB format (see do_convert_rgb
param in other image processors). It's better to use it + extend to numpy arrays if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I know, the order of the conversion here is actually important as RGB conversion before image_splitting was hurting the model performance quite a bit D:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is why I'm not converting to RGB until later in the pipeline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I know, the order of the conversion here is actually important as RGB conversion before image_splitting was hurting the model performance quite a bit D:
haha, that's interesting, do you have any clarification for such a behavior? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My observation is that the order here makes the resulting images are slightly different (if you just plot the differences you see large values on edges in the images). They look the same to me, but clearly not to the model since it was trained with a different pipeline that more closely resembles this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, got it, thanks for clarifying 🤗
@andimarafioti LGTM: I reran vlmevalkit with the current changes and I can confirm there is no regression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that RGB better be done if do_convert_rbg
by updating the helper function from utils, otherwise LGTM
To clarify, here I'm not changing where or if RGB conversion happens. |
Yeah, the idea is that we don’t actually need this code because the image will be converted to RGB. However, since maintaining this order is important for quality, it seems fine to me. |
What does this PR do?
This PR improves a few smol issues we had with Idefics 3:
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?