smol improvements to support more flexible usage #34857

andimarafioti · 2024-11-21T14:15:38Z

What does this PR do?

This PR improves a few smol issues we had with Idefics 3:

We couldn't use the model with images larger than 5*364, which was the default max_image_size. The method that computed this took a parameter as input but never used it, and it raised an error if we wanted to resize to a larger size. I changed the default to 4k resolution, which is already considerably larger than what we trained on, i.e., anything larger is pretty outrageous.
We couldn't train with datasets that contained grayscale images because the input_data_format wasn't properly parsed. I fixed this by switching around the processing order: if the images are grayscale, I now add a channel dimension at the end or start of the images first, so that the input_data_format can be correctly inferred when it is None.
Finally, when converting to a PIL image, we were not passing the input_data_format, which broke the processing for images that have 4 channels. Since we already have the input_data_format in these functions, I now pass it through.
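
As a rough illustration of the first point, the sketch below shows the arithmetic of capping an image's longest edge while preserving its aspect ratio. It is not the code from this PR: `clamp_to_longest_edge` is a hypothetical helper, and the default of 4096 is only an assumed reading of "4k resolution". The one value taken from the description is the old limit of 5*364.

```python
# Hypothetical helper, not the Idefics3 implementation.
OLD_LIMIT = 5 * 364  # previous default max_image_size (1820 pixels)
ASSUMED_NEW_LIMIT = 4096  # assumption: one common reading of "4k resolution"


def clamp_to_longest_edge(height, width, max_longest_edge=ASSUMED_NEW_LIMIT):
    """Return (height, width) scaled down so the longest edge fits the limit.

    Images that already fit are returned unchanged; the aspect ratio is
    preserved up to rounding.
    """
    longest = max(height, width)
    if longest <= max_longest_edge:
        return height, width
    scale = max_longest_edge / longest
    # Guard against a dimension rounding down to zero for extreme aspect ratios.
    return max(1, round(height * scale)), max(1, round(width * scale))


# An image that fits under the assumed new limit is left alone ...
fits = clamp_to_longest_edge(1000, 2000)
# ... while the same call with the old limit would have shrunk a 3000x1500 image.
shrunk = clamp_to_longest_edge(3000, 1500, max_longest_edge=OLD_LIMIT)
```

The point of the sketch is that the limit is an explicit argument that is actually used, which is the behavior the description says was missing before.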

Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

vision models: @qubvel

HuggingFaceDocBuilderDev · 2024-11-21T14:45:32Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

kashif · 2024-11-21T15:18:44Z

thanks @andimarafioti LGTM, I can load the dataset nicely now

qubvel

Hi @andimarafioti! Thanks for the update, just a note regarding converting to RGB

qubvel · 2024-11-21T14:57:17Z

src/transformers/models/idefics3/image_processing_idefics3.py

+        # Extra channel dimension for grayscale images
+        if input_data_format in [ChannelDimension.LAST, None]:
+            images_list = [
+                [np.expand_dims(img, axis=-1) if img.ndim == 2 else img for img in images] for images in images_list
+            ]
+        elif input_data_format == ChannelDimension.FIRST:
+            images_list = [
+                [np.expand_dims(img, axis=0) if img.ndim == 2 else img for img in images] for images in images_list
+            ]
+


There is a method that converts PIL images to RGB format (see do_convert_rgb param in other image processors). It's better to use it + extend to numpy arrays if needed.

Yes I know, the order of the conversion here is actually important as RGB conversion before image_splitting was hurting the model performance quite a bit D:

Which is why I'm not converting to RGB until later in the pipeline

haha, that's interesting, do you have any clarification for such a behavior? 😄

My observation is that the order here makes the resulting images slightly different (if you just plot the differences, you see large values on the edges in the images). They look the same to me, but clearly not to the model, since it was trained with a different pipeline that more closely resembles this one.

Ok, got it, thanks for clarifying 🤗
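
The grayscale handling discussed in this thread can be sketched in isolation with plain NumPy. The snippet is a minimal illustration under stated assumptions, not the library's code: `add_channel_axis` mirrors the `np.expand_dims` calls in the diff above, while `guess_channel_axis` is a hypothetical stand-in for channel-format inference (its name, its string return values, and its accepted channel counts are all assumptions). It shows why a 2D array has to gain a channel axis before its format can be inferred.

```python
import numpy as np


def add_channel_axis(img, channels_first=False):
    """Give a 2D (grayscale) array an explicit channel axis; leave others as-is."""
    if img.ndim != 2:
        return img
    return np.expand_dims(img, axis=0 if channels_first else -1)


def guess_channel_axis(img, num_channels=(1, 3, 4)):
    """Hypothetical inference helper: decide where the channel axis sits.

    Only 3D arrays can be classified, which is why grayscale inputs need
    add_channel_axis to run first.
    """
    if img.ndim != 3:
        raise ValueError(f"expected a 3D array, got {img.ndim} dimensions")
    if img.shape[0] in num_channels:
        return "channels_first"
    if img.shape[-1] in num_channels:
        return "channels_last"
    raise ValueError(f"cannot infer the channel axis from shape {img.shape}")


gray = np.zeros((32, 48), dtype=np.uint8)      # no channel axis yet
rgba = np.zeros((32, 48, 4), dtype=np.uint8)   # 4-channel image, already 3D

gray_last = add_channel_axis(gray)                         # shape (32, 48, 1)
gray_first = add_channel_axis(gray, channels_first=True)   # shape (1, 32, 48)
```

Calling `guess_channel_axis(gray)` directly raises a `ValueError`, whereas `guess_channel_axis(gray_last)` and `guess_channel_axis(gray_first)` both succeed, which is the ordering issue the PR description refers to.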

mfarre · 2024-11-21T20:34:52Z

@andimarafioti LGTM: I reran vlmevalkit with the current changes and I can confirm there is no regression

zucchini-nlp

Agreed that the RGB conversion is better done under do_convert_rgb by updating the helper function from utils; otherwise LGTM

andimarafioti · 2024-11-22T11:01:05Z

To clarify, here I'm not changing where or if RGB conversion happens.

qubvel · 2024-11-22T11:38:48Z

Yeah, the idea is that we don’t actually need this code because the image will be converted to RGB. However, since maintaining this order is important for quality, it seems fine to me.

andimarafioti added 2 commits November 21, 2024 14:07

smol improvements to support more flexible usage

f48c228

ruff

41d8702

andimarafioti requested review from qubvel and zucchini-nlp November 21, 2024 14:25

qubvel reviewed Nov 21, 2024

zucchini-nlp approved these changes Nov 22, 2024

qubvel approved these changes Nov 22, 2024

qubvel added the Vision and Processing labels Nov 22, 2024

qubvel requested a review from ArthurZucker November 22, 2024 11:39

andimarafioti merged commit 861758e into huggingface:main Nov 22, 2024
11 checks passed

andimarafioti deleted the idefics3-smol-improvements branch November 22, 2024 15:34

