
Move from PIL to torchvision.io.decode_image #2303

Open
ebsmothers opened this issue Jan 28, 2025 · 12 comments
Labels: best practice (Things we should be doing but aren't) · community help wanted (We would love the community's help completing this issue)

Comments

@ebsmothers (Contributor)

Instead of using PIL.Image.open, @NicolasHug has pointed out that we can move to torchvision.io.decode_image to speed up image processing by doing everything on pure tensors (see also). This would also allow us to drop our explicit PIL requirement. This should entail:

  1. changing load_image to use torchvision.io.decode_image (a sketch of this step follows below)
  2. updating CLIPImageTransform to accept tensors instead of PIL.Image (in the short term we can keep PIL.Image support for backwards compatibility)
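
A minimal sketch of step 1, assuming load_image keeps a path-in, tensor-out shape (the exact signature and error handling here are illustrative, not torchtune's actual implementation):

```python
import torch
from torchvision.io import ImageReadMode, decode_image, read_file

def load_image(image_path: str) -> torch.Tensor:
    """Load an image from disk as a CHW uint8 tensor.

    Unlike PIL.Image.open, the result is already a torch.Tensor,
    so no PIL -> tensor conversion is needed downstream.
    """
    data = read_file(image_path)  # raw encoded bytes as a 1-D uint8 tensor
    # mode=RGB makes the decoder normalize grayscale/palette/RGBA inputs
    # to 3-channel RGB during decoding.
    return decode_image(data, mode=ImageReadMode.RGB)
```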
ebsmothers added the best practice and community help wanted labels on Jan 28, 2025
@Ankur-singh (Contributor)

@ebsmothers Can you please assign this to me?

Thank you 😁

@ebsmothers (Contributor, Author)

@Ankur-singh just assigned it to you, thanks for taking it on!

@NicolasHug (Member)

Feel free to ping me for help / reviews / best practices!

@RdoubleA (Contributor) commented Feb 8, 2025

If we do this, we should update all the documentation on Messages to say that image content is expected to be a tensor instead of a PIL image @ebsmothers @Ankur-singh

@Ankur-singh (Contributor)

@RdoubleA Just torch.Tensor or Union[torch.Tensor, PIL.Image]?

The vision transformer currently supports both.

@pbontrager (Contributor)

PIL was intentionally used as the only input type to enforce a consistent image API. A PIL image contains all the necessary metadata to ensure the right transforms are applied to it (e.g. RGB channels vs BGR), whereas tensors don't carry this information, so we would have to put the responsibility on the user to provide the data in the expected format. If we're able to get that info from torchvision, though, that changes things.
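
To make the concern concrete (an illustrative sketch; the file name is hypothetical): a PIL image is self-describing about its channel layout, while a raw tensor is not.

```python
import torch
from PIL import Image

pil_img = Image.open("photo.jpg")  # hypothetical file
print(pil_img.mode)  # e.g. "RGB", "L", or "CMYK" -- the layout travels with the image

tensor_img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
# Nothing in the tensor itself says whether its channels are RGB or BGR;
# the caller has to know (and honor) the convention.
```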

@Ankur-singh (Contributor)

Keeping backward compatibility in mind, should we support PIL images as well? We can add a deprecation warning and drop PIL image support in the next release.

Or we can drop the support right away and update all the docs to use torch.Tensor.

@NicolasHug (Member)

Torchvision doesn't carry the channel ordering (RGB vs BGR), but that's not a torchvision thing, it's a PyTorch thing: all image tensors in PyTorch are expected to be NCHW RGB images. That's the one and only standard, so there was never a need to explicitly store it. I think it's a widely understood standard, e.g. every model (on HF or other hubs) expects RGB tensors.

I'm happy to dig more with you to better understand your concerns.
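
As a concrete illustration of that convention (a sketch; the model choice and file name are just for illustration), torchvision's bundled preprocessing pipelines assume the decoded tensor is already CHW RGB:

```python
import torch
from torchvision.io import decode_image, read_file
from torchvision.models import ResNet18_Weights, resnet18

weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval()
preprocess = weights.transforms()  # resize/crop/normalize; assumes RGB input

img = decode_image(read_file("photo.jpg"))  # CHW uint8, channels in RGB order
batch = preprocess(img).unsqueeze(0)        # NCHW float32, normalized
with torch.no_grad():
    logits = model(batch)
```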

@Ankur-singh (Contributor)

Thanks @NicolasHug.

I was talking about CLIPImageTransform. It accepts both PIL images and torch tensors. With the new PR (#2366), load_image returns a torch tensor, so all built-in multimodal datasets use torch tensors all the way through.

I see a few examples in the docs that use PIL when creating messages (Interleaving images in text and Multimodal Transforms); those are the docs I was talking about. Should we drop the support right away and update all the docs, or add a deprecation warning and drop support in the next release?

@pbontrager (Contributor)

I guess the key issue is whether it's reasonable for load_image to assume the image provided is in RGB format. Yes, PyTorch expects users to convert images to RGB, but that's when they're setting up all the code themselves and are responsible for the dataset. torchtune goes a bit further and says "just point to any HF dataset and this recipe will work". Also, load_image is used in the inference recipe and lets the user point to any random URL.

The PIL API is nice in that it's very consistent, but an alternative approach would be to use torchvision in load_image and have load_image verify that the image is RGB (I think that's the only format information we need to verify?). A user could bypass this safety check by not using load_image, but at least our default out-of-the-box solution would be safe.

@NicolasHug do you know if that verification could be done cheaply without PIL?
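
One possibility (a sketch, assuming torchvision's ImageReadMode is acceptable here): rather than verifying after the fact, the decoder can be asked to convert to RGB while decoding, which sidesteps a separate check entirely.

```python
from torchvision.io import ImageReadMode, decode_image, read_file

# mode=ImageReadMode.RGB makes the decoder convert grayscale, palette,
# and RGBA sources to 3-channel RGB, so no post-hoc verification is needed.
img = decode_image(read_file("any_image.png"), mode=ImageReadMode.RGB)
assert img.shape[0] == 3  # always three channels, in RGB order
```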

@pbontrager (Contributor)

@Ankur-singh I talked to Nicolas and he informed me that torchvision's decoders guarantee RGB ordering in the decoded output. So I have no more issues with this change.

I think it would be fairly safe to drop PIL support right away, since it's mostly used as an intermediary between load_image and the model transform, but it would be better to deprecate it over one release.

@NicolasHug (Member) commented Feb 12, 2025

One thought for the transforms (not the decoders): if you think it's worth it in terms of UX, you might still be able to transparently support both tensors and PIL images without necessarily having a direct dependency on PIL. As long as you only use the torchvision transforms (which support both PIL and tensors), you should be fine.
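
For example (a minimal sketch assuming a recent torchvision with the transforms.v2 API; the pipeline is illustrative, not CLIPImageTransform's actual one):

```python
import torch
from torchvision.transforms import v2

# The same v2 pipeline accepts both CHW tensors and PIL Images,
# so the calling code needs no direct PIL import.
transform = v2.Compose([
    v2.Resize((224, 224)),
    v2.ToImage(),                           # converts PIL Images (or tensors) to tensor images
    v2.ToDtype(torch.float32, scale=True),  # uint8 [0, 255] -> float32 [0, 1]
])

chw = torch.randint(0, 256, (3, 333, 500), dtype=torch.uint8)
out = transform(chw)  # works on a tensor...
# ...and transform(pil_image) would work identically on a PIL.Image.Image.
```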
