Move from PIL to torchvision.io.decode_image #2303
Comments
@ebsmothers Can you please assign this to me? Thank you 😁
@Ankur-singh just assigned it to you, thanks for taking it on!
Feel free to ping me for help / reviews / best practices!
If we do this we should update all the documentation on Messages to say that image content is expected to be a tensor instead of a PIL image @ebsmothers @Ankur-singh
@RdoubleA As of now, only the vision transformer supports both.
PIL was intentionally used as the only input type to enforce a consistent image API. A PIL image contains all the necessary metadata to ensure the right transforms are applied (e.g. RGB channels vs BGR), whereas tensors don't carry this information, so we'd have to put the responsibility on the user to input the data in the expected format. If we're able to get that info from torchvision, though, that changes things.
Keeping backward compatibility in mind, should we support PIL images as well? We can add a deprecation warning and drop PIL image support in the next release. Or we can drop the support right away and update all the docs to use tensors.
Torchvision doesn't carry the channel ordering (RGB vs BGR), but it's not a torchvision thing, it's a PyTorch thing: all image tensors in PyTorch are expected to be NCHW RGB images. That's the one and only standard, so there never was a need to explicitly store it. I think it's a widely understood standard, e.g. every model (on HF or other hubs) expects RGB tensors. I'm happy to dig more with you to better understand your concerns.
Thanks @NicolasHug. I was talking about the docs: I see a few examples that make use of PIL when creating messages (Interleaving images in text and Multimodal Transforms). Should we drop the support right away and update all the docs, or add a deprecation warning and drop support in the next release?
I guess the key issue is whether it's reasonable for load_image to assume the image provided is in RGB format. Yes, PyTorch expects users to convert images to RGB, but that's when they're setting up all the code themselves and are responsible for the dataset. torchtune goes a bit further and says "just point to any HF dataset and this recipe will work". load_image is also used in the inference recipe and lets the user point to any random URL. The PIL API is nice in that it's very consistent, but an alternative approach would be to use torchvision in load_image and have load_image verify that the image is RGB (I think that's the only format information we need to verify?). A user could bypass this safety check by not using load_image, but at least our default out-of-the-box solution would be safe. @NicolasHug do you know if that verification could be done cheaply without PIL?
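For the verification question, a cheap tensor-only check is possible because the channel count is visible in the tensor shape. Below is a hypothetical `ensure_rgb` helper (the name and policy are illustrative, not torchtune's actual code) that accepts CHW tensors, promotes grayscale to RGB, and rejects everything else:

```python
import torch

def ensure_rgb(image: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of the check load_image could perform:
    accept CHW tensors, promote 1-channel grayscale to RGB, reject the rest."""
    if image.ndim != 3:
        raise ValueError(f"Expected a CHW image tensor, got shape {tuple(image.shape)}")
    channels = image.shape[0]
    if channels == 3:
        return image
    if channels == 1:
        # Grayscale: replicate the single channel three times (a view, no copy).
        return image.expand(3, *image.shape[1:])
    raise ValueError(f"Expected 1 or 3 channels, got {channels}")

rgb = ensure_rgb(torch.zeros(1, 4, 4, dtype=torch.uint8))
assert rgb.shape == (3, 4, 4)
```

Note that this only checks channel count; it cannot distinguish RGB from BGR, which is why decoding with torchvision (which guarantees RGB output) matters more than any post-hoc check.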
@Ankur-singh I talked to Nicolas and he informed me that torchvision's decode guarantees it will always return RGB ordering during the decoding process, so I have no more issues with this change. I think it would be fairly safe to drop PIL support right away since it's mostly used as an intermediary between load_image and the model transform, but it would be better to deprecate it over one release.
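The "deprecate over one release" path could look like the following sketch. The `to_tensor_image` shim is hypothetical (not torchtune's API): tensors pass through untouched, while PIL inputs emit a `DeprecationWarning` and get converted for one release before support is removed entirely:

```python
import warnings
import torch

def to_tensor_image(image):
    """Hypothetical shim: accept tensors directly; accept PIL images with a
    deprecation warning for one release before dropping support."""
    if isinstance(image, torch.Tensor):
        return image
    try:
        from PIL import Image
    except ImportError:
        Image = None
    if Image is not None and isinstance(image, Image.Image):
        warnings.warn(
            "Passing PIL images is deprecated; pass a CHW uint8 tensor instead.",
            DeprecationWarning,
        )
        from torchvision.transforms.functional import pil_to_tensor
        return pil_to_tensor(image)
    raise TypeError(f"Unsupported image type: {type(image)}")
```

Keeping PIL inside a lazy `try`/`except ImportError` means the explicit PIL requirement can be dropped immediately while the shim is still in place.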
One thought for the transforms (not the decoders): if you think it's worth it in terms of UX, you might still be able to transparently support both tensors and PIL images, without necessarily having a direct dependency on PIL. As long as you just use the torchvision transforms (which support both PIL and tensors), you should be fine.
Instead of using PIL.open, @NicolasHug has pointed out that we can move to torchvision.io.decode_image to speed up image processing by doing everything on pure tensors. This would also allow us to drop our explicit PIL requirement. This should entail: