Question : Is there a part in the SPHINX repository that switches images to Subimage? #169

Open
SuwonPabby opened this issue Feb 29, 2024 · 5 comments

Comments

@SuwonPabby

Hello, I am currently working with the SPHINX library for various tasks, and I cannot seem to find the functionality within the repository to convert images to sub-images.

I am wondering if this functionality is absent from the repository, requiring an external solution, or if I am simply overlooking it.

Thank you for your assistance.

@ChrisLiu6
Collaborator

image_224 = F.interpolate(image.half(), size=(224,224), mode="bicubic").to(image)
image_parts = [image[..., :224, :224], image[..., :224, 224:], image[..., 224:, :224], image[..., 224:, 224:]]
image: torch.Tensor = torch.cat([image_224] + image_parts, dim=0)

The ens5 and ens10 in the model names indicate 5 or 10 views in total, including the global view.

@SuwonPabby
Author

Thanks for your help!!!

@SuwonPabby
Author

SuwonPabby commented Feb 29, 2024

def encode_image(self, image):
    # [B, 32, 768]
    self.clip.eval()
    self.openclip_convnext_xxl.eval()
    # images should be of size [bsz, 448, 448]
    # convert them to 5 224*224 images
    image_224 = F.interpolate(image.half(), size=(224,224), mode="bicubic").to(image)
    image_parts = []
    for y_start in range(0, image.shape[-2], 224):
        for x_start in range(0, image.shape[-1], 224):
            image_parts.append(image[...,y_start:y_start+224,x_start:x_start+224])
    image: torch.Tensor = torch.cat([image_224] + image_parts, dim=0)
    n_views_per_image = len(image_parts) + 1

But I still have a question about this statement:

ens10 means 10 views of the image, including the global view

The code above is from ens10, yet it still looks like it only produces 5 views.

@ChrisLiu6
Collaborator

The following attribute:


defines the size of the image. In ens10 it is 672, so the loop yields a 3×3 grid of 224×224 crops: 9 local views plus the global view. Sorry for the misleading code comments.
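As a quick sanity check on that arithmetic, here is a minimal sketch that counts the views the slicing loop would produce for a given square input size. The helper name `count_views` and its `patch_size` parameter are hypothetical, for illustration only; they are not part of the SPHINX code.

```python
def count_views(image_size: int, patch_size: int = 224) -> int:
    """Count local crops plus the one global view (hypothetical helper)."""
    # Same start offsets as the nested loops in encode_image:
    # range(0, size, 224) yields one offset per row (and per column) of crops.
    starts = list(range(0, image_size, patch_size))
    n_local = len(starts) * len(starts)  # square image: rows x columns
    return n_local + 1  # +1 for the downsampled global view


print(count_views(448))  # 5  (2x2 local crops + global view)
print(count_views(672))  # 10 (3x3 local crops + global view)
```

Under these assumptions, a 448 input matches ens5 and a 672 input matches ens10, even though the loop code itself is identical.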

@SuwonPabby
Author

Thank you for your kind information! It really helped me a lot!
