Question: Is there a part of the SPHINX repository that splits images into sub-images? #169

SuwonPabby · 2024-02-29T01:58:32Z

Hello, I am currently working with the SPHINX library for various tasks, and I cannot seem to find the functionality within the repository to convert images to sub-images.

I am wondering if this functionality is absent from the repository, requiring an external solution, or if I am simply overlooking it.

Thank you for your assistance.

ChrisLiu6 · 2024-02-29T06:00:00Z

LLaMA2-Accessory/accessory/model/LLM/llama_ens5.py

Lines 383 to 385 in 2fe5e0b

    
    image_224 = F.interpolate(image.half(), size=(224,224), mode="bicubic").to(image)
    image_parts = [image[..., :224, :224], image[..., :224, 224:], image[..., 224:, :224], image[..., 224:, 224:]]
    image: torch.Tensor = torch.cat([image_224] + image_parts, dim=0)

The `ens5` and `ens10` in the model names mean 5 or 10 views in total, including the global view.
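The four hard-coded slices above form a 2 × 2 grid of 224 × 224 tiles over a 448 × 448 input. As a rough illustration (the `tile_boxes` helper is hypothetical and not part of LLaMA2-Accessory), the crop windows can be enumerated in the same row-major order:

```python
def tile_boxes(image_size: int, patch: int = 224):
    """Return (y_start, y_end, x_start, x_end) for each non-overlapping tile, row-major.

    Hypothetical helper for illustration only; assumes a square input whose
    side is a multiple of `patch`.
    """
    if image_size % patch != 0:
        raise ValueError("image_size must be a multiple of patch")
    return [
        (y, y + patch, x, x + patch)
        for y in range(0, image_size, patch)
        for x in range(0, image_size, patch)
    ]


# For 448 this matches the slices in llama_ens5.py:
# [:224, :224], [:224, 224:], [224:, :224], [224:, 224:]
for box in tile_boxes(448):
    print(box)
# (0, 224, 0, 224)
# (0, 224, 224, 448)
# (224, 448, 0, 224)
# (224, 448, 224, 448)
```

The same enumeration on a 672 × 672 input yields 9 windows.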

SuwonPabby · 2024-02-29T08:13:05Z

Thanks for your help!!!

SuwonPabby · 2024-02-29T08:30:10Z

LLaMA2-Accessory/accessory/model/LLM/llama_ens10.py

Lines 377 to 390 in 40cf02b

    
    def encode_image(self, image):
        # [B, 32, 768]
        self.clip.eval()
        self.openclip_convnext_xxl.eval()
        # images should be of size [bsz, 448, 448]
        # convert them to 5 224*224 images
        image_224 = F.interpolate(image.half(), size=(224,224), mode="bicubic").to(image)
        image_parts = []
        for y_start in range(0, image.shape[-2], 224):
            for x_start in range(0, image.shape[-1], 224):
                image_parts.append(image[...,y_start:y_start+224,x_start:x_start+224])
        image: torch.Tensor = torch.cat([image_224] + image_parts, dim=0)
        n_views_per_image = len(image_parts) + 1

But I'm still confused by the statement that ens10 means 10 views, including the global view.

The code above is from llama_ens10.py, yet it still looks like it produces only 5 views.
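The tiling loop in `encode_image` does not fix the number of tiles: it sweeps the last two dimensions in 224-pixel steps, so the tile count depends only on the input's spatial size. The NumPy sketch below mirrors that loop under stated assumptions: `split_into_views` is a hypothetical name, inputs are square `[B, C, S, S]` arrays with `S` a multiple of 224, and the global view uses strided (nearest-neighbour) subsampling in place of the bicubic `F.interpolate` call, so its pixel values will differ from SPHINX's.

```python
import numpy as np


def split_into_views(images: np.ndarray, patch: int = 224):
    """Split [B, C, S, S] images into 1 global view + (S // patch) ** 2 tiles.

    Hypothetical NumPy mirror of the loop in encode_image. The global view
    uses strided (nearest-neighbour) subsampling as a stand-in for the
    bicubic F.interpolate call, so its pixel values will not match exactly.
    """
    _, _, h, w = images.shape
    if h != w or h % patch != 0:
        raise ValueError(f"expected square inputs with side divisible by {patch}, got {h}x{w}")
    step = h // patch
    # Stand-in global view: keep every `step`-th row and column (not bicubic).
    global_view = images[..., ::step, ::step]
    # Local views: row-major sweep of non-overlapping patch x patch tiles.
    parts = []
    for y_start in range(0, h, patch):
        for x_start in range(0, w, patch):
            parts.append(images[..., y_start:y_start + patch, x_start:x_start + patch])
    # Concatenate along the batch axis, like torch.cat([...], dim=0).
    views = np.concatenate([global_view] + parts, axis=0)
    return views, len(parts) + 1


x = np.zeros((2, 3, 448, 448), dtype=np.float32)
views, n_views_per_image = split_into_views(x)
print(views.shape, n_views_per_image)  # (10, 3, 224, 224) 5
```

Because the views are concatenated along the batch axis, rows `[0:B]` hold the global views and each following block of `B` rows holds one tile position, so regrouping the views per image requires knowing `n_views_per_image`.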

ChrisLiu6 · 2024-02-29T10:08:27Z

The following attribute:

LLaMA2-Accessory/accessory/model/LLM/llama_ens10.py

Line 336 in 40cf02b

self.image_size = 672

defines the input image size. In ens10 it is 672, and 672 / 224 = 3, so the tiling loop produces a 3 × 3 grid: 9 local views plus the global view, 10 in total. Sorry for the misleading comments.
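The arithmetic behind the model names can be written as a one-line formula. In this small sketch (the helper name `views_for_size` is hypothetical), an `S × S` input split into non-overlapping 224 × 224 tiles yields `(S / 224)²` local views plus one global view:

```python
def views_for_size(image_size: int, patch: int = 224) -> int:
    """Total views = (image_size // patch) ** 2 local tiles + 1 global view."""
    if image_size % patch != 0:
        raise ValueError("image_size must be a multiple of patch")
    grid = image_size // patch
    return grid * grid + 1


print(views_for_size(448))  # 5  -> "ens5"
print(views_for_size(672))  # 10 -> "ens10"
```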

SuwonPabby · 2024-02-29T10:14:48Z

Thank you for your kind information! It really helped me a lot!
