Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Best features for image similarity #272

Open
fbliman opened this issue Mar 6, 2024 · 2 comments
Open

Best features for image similarity #272

fbliman opened this issue Mar 6, 2024 · 2 comments

Comments

@fbliman
Copy link

fbliman commented Mar 6, 2024

Hi, open this topic because I am a bit lost in which are the best features to extract for comparing images (looking for similar images, independet of view point)

def load_dino_vit_model(weights_path):
    # Load a pre-trained DINO ViT model
    # Specify the appropriate model name and path as needed
    model_name = 'vit_small_patch16_224'  # Example model name, adjust based on actual use
    model = timm.create_model(model_name, pretrained=False, num_classes=0)  # num_classes=0 for feature extraction
    checkpoint = torch.load(weights_path, map_location='cpu')
    
    # Extract the 'teacher' state dictionary and remove the 'backbone.' prefix from each key
    state_dict = checkpoint['teacher']
    adapted_state_dict = {key.replace('backbone.', ''): value for key, value in state_dict.items()}
    
    model.load_state_dict(adapted_state_dict, strict=False)

    #model = torch.hub.load('facebookresearch/dino:main', 'dino_vitb8')
    model.eval()  # Set the model to evaluation mode
    if torch.cuda.is_available():
        model.cuda()
    return model

I am loading this model and then just do

output = model(image)

this returns a 384 (or 768) dimensial feature. Is this feature the class tokens activations? or it comes from other place?

I think if this is the case it would not be ideal as in contains positional informations, which is not the best for comparing images from different viewpoints.

Also I see that from the teacher model I am not using the mlp head that is used for training and it outputs 60k+ dim for training and comparing to the student branch.

So, If I would like to have an image feature (with pseudo-semantinc info, not positional) in the order of 2..3k dimansional, which would be the best place to get it from

Thanks

@marzooq-unbxd
Copy link

found anything?

@zdaiot
Copy link

zdaiot commented Nov 22, 2024

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants