Hi, I am opening this issue because I am a bit lost about which features are best to extract for comparing images (finding similar images, independent of viewpoint).
import timm
import torch

def load_dino_vit_model(weights_path):
    # Build a ViT backbone and load the DINO teacher weights into it
    # Example model name, adjust it to match the checkpoint
    model_name = 'vit_small_patch16_224'
    # num_classes=0 drops the classifier so the model returns features
    model = timm.create_model(model_name, pretrained=False, num_classes=0)
    checkpoint = torch.load(weights_path, map_location='cpu')
    # Extract the 'teacher' state dictionary and remove the 'backbone.' prefix from each key
    state_dict = checkpoint['teacher']
    adapted_state_dict = {key.replace('backbone.', ''): value for key, value in state_dict.items()}
    # strict=False silently skips keys that do not match (e.g. the head weights)
    model.load_state_dict(adapted_state_dict, strict=False)
    #model = torch.hub.load('facebookresearch/dino:main', 'dino_vitb8')
    model.eval()  # Set the model to evaluation mode
    if torch.cuda.is_available():
        model.cuda()
    return model
I load this model and then simply run:
output = model(image)
This returns a 384-dimensional (or 768-dimensional) feature. Is this feature the class token's activation, or does it come from somewhere else?
If it is the class token, I think it would not be ideal, as it contains positional information, which is not the best for comparing images from different viewpoints.
I also see that I am not using the teacher's MLP head, which is used during training and outputs a 60k+ dimensional vector for comparison with the student branch.
So, if I would like an image feature (with pseudo-semantic information, not positional) on the order of 2-3k dimensions, where would be the best place to get it from?
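To make the size target concrete, here is a minimal sketch of one possible way to reach roughly 2k dimensions: concatenate the class tokens of the last few blocks with the mean of the last block's patch tokens. It assumes the per-block token outputs are already available as tensors of shape (batch, 1 + num_patches, dim) with the class token at index 0. The function name `build_descriptor` is hypothetical and random tensors stand in for real model outputs. Averaging does not remove positional information already mixed into the tokens, so this may or may not help with viewpoint changes.

```python
import torch
import torch.nn.functional as F


def build_descriptor(layer_tokens):
    """Build one global descriptor per image from ViT token outputs.

    layer_tokens: list of tensors, one per selected block, each of shape
    (batch, 1 + num_patches, dim), with the class token at index 0.
    """
    # Class token from every selected block: (batch, dim) each
    cls_tokens = [tokens[:, 0] for tokens in layer_tokens]
    # Averaging the last block's patch tokens is invariant to their order
    patch_mean = layer_tokens[-1][:, 1:].mean(dim=1)
    descriptor = torch.cat(cls_tokens + [patch_mean], dim=-1)
    # L2-normalize so that a dot product equals cosine similarity
    return F.normalize(descriptor, dim=-1)


# Stand-in data (assumption): random tensors in place of real block outputs
batch, num_patches, dim = 2, 196, 384
layer_tokens = [torch.randn(batch, 1 + num_patches, dim) for _ in range(4)]
features = build_descriptor(layer_tokens)
print(features.shape)  # torch.Size([2, 1920])
similarity = features @ features.T  # pairwise cosine similarities
```

With dim = 384 and four blocks this gives 4 * 384 + 384 = 1920 dimensions; with dim = 768, two blocks plus the patch mean would give 2304.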
Thanks