I want to see the result (an image) after the Shunted Transformer, but the output is a tensor and its channel count is not 3, so I don't know how to turn it into an image.
How can I obtain an image from the output of the Shunted Transformer?
I would appreciate any help. Thank you very much.
In DINO, the default backbone is ViT or DeiT, which uses the CLS token to represent image-level information, so the visualization there corresponds to the CLS token.
Shunted Transformer does not use a CLS token, so it may be difficult to use the attention map to reflect the RoI of the whole image. Grad-CAM may be an alternative.
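As a rough illustration of both options, here is a minimal NumPy sketch, not code from this repository. It assumes you have already pulled out a single feature map of shape (C, H, W), and for Grad-CAM the gradient of your target score with respect to that map. The function names `to_uint8`, `channel_mean_heatmap`, and `grad_cam_heatmap` are hypothetical helpers made up for this example.

```python
import numpy as np

def to_uint8(heatmap):
    """Min-max scale a 2-D array to uint8 values in [0, 255]."""
    heatmap = heatmap.astype(np.float64)
    span = heatmap.max() - heatmap.min()
    if span == 0:
        # A constant map carries no contrast; return an all-black image.
        return np.zeros(heatmap.shape, dtype=np.uint8)
    return np.round(255 * (heatmap - heatmap.min()) / span).astype(np.uint8)

def channel_mean_heatmap(features):
    """Collapse a (C, H, W) feature map to an (H, W) grayscale image."""
    return to_uint8(features.mean(axis=0))

def grad_cam_heatmap(activations, gradients):
    """Grad-CAM combination step for (C, H, W) activations and gradients."""
    weights = gradients.mean(axis=(1, 2))             # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum -> (H, W)
    return to_uint8(np.maximum(cam, 0))               # ReLU, then rescale
```

With PyTorch, something like `features[0].detach().cpu().numpy()` should give an array you can pass in. If the model returns tokens rather than a spatial map, they would need to be reshaped to (C, H, W) first, and how to do that depends on the model code. The resulting (H, W) image is usually much smaller than the input, so you would typically resize it to the input resolution and overlay it on the original picture.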