diff --git a/2024/2024_10_21_Positional_Label_for_Self-Supervised_Vision_Transformer/Positional_Label_for_Self-Supervised_Vision_Transformer.pdf b/2024/2024_10_21_Positional_Label_for_Self-Supervised_Vision_Transformer/Positional_Label_for_Self-Supervised_Vision_Transformer.pdf
new file mode 100644
index 0000000..da05809
Binary files /dev/null and b/2024/2024_10_21_Positional_Label_for_Self-Supervised_Vision_Transformer/Positional_Label_for_Self-Supervised_Vision_Transformer.pdf differ
diff --git a/2024/2024_10_21_Positional_Label_for_Self-Supervised_Vision_Transformer/README.md b/2024/2024_10_21_Positional_Label_for_Self-Supervised_Vision_Transformer/README.md
new file mode 100644
index 0000000..9840c81
--- /dev/null
+++ b/2024/2024_10_21_Positional_Label_for_Self-Supervised_Vision_Transformer/README.md
@@ -0,0 +1,9 @@
+# Positional Label for Self-Supervised Vision Transformer
+
+## Abstract
+
+Self-attention, the central building block of the ViT architecture, is permutation-invariant, so by design it does not capture the spatial arrangement of its input. Valuable information is therefore lost, which is especially harmful in computer vision tasks. A common remedy is to add positional information to the input embeddings (element-wise) or to modify the attention layers (e.g., extending the attention score with the relative distance between the query and the key). The authors of Positional Label for Self-Supervised Vision Transformer propose an alternative that does not explicitly inject any positional information. Instead, training is extended with an auxiliary task: classifying the positions of image patches. As a result, positional information is implicitly encoded in the patch representations themselves. Both absolute and relative variants are proposed, and both are plug-and-play with vanilla ViTs. The authors show that this solution improves ViT performance. Moreover, the positional labels can also serve as a self-supervised training signal, which further enhances training.
+
+## Source paper
+
+[Positional Label for Self-Supervised Vision Transformer](https://dl.acm.org/doi/10.1609/aaai.v37i3.25461)
\ No newline at end of file
diff --git a/README.md b/README.md
index bf0fad5..e9ead95 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ Join us at https://meet.drwhy.ai.
 * 07.10 - guest lecture by nadkom. dr Paweł Olber
 * 14.10 - Do Not Explain Vision Models without Context - Paulina Tomaszewska
-* 21.10 - Positional Label for Self-Supervised Vision Transformer - Filip Kołodziejczyk
+* 21.10 - [Positional Label for Self-Supervised Vision Transformer](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2024/2024_10_21_Positional_Label_for_Self-Supervised_Vision_Transformer) - Filip Kołodziejczyk
 * 28.10 - ADC: Adversarial attacks against object Detection that evade Context consistency checks - Hubert Baniecki
 * 04.11 - Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training - Bartosz Kochański
 * 14.11 - …
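To make the auxiliary-task idea from the abstract above more concrete, here is a minimal, hypothetical PyTorch sketch of the absolute variant: a vanilla ViT-style encoder without explicit positional embeddings, plus an extra head that classifies each patch token's position index. The module names, dimensions, and the `aux_weight` loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): a ViT-style encoder with an
# auxiliary head that classifies the absolute position index of each patch token.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViTWithPositionalLabels(nn.Module):
    def __init__(self, num_patches=196, dim=384, depth=6, heads=6, num_classes=1000):
        super().__init__()
        self.patch_embed = nn.Linear(16 * 16 * 3, dim)  # flattened 16x16 RGB patches
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.cls_head = nn.Linear(dim, num_classes)  # main task head (mean-pooled tokens)
        self.pos_head = nn.Linear(dim, num_patches)  # auxiliary head: which position is this patch?

    def forward(self, patches):
        # patches: (batch, num_patches, 16*16*3); no explicit positional embedding is added
        tokens = self.encoder(self.patch_embed(patches))
        cls_logits = self.cls_head(tokens.mean(dim=1))  # image-level prediction
        pos_logits = self.pos_head(tokens)              # per-patch position prediction
        return cls_logits, pos_logits


def training_step(model, patches, labels, aux_weight=0.1):
    # Positional labels are simply each patch's index in the original grid.
    batch, num_patches, _ = patches.shape
    pos_labels = torch.arange(num_patches).expand(batch, num_patches)
    cls_logits, pos_logits = model(patches)
    loss_cls = F.cross_entropy(cls_logits, labels)
    loss_pos = F.cross_entropy(pos_logits.reshape(-1, num_patches), pos_labels.reshape(-1))
    return loss_cls + aux_weight * loss_pos  # auxiliary loss weight is an assumption
```

In a self-supervised setting, the main classification loss above could be replaced by whatever pretext objective is used, while the positional-label loss remains as an extra, label-free supervision signal; the relative variant described in the paper would use relative rather than absolute position targets.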