Commit: Merge pull request #77 from FilipKolodziejczyk/master
Showing 3 changed files with 10 additions and 1 deletion.
Binary file added (+3.38 MB): ...Supervised_Vision_Transformer/Positional_Label_for_Self-Supervised_Vision_Transformer.pdf
9 changes: 9 additions & 0 deletions in 2024/2024_10_21_Positional_Label_for_Self-Supervised_Vision_Transformer/README.md
@@ -0,0 +1,9 @@
# Positional Label for Self-Supervised Vision Transformer

## Abstract
Self-attention, the central element of the ViT architecture, is permutation-invariant, so by design it does not capture the spatial arrangement of its input. Valuable information is therefore lost, which is especially harmful in computer vision tasks. A common remedy is to add positional information to the input embeddings (element-wise) or to modify the attention layers to account for it (extending the attention score with the relative distance between query and key). The authors of Positional Label for Self-Supervised Vision Transformer propose an alternative that does not explicitly add any positional information. Instead, training is extended with an auxiliary task: classifying the positions of image patches. As a result, positional information is implicitly encoded in the patch representations themselves. Both absolute and relative variants are proposed, and both are plug-and-play with vanilla ViTs. The authors show that this solution improves ViT performance. Moreover, the method can be used in self-supervised training, which further enhances the training process.
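
To make the auxiliary task concrete, here is a minimal PyTorch sketch of the absolute-position variant as described above: each patch token is asked to classify its own grid position. This is an illustrative sketch, not the authors' implementation; the class name, shapes, and loss weighting are assumptions.

```python
# Illustrative sketch of the auxiliary positional-label task (absolute variant).
# Not the authors' code: names, shapes, and the loss weight are assumptions.
import torch
import torch.nn as nn

class PatchPositionHead(nn.Module):
    """Predicts, for every patch token, which of the num_patches grid
    positions it came from."""
    def __init__(self, embed_dim: int, num_patches: int):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_patches)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, embed_dim) from a ViT encoder
        return self.classifier(patch_tokens)  # (batch, num_patches, num_patches)

# Usage with a vanilla ViT: add the auxiliary loss to the main objective.
batch, num_patches, embed_dim = 8, 196, 768  # e.g. 14x14 patches, ViT-Base width
tokens = torch.randn(batch, num_patches, embed_dim)  # stand-in for encoder output
head = PatchPositionHead(embed_dim, num_patches)
logits = head(tokens)

# The ground-truth label of patch i is simply its grid index i.
targets = torch.arange(num_patches).expand(batch, -1)
aux_loss = nn.CrossEntropyLoss()(
    logits.reshape(-1, num_patches), targets.reshape(-1)
)
# total_loss = main_loss + aux_weight * aux_loss  (aux_weight is assumed)
```

The relative variant mentioned in the abstract would instead target relative positions between patches; see the source paper for its exact formulation.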
## Source paper
[Positional Label for Self-Supervised Vision Transformer](https://dl.acm.org/doi/10.1609/aaai.v37i3.25461)