Merge pull request #77 from FilipKolodziejczyk/master
sobieskibj authored Oct 22, 2024
2 parents 1cd80c8 + ecd4a01 commit 017a322
Showing 3 changed files with 10 additions and 1 deletion.
Binary file not shown.
@@ -0,0 +1,9 @@
# Positional Label for Self-Supervised Vision Transformer

## Abstract

Self-attention, the central element of the ViT architecture, is permutation-invariant, so by design it does not capture the spatial arrangement of its input. Valuable spatial information is therefore lost, which is especially harmful in computer vision tasks. A common remedy is to add positional information to the input embeddings (element-wise) or to modify the attention layers to account for it (e.g., extending the attention score with the relative distance between query and key). The authors of Positional Label for Self-Supervised Vision Transformer propose an alternative that adds no explicit positional information. Instead, training is extended with an auxiliary task: classifying the positions of image patches. As a result, positional information is implicitly encoded in the patch representations themselves. Both absolute and relative variants are proposed, and both are plug-and-play with vanilla ViTs. The authors show that this approach improves ViT performance. Moreover, the method can be used in self-supervised training, where it further enhances the training process.
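The auxiliary task can be pictured as follows: each patch token is passed through a shared classification head whose target is the patch's own grid index, and the resulting cross-entropy is added to the main training loss. Below is a minimal NumPy sketch of the absolute variant; all names, shapes, and the linear head are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

num_patches = 196  # 14 x 14 grid for a 224x224 image with 16x16 patches
dim = 64           # embedding dimension (illustrative)

# Patch embeddings as they would come out of the ViT encoder (random here).
tokens = rng.normal(size=(num_patches, dim))

# Shared linear position head: one logit per possible grid position.
W = rng.normal(size=(dim, num_patches)) * 0.02
logits = tokens @ W  # shape: (num_patches, num_patches)

# Absolute variant: the positional label of patch i is simply its index i.
# (The relative variant would instead classify pairwise offsets.)
targets = np.arange(num_patches)

# Softmax cross-entropy over positions, averaged over patches.
logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
aux_loss = -log_probs[np.arange(num_patches), targets].mean()
```

In training, `aux_loss` would be added (possibly with a weighting coefficient) to the supervised or self-supervised objective, so the encoder is pushed to make patch position recoverable from the tokens without any explicit positional embedding.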

## Source paper

[Positional Label for Self-Supervised Vision Transformer](https://dl.acm.org/doi/10.1609/aaai.v37i3.25461)
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ Join us at https://meet.drwhy.ai.

* 07.10 - guest lecture by nadkom. dr Paweł Olber
* 14.10 - Do Not Explain Vision Models without Context - Paulina Tomaszewska
- * 21.10 - Positional Label for Self-Supervised Vision Transformer - Filip Kołodziejczyk
+ * 21.10 - [Positional Label for Self-Supervised Vision Transformer](https://github.com/MI2DataLab/MI2DataLab_Seminarium/tree/master/2024/2024_10_21_Positional_Label_for_Self-Supervised_Vision_Transformer) - Filip Kołodziejczyk
* 28.10 - ADC: Adversarial attacks against object Detection that evade Context consistency checks - Hubert Baniecki
* 04.11 - Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training - Bartosz Kochański
* 14.11 - …
