This repository is an unofficial PyTorch implementation of the Vision Transformer (ViT) proposed in ["An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"](https://arxiv.org/abs/2010.11929) (Dosovitskiy et al., 2020).

The following packages are required:
- torch
- torchvision
- tqdm
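
They can be installed with pip, for example:

```
pip install torch torchvision tqdm
```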
To start training, run:

```
python -m src.ViT.train
```
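
For orientation, ViT reshapes each image into a sequence of flattened patches before feeding it to a transformer encoder. Below is a minimal, hypothetical sketch of that patch-embedding step, assuming MNIST-sized inputs; the class name and dimensions are illustrative and not the repository's actual code:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each to an embedding.

    Hypothetical sketch; the repository's actual module may differ.
    """
    def __init__(self, img_size=28, patch_size=7, in_channels=1, embed_dim=64):
        super().__init__()
        # A conv with kernel == stride == patch_size extracts non-overlapping patches.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, embed_dim, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

# Example: an MNIST-sized input (1x28x28) yields 16 patch tokens of dimension 64.
tokens = PatchEmbedding()(torch.randn(1, 1, 28, 28))
print(tokens.shape)  # torch.Size([1, 16, 64])
```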
To counter overfitting, an L2 penalty is applied through the Adam optimizer's weight decay (set to 0.003), and dropout with a probability of 0.3 is used.
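
A minimal sketch of that regularization setup in PyTorch (the model and learning rate below are placeholders; only the weight decay and dropout values come from the description above):

```python
import torch
import torch.nn as nn

# Placeholder model; in the actual ViT, dropout is applied inside its blocks.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.GELU(),
    nn.Dropout(p=0.3),  # dropout probability of 0.3, as stated above
    nn.Linear(256, 10),
)

# Adam's weight_decay argument adds the L2 penalty; lr is an assumed value.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.003)
```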
| Dataset | Accuracy | #Epochs |
| --- | --- | --- |
| MNIST | 95% | 30 |
- Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).