This repository is an unofficial PyTorch implementation of the Vision Transformer (ViT) proposed in ["An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"](https://arxiv.org/abs/2010.11929) (Dosovitskiy et al., 2020).

The following packages are required:
- torch
- torchvision
- tqdm
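
They can be installed with pip, for example:

```
pip install torch torchvision tqdm
```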
To start training, run:

```
python -m src.ViT.train
```
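
For orientation, ViT reshapes each image into a sequence of flattened patches before feeding it to a transformer encoder. Below is a minimal, hypothetical sketch of that patch-embedding step, assuming MNIST-sized inputs; the class name and dimensions are illustrative and not the repository's actual code:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each to an embedding.

    Hypothetical sketch; the repository's actual module may differ.
    """
    def __init__(self, img_size=28, patch_size=7, in_channels=1, embed_dim=64):
        super().__init__()
        # A conv with kernel == stride == patch_size extracts non-overlapping patches.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, embed_dim, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

# Example: an MNIST-sized input (1x28x28) yields 16 patch tokens of dimension 64.
tokens = PatchEmbedding()(torch.randn(1, 1, 28, 28))
print(tokens.shape)  # torch.Size([1, 16, 64])
```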
To counter overfitting, an L2 penalty is applied through the Adam optimizer's weight decay (set to 0.003), and dropout with a probability of 0.3 is used.
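
A minimal sketch of that regularization setup in PyTorch (the model and learning rate below are placeholders; only the weight decay and dropout values come from the description above):

```python
import torch
import torch.nn as nn

# Placeholder model; in the actual ViT, dropout is applied inside its blocks.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.GELU(),
    nn.Dropout(p=0.3),  # dropout probability of 0.3, as stated above
    nn.Linear(256, 10),
)

# Adam's weight_decay argument adds the L2 penalty; lr is an assumed value.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.003)
```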
| Dataset | Accuracy | #Epochs |
| --- | --- | --- |
| MNIST | 95% | 30 |
- Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).