Our classification code is developed on top of pytorch-image-models (timm) and DeiT.
For details, see [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/abs/2102.12122).
If you use this code for a paper, please cite:
PVTv1:
```bibtex
@misc{wang2021pyramid,
  title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2102.12122},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
PVTv2:
```bibtex
@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer},
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
- PVT + ImageNet-22K pre-training (planned).
First, clone the repository locally:
```shell
git clone https://github.com/whai362/PVT.git
```
Then, install PyTorch 1.6.0+, torchvision 0.7.0+, and pytorch-image-models (timm) 0.3.2:
```shell
conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
```
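A quick way to sanity-check the environment is to print the installed versions; a minimal sketch, where only the version constraints above come from this repo:

```python
import torch
import torchvision
import timm

# The instructions above pin timm to 0.3.2 and require
# PyTorch 1.6.0+ / torchvision 0.7.0+.
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("timm:", timm.__version__)
assert timm.__version__ == "0.3.2", "this codebase expects timm 0.3.2"
```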
Download and extract the ImageNet train and val images from http://image-net.org/.
The directory structure follows the standard layout expected by torchvision's `datasets.ImageFolder`, with the training and validation data in the `train/` and `val/` folders, respectively (a loading sketch follows the layout below):
```
/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```
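Since the layout matches `datasets.ImageFolder`, the data can be loaded with plain torchvision. A minimal sketch, using the usual ImageNet normalization statistics (the repo's training scripts build their own transform pipelines):

```python
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Standard ImageNet eval preprocessing; illustrative only, the repo's
# scripts construct their own transforms.
transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

val_set = ImageFolder("/path/to/imagenet/val", transform=transform)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False, num_workers=4)
print(f"{len(val_set)} images across {len(val_set.classes)} classes")
```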
- PVTv2 on ImageNet-1K
Method | Size | Acc@1 | #Params (M) | Config | Download |
---|---|---|---|---|---|
PVT-V2-B0 | 224 | 70.5 | 3.7 | config | 14M [Google] [GitHub] |
PVT-V2-B1 | 224 | 78.7 | 14.0 | config | 54M [Google] [GitHub] |
PVT-V2-B2-Linear | 224 | 82.1 | 22.6 | config | 86M [GitHub] |
PVT-V2-B2 | 224 | 82.0 | 25.4 | config | 97M [Google] [GitHub] |
PVT-V2-B3 | 224 | 83.1 | 45.2 | config | 173M [Google] [GitHub] |
PVT-V2-B4 | 224 | 83.6 | 62.6 | config | 239M [Google] [GitHub] |
PVT-V2-B5 | 224 | 83.8 | 82.0 | config | 313M [Google] [GitHub] |
- PVTv1 on ImageNet-1K
Method | Size | Acc@1 | #Params (M) | Config | Download |
---|---|---|---|---|---|
PVT-Tiny | 224 | 75.1 | 13.2 | config | 51M [Google] [GitHub] |
PVT-Small | 224 | 79.8 | 24.5 | config | 93M [Google] [GitHub] |
PVT-Medium | 224 | 81.2 | 44.2 | config | 168M [Google] [GitHub] |
PVT-Large | 224 | 81.7 | 61.4 | config | 234M [Google] [GitHub] |
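Outside the provided scripts, these models can also be instantiated through timm's registry, since the repo's `pvt.py` and `pvt_v2.py` define the architectures with timm's `@register_model`. A hedged sketch (run from the directory containing those files; the checkpoint key layout is an assumption, adjust as needed):

```python
import torch
import timm

# Importing the model definitions registers names such as "pvt_small" and
# "pvt_v2_b2" with timm (assumes the repo's classification code is on the path).
import pvt      # noqa: F401
import pvt_v2   # noqa: F401

model = timm.create_model("pvt_v2_b2", num_classes=1000)

# Load a checkpoint downloaded from the tables above. Whether the weights sit
# under a "model" key or at the top level is an assumption.
ckpt = torch.load("/path/to/checkpoint_file", map_location="cpu")
model.load_state_dict(ckpt["model"] if "model" in ckpt else ckpt)
model.eval()
```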
To evaluate a pre-trained PVT-Small on ImageNet val with a single GPU, run:
```shell
sh dist_train.sh configs/pvt/pvt_small.py 1 --data-path /path/to/imagenet --resume /path/to/checkpoint_file --eval
```
This should give:
```
* Acc@1 79.764 Acc@5 94.950 loss 0.885
Accuracy of the network on the 50000 test images: 79.8%
```
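The same metrics can be reproduced without the distributed launcher by a plain PyTorch loop over the `model` and `val_loader` sketched above (illustrative; this is not the repo's evaluation code):

```python
import torch

@torch.no_grad()
def evaluate(model, loader, device="cuda"):
    """Compute top-1 / top-5 accuracy over a DataLoader."""
    model.to(device).eval()
    top1 = top5 = total = 0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        _, pred = model(images).topk(5, dim=1)    # (batch, 5) predicted classes
        correct = pred.eq(targets.unsqueeze(1))   # broadcast against top-5
        top1 += correct[:, 0].sum().item()
        top5 += correct.any(dim=1).sum().item()
        total += targets.size(0)
    return 100.0 * top1 / total, 100.0 * top5 / total

acc1, acc5 = evaluate(model, val_loader)
print(f"Acc@1 {acc1:.3f} Acc@5 {acc5:.3f}")
```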
To train PVT-Small on ImageNet on a single node with 8 GPUs for 300 epochs, run:
```shell
sh dist_train.sh configs/pvt/pvt_small.py 8 --data-path /path/to/imagenet
```
To calculate the FLOPs and number of parameters of a model (here PVT-V2-B2), run:
```shell
python get_flops.py pvt_v2_b2
```
This should give:
```
Input shape: (3, 224, 224)
Flops: 4.04 GFLOPs
Params: 25.36 M
```
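The parameter count (though not the FLOPs) is easy to sanity-check with nothing but PyTorch; for the `model` built earlier, this should print roughly the 25.36 M reported above:

```python
# Pure-PyTorch parameter count; should match the Params line from get_flops.py.
n_params = sum(p.numel() for p in model.parameters())
print(f"Params: {n_params / 1e6:.2f} M")
```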
This repository is released under the Apache 2.0 license as found in the LICENSE file.