Applying PVT to Object Detection

Our detection code is developed on top of MMDetection v2.13.0.

For details, see the paper Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.

If you use this code for a paper, please cite:

PVTv1

@misc{wang2021pyramid,
      title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2102.12122},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

PVTv2

@misc{wang2021pvtv2,
      title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2106.13797},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Usage

Install MMDetection v2.13.0 by following its official installation guide, or install it with pip:

pip install mmdet==2.13.0 --user

Install Apex (optional, used for mixed-precision training):

git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext --user

To disable Apex, change the runner type to EpochBasedRunner and comment out the following code block in the configuration file:

fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
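With Apex disabled, the relevant part of a config might look like the sketch below. It assumes the stock MMDetection v2.13.0 conventions (an `EpochBasedRunner` with `max_epochs`, and an `optimizer_config` that falls back to the default optimizer hook when no `type` is given). The value of 12 epochs matches the 1x schedule and is an assumption; use whatever the chosen config specifies.

```python
# Sketch of a config fragment with Apex disabled.
# Assumes stock MMDetection v2.13.0 conventions; adjust values to your config.

# Standard runner instead of the Apex-enabled one.
runner = dict(type="EpochBasedRunner", max_epochs=12)  # 12 epochs = 1x schedule (assumed)

# No "type" key, so MMDetection falls back to its default optimizer hook.
optimizer_config = dict(grad_clip=None)

# The Apex-specific block is commented out:
# fp16 = None
# optimizer_config = dict(
#     type="DistOptimizerHook",
#     update_interval=1,
#     grad_clip=None,
#     coalesce=True,
#     bucket_size_mb=-1,
#     use_fp16=True,
# )
```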

Data preparation

Prepare COCO according to the guidelines in MMDetection v2.13.0.
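As a quick sanity check before training, the following sketch reports which of the paths read by MMDetection's stock COCO configs are missing. The default root `data/coco` and the helper name `missing_coco_paths` are assumptions for illustration; adapt them if your configs point elsewhere.

```python
from pathlib import Path

# Paths that MMDetection's stock COCO detection configs read,
# relative to the dataset root.
EXPECTED = (
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
)


def missing_coco_paths(root="data/coco"):
    """Return the expected COCO paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list means the layout looks complete; otherwise the returned entries show what still needs to be downloaded or linked.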

Results and models

PVTv2 on COCO

Method	Backbone	Pretrain	Lr schd	Aug	box AP	mask AP	Config	Download
RetinaNet	PVTv2-b0	ImageNet-1K	1x	No	37.2	-	config	log & model
RetinaNet	PVTv2-b1	ImageNet-1K	1x	No	41.2	-	config	log & model
RetinaNet	PVTv2-b2-li	ImageNet-1K	1x	No	43.6	-	config	log & model
RetinaNet	PVTv2-b2	ImageNet-1K	1x	No	44.6	-	config	log & model
RetinaNet	PVTv2-b3	ImageNet-1K	1x	No	45.9	-	config	log & model
RetinaNet	PVTv2-b4	ImageNet-1K	1x	No	46.1	-	config	log & model
RetinaNet	PVTv2-b5	ImageNet-1K	1x	No	46.2	-	config	log & model
Mask R-CNN	PVTv2-b0	ImageNet-1K	1x	No	38.2	36.2	config	log & model
Mask R-CNN	PVTv2-b1	ImageNet-1K	1x	No	41.8	38.8	config	log & model
Mask R-CNN	PVTv2-b2-li	ImageNet-1K	1x	No	44.1	40.5	config	log & model
Mask R-CNN	PVTv2-b2	ImageNet-1K	1x	No	45.3	41.2	config	log & model
Mask R-CNN	PVTv2-b3	ImageNet-1K	1x	No	47.0	42.5	config	log & model
Mask R-CNN	PVTv2-b4	ImageNet-1K	1x	No	47.5	42.7	config	log & model
Mask R-CNN	PVTv2-b5	ImageNet-1K	1x	No	47.4	42.5	config	log & model

Method	Backbone	Pretrain	Lr schd	Aug	box AP	mask AP	Config	Download
Cascade Mask R-CNN	PVTv2-b2-Linear	ImageNet-1K	3x	Yes	50.9	44.0	config	log & model
Cascade Mask R-CNN	PVTv2-b2	ImageNet-1K	3x	Yes	51.1	44.4	config	log & model
ATSS	PVTv2-b2-Linear	ImageNet-1K	3x	Yes	48.9	-	config	log & model
ATSS	PVTv2-b2	ImageNet-1K	3x	Yes	49.9	-	config	log & model
GFL	PVTv2-b2-Linear	ImageNet-1K	3x	Yes	49.2	-	config	log & model
GFL	PVTv2-b2	ImageNet-1K	3x	Yes	50.2	-	config	log & model
Sparse R-CNN	PVTv2-b2-Linear	ImageNet-1K	3x	Yes	48.9	-	config	log & model
Sparse R-CNN	PVTv2-b2	ImageNet-1K	3x	Yes	50.1	-	config	log & model

PVTv1 on COCO

Method	Backbone	Pretrain	Lr schd	box AP	mask AP	Config	Download
RetinaNet	PVT-Tiny	ImageNet-1K	1x	36.7	-	config	log & model
RetinaNet (640x)	PVT-Small	ImageNet-1K	1x	38.7	-	config	log & model
RetinaNet (800x)	PVT-Small	ImageNet-1K	1x	40.4	-	config	log & model
RetinaNet	PVT-Medium	ImageNet-1K	1x	41.9	-	config	log & model
RetinaNet	PVT-Large	ImageNet-1K	1x	42.6	-	config	log & model
Mask R-CNN	PVT-Tiny	ImageNet-1K	1x	36.7	35.1	config	log & model
Mask R-CNN	PVT-Small	ImageNet-1K	1x	40.4	37.8	config	log & model
Mask R-CNN	PVT-Medium	ImageNet-1K	1x	42.0	39.0	config	log & model
Mask R-CNN	PVT-Large	ImageNet-1K	1x	42.9	39.5	config	log & model
DETR	PVT-Small	ImageNet-1K	50ep	34.7	-	config	log & model

Evaluation

To evaluate PVT-Small + RetinaNet (640x) on COCO val2017 on a single node with 8 GPUs, run:

dist_test.sh configs/retinanet_pvt_s_fpn_1x_coco_640.py /path/to/checkpoint_file 8 --out results.pkl --eval bbox

This should give:

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.387
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.593
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.408
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.212
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.416
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.544
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.545
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.545
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.545
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.329
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.583
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.721
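The file written by `--out results.pkl` can also be inspected offline. The sketch below assumes the usual MMDetection 2.x layout for box-only detectors: a list over images, each a list over classes, each an array of rows `[x1, y1, x2, y2, score]`. The helper names are hypothetical, and unpickling the real file requires NumPy to be installed.

```python
import pickle


def load_results(path="results.pkl"):
    """Load the detections written by `--out` (needs NumPy for unpickling)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def count_confident_detections(results, score_thr=0.5):
    """Count boxes whose score (fifth column) is at least `score_thr`.

    `results` is assumed to be nested as images -> classes -> rows of
    [x1, y1, x2, y2, score].
    """
    total = 0
    for per_image in results:
        for per_class in per_image:
            for row in per_class:
                if row[4] >= score_thr:
                    total += 1
    return total
```

For example, `count_confident_detections(load_results("results.pkl"), 0.5)` gives a rough sense of how many detections survive a given score threshold.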

Training

To train PVT-Small + RetinaNet (640x) on COCO train2017 on a single node with 8 GPUs for 12 epochs, run:

dist_train.sh configs/retinanet_pvt_s_fpn_1x_coco_640.py 8

Demo

python demo.py demo.jpg /path/to/config_file /path/to/checkpoint_file
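The same inference can be driven from Python through MMDetection's high-level API. This is a minimal sketch, not the contents of `demo.py`: it assumes the PVT backbone modules have already been imported so that they are registered with MMDetection, and the function name `run_demo`, the default device, and the output path are illustrative.

```python
def run_demo(img, config_file, checkpoint_file, device="cuda:0",
             score_thr=0.3, out_file="result.jpg"):
    """Detect objects in one image and save a visualization to `out_file`."""
    # Imported lazily so this module can be loaded without MMDetection.
    from mmdet.apis import inference_detector, init_detector

    # Assumption: the repository's PVT backbone modules must be imported
    # before this call so the backbone names in the config can be resolved.
    model = init_detector(config_file, checkpoint_file, device=device)

    # Run the detector on a single image path (or a loaded image array).
    result = inference_detector(model, img)

    # Draw boxes above the score threshold and write the image to disk.
    model.show_result(img, result, score_thr=score_thr, out_file=out_file)
    return result
```

A call such as `run_demo("demo.jpg", "/path/to/config_file", "/path/to/checkpoint_file")` mirrors the argument order of the command above.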

Calculating FLOPs & Params

python get_flops.py configs/gfl_pvt_v2_b2_fpn_3x_mstrain_fp16.py

This should give:

Input shape: (3, 1280, 800)
Flops: 260.65 GFLOPs
Params: 33.11 M

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.
