Applying ViT-CoMer to Object Detection

Our detection code is developed on top of MMDetection v2.22.0.

Usage

Install MMDetection v2.22.0.

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.2 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install timm==0.4.12
pip install mmdet==2.22.0
pip install instaboostfast # for htc++
cd ops && sh make.sh # compile the deformable attention ops
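Version mismatches between PyTorch, mmcv-full, and mmdet are a frequent source of import errors and failed CUDA op builds. The snippet below is a minimal sketch, not part of this repository, that compares the versions reported by `python -m pip show` with the pins above. The function names `pin_ok`, `installed_version`, and `check_pins` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repo): check the version pins above.

# pin_ok WANT HAVE -> exit status 0 only if HAVE is non-empty and equals WANT.
pin_ok() {
  [ -n "$2" ] && [ "$1" = "$2" ]
}

# installed_version PKG -> prints the version pip reports, or nothing if absent.
installed_version() {
  python -m pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}'
}

# check_pins -> prints one line per package; returns 1 if any pin is not met.
check_pins() {
  local status=0 entry pkg want have
  for entry in torch=1.9.0+cu111 torchvision=0.10.0+cu111 torchaudio=0.9.0 \
               mmcv-full=1.4.2 timm=0.4.12 mmdet=2.22.0; do
    pkg=${entry%%=*}   # text before the first '='
    want=${entry#*=}   # text after the first '='
    have=$(installed_version "$pkg")
    if pin_ok "$want" "$have"; then
      echo "ok       $pkg $have"
    else
      echo "MISMATCH $pkg: want $want, have ${have:-<not installed>}"
      status=1
    fi
  done
  return $status
}
```

Source the snippet and run `check_pins`; a non-zero exit status means at least one package differs from its pin.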

Pretraining Sources

Name           Type             Year  Data          Repo  Paper
DeiT           Supervised       2021  ImageNet-1K   repo  paper
AugReg         Supervised       2021  ImageNet-22K  repo  paper
BEiT           MIM              2021  ImageNet-22K  repo  paper
MAE            MIM              2021  ImageNet-1K   repo  paper
Uni-Perceiver  Supervised       2022  Multi-Modal   repo  paper
BEiTv2         MIM              2022  ImageNet-22K  repo  paper
DINOv2         Self-Supervised  2023  LVD-142M      repo  paper

Main Results and Models

Mask R-CNN + DINOv2

Method      Backbone     Pretrain  Lr schd  box AP  mask AP  Config  Ckpt  Log
Mask R-CNN  ViT-S        DeiT-S             44.0    39.9     config  -     -
Mask R-CNN  ViT-CoMer-S  DINOv2-S           48.6    42.9     config  ckpt  log
Mask R-CNN  ViT-CoMer-S  DINOv2-S           52.1    45.8     config  ckpt  log
Mask R-CNN  ViT-B        DeiT-B             45.8    41.3     config  -     -
Mask R-CNN  ViT-CoMer-B  DINOv2-B           52.0    45.5     config  ckpt  log
Mask R-CNN  ViT-CoMer-B  DINOv2-B           54.2    47.6     config  ckpt  log
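The Lr schd column presumably follows MMDetection's naming, in which `1x` means 12 epochs and `Nx` means 12N epochs; under that reading, the 36-epoch run described under Training is a `3x` schedule. The Co-DETR table below states epochs directly (`16e`, `36e`). The hypothetical helper `to_epochs` converts both notations:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a schedule label to an epoch count.
#   Nx -> N * 12 epochs (MMDetection convention, 1x = 12 epochs)
#   Ne -> N epochs      (explicit epoch count, as in the Co-DETR table)
to_epochs() {
  local n=${1%?}   # strip the trailing unit letter
  if ! [[ $n =~ ^[0-9]+$ ]]; then
    echo "unknown schedule: $1" >&2
    return 1
  fi
  case "$1" in
    *x) echo $(( 10#$n * 12 )) ;;   # 10# forces base 10 (e.g. for "03x")
    *e) echo $(( 10#$n )) ;;
    *)  echo "unknown schedule: $1" >&2; return 1 ;;
  esac
}
```

For example, `to_epochs 3x` prints `36` and `to_epochs 16e` prints `16`.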

ViT-CoMer + Co-DETR

We combined ViT-CoMer with the state-of-the-art detector Co-DETR and reached 64.3 box AP. To support research built on this combination, we will gradually release our training configurations and model weights. For implementation details, please refer to ViT-CoMer+Co-DETR.

Method   Backbone     Pretrain      Epoch  box AP  mask AP  Config  Ckpt   Log
Co-DETR  ViT-CoMer-L  Beit2*        16e    64.3    -        config  -      -
Co-DETR  ViT-CoMer-L  Beit2         16e    62.1    -        config  -      -
Co-DINO  Swin-L       ImageNet-22K  36e    60.0    -        config  model  -

Evaluation

To evaluate Mask R-CNN + ViT-CoMer-B on COCO val2017 on a single node with 8 GPUs, run:

bash test.sh
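The contents of `test.sh` are not reproduced here. Assuming it wraps MMDetection v2's distributed test launcher (`dist_test.sh CONFIG CHECKPOINT GPUS [args]`, found under `tools/` in MMDetection itself), an equivalent call for the Mask R-CNN models above passes `--eval bbox segm` to report box AP and mask AP. The helper `run_eval`, the `DIST_TEST` default path, and the `DRY_RUN` switch are hypothetical; substitute a config and checkpoint from the table.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of what test.sh may run; the DIST_TEST default path is an assumption.
DIST_TEST=${DIST_TEST:-tools/dist_test.sh}

# run_eval CONFIG CHECKPOINT [GPUS]
# Prints the command by default (DRY_RUN=1); set DRY_RUN=0 to execute it.
run_eval() {
  local config=$1 checkpoint=$2 gpus=${3:-8}
  if [ -z "$config" ] || [ -z "$checkpoint" ]; then
    echo "usage: run_eval CONFIG CHECKPOINT [GPUS]" >&2
    return 1
  fi
  # bbox -> box AP, segm -> mask AP (the two metrics in the Mask R-CNN table)
  local cmd=(bash "$DIST_TEST" "$config" "$checkpoint" "$gpus" --eval bbox segm)
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `run_eval path/to/config.py path/to/checkpoint.pth` prints the command, and `DRY_RUN=0 run_eval path/to/config.py path/to/checkpoint.pth` runs it on 8 GPUs.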

Training

To train Mask R-CNN + ViT-CoMer-B on COCO train2017 on a single node with 8 GPUs for 36 epochs, run:

bash train.sh
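The contents of `train.sh` are not reproduced here either. The sketch below assumes it wraps MMDetection v2's `dist_train.sh CONFIG GPUS [args]`. MMDetection's documentation recommends scaling the learning rate linearly with the total batch size when you change the number of GPUs or images per GPU; `scaled_lr` applies that rule. The base batch size of 16 (8 GPUs × 2 images) is an assumption about these configs, and the names `scaled_lr`, `run_train`, `DIST_TRAIN`, and `DRY_RUN` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of what train.sh may run; the DIST_TRAIN default path is an assumption.
DIST_TRAIN=${DIST_TRAIN:-tools/dist_train.sh}

# scaled_lr BASE_LR BASE_BATCH GPUS IMGS_PER_GPU
# Linear scaling rule: new_lr = BASE_LR * (GPUS * IMGS_PER_GPU) / BASE_BATCH.
scaled_lr() {
  awk -v lr="$1" -v base="$2" -v gpus="$3" -v ipg="$4" \
    'BEGIN { printf "%g\n", lr * gpus * ipg / base }'
}

# run_train CONFIG WORK_DIR [GPUS] [LR]
# Prints the command by default (DRY_RUN=1); set DRY_RUN=0 to execute it.
run_train() {
  local config=$1 work_dir=$2 gpus=${3:-8} lr=$4
  if [ -z "$config" ] || [ -z "$work_dir" ]; then
    echo "usage: run_train CONFIG WORK_DIR [GPUS] [LR]" >&2
    return 1
  fi
  local cmd=(bash "$DIST_TRAIN" "$config" "$gpus" --work-dir "$work_dir")
  # Override optimizer.lr only when a value is given (assumes the config defines it there).
  if [ -n "$lr" ]; then
    cmd+=(--cfg-options "optimizer.lr=$lr")
  fi
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, to train on 4 GPUs with 2 images each from a config whose base learning rate is `$BASE_LR`: `DRY_RUN=0 run_train path/to/config.py work_dirs/example 4 "$(scaled_lr "$BASE_LR" 16 4 2)"`.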