Our detection code is developed on top of MMDetection v2.22.0.
Install MMDetection v2.22.0.
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.2 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install timm==0.4.12
pip install mmdet==2.22.0
pip install instaboostfast # for htc++
cd ops && sh make.sh # compile deformable attention
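After installation, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the repository: the helper name `check_pins` is hypothetical, and it assumes the installed distributions report the same version strings as the pins above (including the `+cu111` local suffix for the PyTorch wheels).

```python
from importlib import metadata

# Versions pinned by the install commands above.
PINNED = {
    "torch": "1.9.0+cu111",
    "torchvision": "0.10.0+cu111",
    "mmcv-full": "1.4.2",
    "timm": "0.4.12",
    "mmdet": "2.22.0",
}


def check_pins(pins=PINNED):
    """Return {package: (expected, installed_or_None, matches)}.

    Hypothetical helper: it only reads package metadata and does not
    import torch or mmdet, so it also runs in an incomplete environment.
    """
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, installed {installed} [{status}]")
```

A mismatch here does not necessarily mean the code will fail, but MMDetection and mmcv-full are sensitive to version pairing, so it is a reasonable first thing to check.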
Name | Type | Year | Data | Repo | Paper |
---|---|---|---|---|---|
DeiT | Supervised | 2021 | ImageNet-1K | repo | paper |
AugReg | Supervised | 2021 | ImageNet-22K | repo | paper |
BEiT | MIM | 2021 | ImageNet-22K | repo | paper |
MAE | MIM | 2021 | ImageNet-1K | repo | paper |
Uni-Perceiver | Supervised | 2022 | Multi-Modal | repo | paper |
BEiTv2 | MIM | 2022 | ImageNet-22K | repo | paper |
DINOv2 | Self-Supervised | 2023 | LVD-142M | repo | paper |
Mask R-CNN + DINOv2
Method | Backbone | Pretrain | Lr schd | box AP | mask AP | Config | Ckpt | Log |
---|---|---|---|---|---|---|---|---|
Mask R-CNN | ViT-S | DeiT-S | 3× | 44.0 | 39.9 | config | - | - |
Mask R-CNN | ViT-CoMer-S | DINOv2-S | 1× | 48.6 | 42.9 | config | ckpt | log |
Mask R-CNN | ViT-CoMer-S | DINOv2-S | 3× | 52.1 | 45.8 | config | ckpt | log |
Mask R-CNN | ViT-B | DeiT-B | 3× | 45.8 | 41.3 | config | - | - |
Mask R-CNN | ViT-CoMer-B | DINOv2-B | 1× | 52.0 | 45.5 | config | ckpt | log |
Mask R-CNN | ViT-CoMer-B | DINOv2-B | 3× | 54.2 | 47.6 | config | ckpt | log |
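To make the comparison in the table concrete, the short sketch below computes the AP gaps between the plain ViT rows and the ViT-CoMer rows at the 3× schedule. The numbers are copied from the table; the `gain` helper is hypothetical. Note that the rows also differ in pretraining (DeiT vs. DINOv2), so the gap reflects both the backbone and the pretrained weights.

```python
# (backbone, schedule) -> (box AP, mask AP), copied from the table above.
RESULTS = {
    ("ViT-S", "3x"): (44.0, 39.9),
    ("ViT-CoMer-S", "3x"): (52.1, 45.8),
    ("ViT-B", "3x"): (45.8, 41.3),
    ("ViT-CoMer-B", "3x"): (54.2, 47.6),
}


def gain(baseline, model, schedule="3x"):
    """Return (box AP gain, mask AP gain) of `model` over `baseline`."""
    base_box, base_mask = RESULTS[(baseline, schedule)]
    model_box, model_mask = RESULTS[(model, schedule)]
    # Round to one decimal to match the table's precision and to hide
    # floating-point noise from the subtraction.
    return round(model_box - base_box, 1), round(model_mask - base_mask, 1)


print(gain("ViT-S", "ViT-CoMer-S"))  # (8.1, 5.9)
print(gain("ViT-B", "ViT-CoMer-B"))  # (8.4, 6.3)
```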
ViT-CoMer + Co-DETR
We combined ViT-CoMer with the state-of-the-art detection algorithm Co-DETR and achieved 64.3 AP. To support further research on this baseline, we will gradually release our training configurations and model weights. For implementation details, see ViT-CoMer+Co-DETR.
Method | Backbone | Pretrain | Epoch | box AP | mask AP | Config | Ckpt | Log |
---|---|---|---|---|---|---|---|---|
Co-DETR | ViT-CoMer-L | BEiTv2* | 16e | 64.3 | - | config | - | - |
Co-DETR | ViT-CoMer-L | BEiTv2 | 16e | 62.1 | - | config | - | - |
Co-DINO | Swin-L | ImageNet-22K | 36e | 60.0 | - | config | model | - |
To evaluate Mask R-CNN + ViT-CoMer-B on COCO val2017 on a single node with 8 GPUs, run:
bash test.sh
To train Mask R-CNN + ViT-CoMer-B on COCO train2017 on a single node with 8 GPUs for 36 epochs (the 3× schedule), run:
bash train.sh
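The contents of `test.sh` and `train.sh` are not shown here. For orientation, stock MMDetection 2.x launches single-node multi-GPU jobs through `tools/dist_test.sh` and `tools/dist_train.sh`; the sketch below only assembles and prints such command lines, assuming the scripts wrap those launchers. The config and checkpoint paths are hypothetical placeholders, not paths from this repository.

```python
import shlex


def dist_test_cmd(config, checkpoint, gpus=8):
    """Command for MMDetection 2.x distributed evaluation (box and mask AP)."""
    return [
        "bash", "tools/dist_test.sh",
        config, checkpoint, str(gpus),
        "--eval", "bbox", "segm",
    ]


def dist_train_cmd(config, gpus=8):
    """Command for MMDetection 2.x distributed training."""
    return ["bash", "tools/dist_train.sh", config, str(gpus)]


# Hypothetical placeholder paths; substitute the real config and checkpoint.
CONFIG = "path/to/mask_rcnn_vit_comer_base_config.py"
CHECKPOINT = "path/to/checkpoint.pth"

# Print the commands instead of running them, so the sketch is safe to execute.
print(shlex.join(dist_test_cmd(CONFIG, CHECKPOINT)))
print(shlex.join(dist_train_cmd(CONFIG)))
```

In MMDetection, the number of training epochs is normally set in the config file rather than on the command line, so a 36-epoch run would be selected by choosing the 3× config.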