Applying ViT-CoMer to Object Detection

Our detection code is developed on top of MMDetection v2.22.0.

Usage

Install MMDetection v2.22.0.

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.2 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install timm==0.4.12
pip install mmdet==2.22.0
pip install instaboostfast # for htc++
cd ops && sh make.sh # compile deformable attention
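
Because the packages above are pinned to exact versions, a quick check after installation can catch a mismatched environment before training starts. The following is a minimal sketch, not part of this repository: the helper names `strip_local`, `pin_ok`, `installed_version`, and `check_pins` are hypothetical, and it assumes `pip show` reports the version on a `Version:` line.

```shell
# Hypothetical helpers (not shipped with this repo) for checking the pinned versions.

# Drop a local build tag such as "+cu111" so "1.9.0+cu111" compares equal to "1.9.0".
strip_local() { printf '%s\n' "${1%%+*}"; }

# Succeed when two version strings match once build tags are ignored.
pin_ok() { [ "$(strip_local "$1")" = "$(strip_local "$2")" ]; }

# Print the installed version of a pip package, or nothing if it is not installed.
installed_version() { pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}'; }

# Compare each pin from the install commands above with what pip reports.
check_pins() {
  for spec in torch=1.9.0 torchvision=0.10.0 torchaudio=0.9.0 \
              mmcv-full=1.4.2 timm=0.4.12 mmdet=2.22.0; do
    name=${spec%%=*}
    want=${spec#*=}
    have=$(installed_version "$name")
    if pin_ok "$want" "${have:-missing}"; then
      echo "ok       $name $have"
    else
      echo "MISMATCH $name: want $want, have ${have:-missing}"
    fi
  done
}
```

Running `check_pins` in the environment you installed into prints one line per package; any `MISMATCH` line points at a package to reinstall.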

Pretraining Sources

Name	Type	Year	Data	Repo	Paper
DeiT	Supervised	2021	ImageNet-1K	repo	paper
AugReg	Supervised	2021	ImageNet-22K	repo	paper
BEiT	MIM	2021	ImageNet-22K	repo	paper
MAE	MIM	2021	ImageNet-1K	repo	paper
Uni-Perceiver	Supervised	2022	Multi-Modal	repo	paper
BEiTv2	MIM	2022	ImageNet-22K	repo	paper
DINOv2	Self-Supervised	2023	LVD-142M	repo	paper

Main Results and Models

Mask R-CNN + DINOv2

Method	Backbone	Pretrain	Lr schd	box AP	mask AP	Config	Ckpt	Log
Mask R-CNN	ViT-S	DeiT-S	3×	44.0	39.9	config	-	-
Mask R-CNN	ViT-CoMer-S	DINOv2-S	1×	48.6	42.9	config	ckpt	log
Mask R-CNN	ViT-CoMer-S	DINOv2-S	3×	52.1	45.8	config	ckpt	log
Mask R-CNN	ViT-B	DeiT-B	3×	45.8	41.3	config	-	-
Mask R-CNN	ViT-CoMer-B	DINOv2-B	1×	52.0	45.5	config	ckpt	log
Mask R-CNN	ViT-CoMer-B	DINOv2-B	3×	54.2	47.6	config	ckpt	log

ViT-CoMer + Co-DETR

We combined ViT-CoMer with the state-of-the-art detection algorithm Co-DETR and achieved 64.3 box AP. To help others build on this work, we will gradually release our training configurations and model weights. For implementation details, please refer to ViT-CoMer+Co-DETR.

Method	Backbone	Pretrain	Epoch	box AP	mask AP	Config	Ckpt	Log
Co-DETR	ViT-CoMer-L	BEiTv2*	16e	64.3	-	config	-	-
Co-DETR	ViT-CoMer-L	BEiTv2	16e	62.1	-	config	-	-
Co-DINO	Swin-L	ImageNet-22K	36e	60.0	-	config	model	-

Evaluation

To evaluate Mask R-CNN + ViT-CoMer-B on COCO val2017 on a single node with 8 GPUs, run:

bash test.sh
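
To evaluate a different config or checkpoint, it may be easier to call the distributed launcher directly. The sketch below only prints a candidate command so it can be inspected first. It assumes that `dist_test.sh` in this directory takes `CONFIG CHECKPOINT GPUS` followed by `test.py` flags, as the MMDetection v2.x launcher does, and that `--eval bbox segm` is the appropriate metric selection for Mask R-CNN; check both against the actual scripts. The helper name `build_eval_cmd` and both paths are placeholders.

```shell
# Hypothetical helper: print (do not run) a distributed evaluation command.
# Assumed argument order: CONFIG CHECKPOINT GPUS, then flags passed on to test.py.
build_eval_cmd() {
  config=$1
  checkpoint=$2
  gpus=${3:-8}   # default to the 8-GPU single-node setup described above
  printf 'bash dist_test.sh %s %s %s --eval bbox segm\n' "$config" "$checkpoint" "$gpus"
}

# Placeholder paths: substitute a real file from configs/ and a downloaded checkpoint.
build_eval_cmd configs/YOUR_CONFIG.py /path/to/checkpoint.pth 8
```

Printing the command first makes it easy to confirm the paths and GPU count before starting a multi-GPU job.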

Training

To train Mask R-CNN + ViT-CoMer-B on COCO train2017 on a single node with 8 GPUs for 36 epochs, run:

bash train.sh
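
The 36 epochs here correspond to the 3× entries in the "Lr schd" column of the results tables, under the usual MMDetection convention that a 1× schedule is 12 epochs. A small sketch of that conversion, using a hypothetical helper name `schedule_epochs`:

```shell
# Hypothetical helper: convert a schedule multiplier ("1x", "3×", or "3") to epochs,
# assuming the MMDetection convention of 12 epochs per 1x schedule.
schedule_epochs() {
  n=$(printf '%s' "$1" | tr -cd '0-9')   # keep only the digits of the multiplier
  [ -n "$n" ] || return 1                # reject input without a number
  echo $(( n * 12 ))
}

schedule_epochs 1x   # 1x schedule
schedule_epochs 3x   # 3x schedule, the setting used by train.sh above
```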
