Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,5 +3,3 @@ local* | |
outputs* | ||
__pycache__ | ||
.pyc | ||
|
||
object_detection/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
# COCO Object detection | ||
|
||
## How to use | ||
|
||
The environment for object detection is included in [../environment.yaml](../environment.yaml). Typically, you do not need to take care of it if you create | ||
the environment as specified in [../INSTALL.md](../INSTALL.md). If you run into problems with mmcv or mmdetection, you may uninstall the package and reinstall it manually, e.g. | ||
|
||
```bash | ||
pip uninstall mmcv | ||
pip install --no-cache-dir mmcv==1.7.0 | ||
``` | ||
|
||
* STEP 0: prepare data | ||
|
||
```bash | ||
$ mkdir data && ln -s /your/path/to/coco data/coco # prepare data | ||
``` | ||
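Before launching training, it can help to confirm that the symlinked directory has the layout the dataset configs in this commit expect under `data/coco/` (`annotations/instances_train2017.json`, `annotations/instances_val2017.json`, `train2017/`, `val2017/`). The following is a minimal sketch only; the helper name `missing_coco_paths` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Relative paths referenced by the dataset configs (data_root = 'data/coco/').
EXPECTED = (
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
)


def missing_coco_paths(data_root="data/coco"):
    """Return the expected entries that do not exist under data_root.

    Hypothetical helper for illustration; not part of the repository.
    """
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_paths()
    print("missing:", missing if missing else "none")
```

An empty list means all four entries were found; anything else names the paths to fix before running `slurm_train.sh`.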
|
||
* STEP 1: run experiments | ||
|
||
```bash | ||
$ vim slurm_train.sh # change config file, slurm partition, etc. | ||
$ bash slurm_train.sh | ||
``` | ||
|
||
See [`slurm_train.sh`](./slurm_train.sh) for details. | ||
|
||
|
||
## Results | ||
|
||
| name | Pretrained Model | Method | Lr Schd | mAP_box | mAP_mask | log | mAP_box<sup>*</sup> | mAP_mask<sup>*</sup> | tensorboard log<sup>*</sup> | config | | ||
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:| | ||
| BiFormer-S | IN1k | MaskRCNN | 1x | 47.8 | 43.2 | [log](https://1drv.ms/u/s!AkBbczdRlZvChgmyNozEQrrsfOdG?e=aOCj2A) | 48.1 | 43.6 | [tensorboard.dev](https://tensorboard.dev/experiment/EvZZMPPRTA29oL5m5olPNw/#scalars&tagFilter=mAP&_smoothingWeight=0) | [config](./configs/coco/maskrcnn.1x.biformer_small.py) | | ||
| BiFormer-B | IN1k | MaskRCNN | 1x | 48.6 | 43.7 | [log](https://1drv.ms/u/s!AkBbczdRlZvChhF-itieos4fg28D?e=Gor6oV) | - | - | - | [config](./configs/coco/maskrcnn.1x.biformer_base.py) | | ||
| BiFormer-S | IN1k | RetinaNet | 1x | 45.9 | - | [log](https://1drv.ms/u/s!AkBbczdRlZvChhKipB3XMN4_nIvO?e=TYZzFc) | 47.3 | - | [tensorboard.dev](https://tensorboard.dev/experiment/0wwQtBNFRp2VBwQeFpZy0Q/#scalars&tagFilter=mAP&_smoothingWeight=0) | [config](./configs/coco/retinanet.1x.biformer_small.py) | | ||
| BiFormer-B | IN1k | RetinaNet | 1x | 47.1 | - | [log](https://1drv.ms/u/s!AkBbczdRlZvChg-8GDypSY9leBsm?e=FyJQm1) | - | - | - | [config](./configs/coco/retinanet.1x.biformer_base.py) | | ||
|
||
<font size=1>* : reproduced right before code release.</font> | ||
|
||
**NOTE**: This repository produces significantly better performance than the paper reports, **possibly** for the following reasons: | ||
|
||
1. We fixed a ["bug"](./models_mm/biformer_mm.py) of extra normalization layers. | ||
2. We used a different version of mmcv and mmdetection. | ||
3. We used native AMP provided by torch instead of [Nvidia apex](https://github.com/NVIDIA/apex). | ||
|
||
We do not know which of these factors actually account for the improvement. | ||
|
||
## Acknowledgment | ||
|
||
This code is built using the [mmdetection](https://github.com/open-mmlab/mmdetection) and [timm](https://github.com/rwightman/pytorch-image-models) libraries and the [UniFormer](https://github.com/Sense-X/UniFormer) repository. |
48 changes: 48 additions & 0 deletions
object_detection/configs/_base_/datasets/coco_detection.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
dataset_type = 'CocoDataset' | ||
data_root = 'data/coco/' | ||
img_norm_cfg = dict( | ||
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) | ||
train_pipeline = [ | ||
dict(type='LoadImageFromFile'), | ||
dict(type='LoadAnnotations', with_bbox=True), | ||
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), | ||
dict(type='RandomFlip', flip_ratio=0.5), | ||
dict(type='Normalize', **img_norm_cfg), | ||
dict(type='Pad', size_divisor=32), | ||
dict(type='DefaultFormatBundle'), | ||
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), | ||
] | ||
test_pipeline = [ | ||
dict(type='LoadImageFromFile'), | ||
dict( | ||
type='MultiScaleFlipAug', | ||
img_scale=(1333, 800), | ||
flip=False, | ||
transforms=[ | ||
dict(type='Resize', keep_ratio=True), | ||
dict(type='RandomFlip'), | ||
dict(type='Normalize', **img_norm_cfg), | ||
dict(type='Pad', size_divisor=32), | ||
dict(type='ImageToTensor', keys=['img']), | ||
dict(type='Collect', keys=['img']), | ||
]) | ||
] | ||
data = dict( | ||
samples_per_gpu=2, | ||
workers_per_gpu=2, | ||
train=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_train2017.json', | ||
img_prefix=data_root + 'train2017/', | ||
pipeline=train_pipeline), | ||
val=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_val2017.json', | ||
img_prefix=data_root + 'val2017/', | ||
pipeline=test_pipeline), | ||
test=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_val2017.json', | ||
img_prefix=data_root + 'val2017/', | ||
pipeline=test_pipeline)) | ||
evaluation = dict(interval=1, metric='bbox') |
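The `Resize` step with `img_scale=(1333, 800)` and `keep_ratio=True`, followed by `Pad` with `size_divisor=32`, determines the tensor shape the detector actually sees. The sketch below reproduces that arithmetic in plain Python, assuming the behaviour of mmcv 1.x rescaling as I understand it (scale so the long edge is at most 1333 and the short edge at most 800, rounding to the nearest integer). The function names are hypothetical and not part of mmcv or mmdetection.

```python
def rescale_keep_ratio(width, height, img_scale=(1333, 800)):
    """Scale (width, height) so it fits inside img_scale, keeping the aspect ratio.

    Assumption: the long edge is bounded by max(img_scale) and the short
    edge by min(img_scale), with sizes rounded to the nearest integer.
    """
    long_edge, short_edge = max(img_scale), min(img_scale)
    factor = min(long_edge / max(width, height),
                 short_edge / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


def pad_to_divisor(width, height, divisor=32):
    """Round each side up to the next multiple of divisor."""
    def round_up(x):
        return -(-x // divisor) * divisor  # ceiling division
    return round_up(width), round_up(height)


if __name__ == "__main__":
    resized = rescale_keep_ratio(640, 480)
    print(resized, pad_to_divisor(*resized))
```

For a 640x480 input, the short edge is the binding constraint, so the image is resized to 1067x800 and then padded to 1088x800.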
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
dataset_type = 'CocoDataset' | ||
data_root = 'data/coco/' | ||
img_norm_cfg = dict( | ||
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) | ||
train_pipeline = [ | ||
dict(type='LoadImageFromFile'), | ||
dict(type='LoadAnnotations', with_bbox=True, with_mask=True), | ||
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), | ||
dict(type='RandomFlip', flip_ratio=0.5), | ||
dict(type='Normalize', **img_norm_cfg), | ||
dict(type='Pad', size_divisor=32), | ||
dict(type='DefaultFormatBundle'), | ||
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), | ||
] | ||
test_pipeline = [ | ||
dict(type='LoadImageFromFile'), | ||
dict( | ||
type='MultiScaleFlipAug', | ||
img_scale=(1333, 800), | ||
flip=False, | ||
transforms=[ | ||
dict(type='Resize', keep_ratio=True), | ||
dict(type='RandomFlip'), | ||
dict(type='Normalize', **img_norm_cfg), | ||
dict(type='Pad', size_divisor=32), | ||
dict(type='ImageToTensor', keys=['img']), | ||
dict(type='Collect', keys=['img']), | ||
]) | ||
] | ||
data = dict( | ||
samples_per_gpu=2, | ||
workers_per_gpu=2, | ||
train=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_train2017.json', | ||
img_prefix=data_root + 'train2017/', | ||
pipeline=train_pipeline), | ||
val=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_val2017.json', | ||
img_prefix=data_root + 'val2017/', | ||
pipeline=test_pipeline), | ||
test=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_val2017.json', | ||
img_prefix=data_root + 'val2017/', | ||
pipeline=test_pipeline)) | ||
evaluation = dict(metric=['bbox', 'segm']) |
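The `img_norm_cfg` values used in the dataset configs are the familiar ImageNet channel statistics expressed on a 0-255 scale, and `to_rgb=True` asks for a BGR-to-RGB conversion before normalization. The sketch below checks that relationship and shows the per-pixel normalization; the constant and function names are hypothetical and chosen only for illustration.

```python
# ImageNet statistics on a 0-1 scale (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

# Values copied from img_norm_cfg in the dataset configs.
CFG_MEAN = (123.675, 116.28, 103.53)
CFG_STD = (58.395, 57.12, 57.375)


def to_255(values):
    """Rescale 0-1 statistics to the 0-255 pixel range."""
    return [round(v * 255, 3) for v in values]


def normalize_pixel(rgb, mean=CFG_MEAN, std=CFG_STD):
    """Apply (value - mean) / std per channel to one RGB pixel."""
    return [(c - m) / s for c, m, s in zip(rgb, mean, std)]


if __name__ == "__main__":
    print(to_255(IMAGENET_MEAN), to_255(IMAGENET_STD))
    print(normalize_pixel((255, 255, 255)))
```

A pixel equal to the channel means normalizes to zero in every channel, which is a quick sanity check when swapping in statistics from a different pretraining setup.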
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
checkpoint_config = dict(interval=1) | ||
# yapf:disable | ||
log_config = dict( | ||
interval=50, | ||
hooks=[ | ||
dict(type='TextLoggerHook'), | ||
dict(type='TensorboardLoggerHook') | ||
]) | ||
# yapf:enable | ||
custom_hooks = [dict(type='NumClassCheckHook')] | ||
|
||
dist_params = dict(backend='nccl') | ||
log_level = 'INFO' | ||
load_from = None | ||
resume_from = None | ||
workflow = [('train', 1)] |
120 changes: 120 additions & 0 deletions
object_detection/configs/_base_/models/mask_rcnn_r50_fpn.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
# model settings | ||
model = dict( | ||
type='MaskRCNN', | ||
pretrained='torchvision://resnet50', | ||
backbone=dict( | ||
type='ResNet', | ||
depth=50, | ||
num_stages=4, | ||
out_indices=(0, 1, 2, 3), | ||
frozen_stages=1, | ||
norm_cfg=dict(type='BN', requires_grad=True), | ||
norm_eval=True, | ||
style='pytorch'), | ||
neck=dict( | ||
type='FPN', | ||
in_channels=[256, 512, 1024, 2048], | ||
out_channels=256, | ||
num_outs=5), | ||
rpn_head=dict( | ||
type='RPNHead', | ||
in_channels=256, | ||
feat_channels=256, | ||
anchor_generator=dict( | ||
type='AnchorGenerator', | ||
scales=[8], | ||
ratios=[0.5, 1.0, 2.0], | ||
strides=[4, 8, 16, 32, 64]), | ||
bbox_coder=dict( | ||
type='DeltaXYWHBBoxCoder', | ||
target_means=[.0, .0, .0, .0], | ||
target_stds=[1.0, 1.0, 1.0, 1.0]), | ||
loss_cls=dict( | ||
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), | ||
loss_bbox=dict(type='L1Loss', loss_weight=1.0)), | ||
roi_head=dict( | ||
type='StandardRoIHead', | ||
bbox_roi_extractor=dict( | ||
type='SingleRoIExtractor', | ||
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), | ||
out_channels=256, | ||
featmap_strides=[4, 8, 16, 32]), | ||
bbox_head=dict( | ||
type='Shared2FCBBoxHead', | ||
in_channels=256, | ||
fc_out_channels=1024, | ||
roi_feat_size=7, | ||
num_classes=80, | ||
bbox_coder=dict( | ||
type='DeltaXYWHBBoxCoder', | ||
target_means=[0., 0., 0., 0.], | ||
target_stds=[0.1, 0.1, 0.2, 0.2]), | ||
reg_class_agnostic=False, | ||
loss_cls=dict( | ||
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), | ||
loss_bbox=dict(type='L1Loss', loss_weight=1.0)), | ||
mask_roi_extractor=dict( | ||
type='SingleRoIExtractor', | ||
roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), | ||
out_channels=256, | ||
featmap_strides=[4, 8, 16, 32]), | ||
mask_head=dict( | ||
type='FCNMaskHead', | ||
num_convs=4, | ||
in_channels=256, | ||
conv_out_channels=256, | ||
num_classes=80, | ||
loss_mask=dict( | ||
type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), | ||
# model training and testing settings | ||
train_cfg=dict( | ||
rpn=dict( | ||
assigner=dict( | ||
type='MaxIoUAssigner', | ||
pos_iou_thr=0.7, | ||
neg_iou_thr=0.3, | ||
min_pos_iou=0.3, | ||
match_low_quality=True, | ||
ignore_iof_thr=-1), | ||
sampler=dict( | ||
type='RandomSampler', | ||
num=256, | ||
pos_fraction=0.5, | ||
neg_pos_ub=-1, | ||
add_gt_as_proposals=False), | ||
allowed_border=-1, | ||
pos_weight=-1, | ||
debug=False), | ||
rpn_proposal=dict( | ||
nms_pre=2000, | ||
max_per_img=1000, | ||
nms=dict(type='nms', iou_threshold=0.7), | ||
min_bbox_size=0), | ||
rcnn=dict( | ||
assigner=dict( | ||
type='MaxIoUAssigner', | ||
pos_iou_thr=0.5, | ||
neg_iou_thr=0.5, | ||
min_pos_iou=0.5, | ||
match_low_quality=True, | ||
ignore_iof_thr=-1), | ||
sampler=dict( | ||
type='RandomSampler', | ||
num=512, | ||
pos_fraction=0.25, | ||
neg_pos_ub=-1, | ||
add_gt_as_proposals=True), | ||
mask_size=28, | ||
pos_weight=-1, | ||
debug=False)), | ||
test_cfg=dict( | ||
rpn=dict( | ||
nms_pre=1000, | ||
max_per_img=1000, | ||
nms=dict(type='nms', iou_threshold=0.7), | ||
min_bbox_size=0), | ||
rcnn=dict( | ||
score_thr=0.05, | ||
nms=dict(type='nms', iou_threshold=0.5), | ||
max_per_img=100, | ||
mask_thr_binary=0.5))) |
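The `anchor_generator` settings in `rpn_head` (`scales=[8]`, `ratios=[0.5, 1.0, 2.0]`, `strides=[4, 8, 16, 32, 64]`) yield three anchors per location on each of the five FPN levels. The sketch below computes their sizes, assuming the convention I believe mmdetection 2.x uses: the base size equals the stride, the ratio is height divided by width, and the anchor area is held constant across ratios. `base_anchor_sizes` is a hypothetical helper, not an mmdetection function.

```python
import math


def base_anchor_sizes(stride, scales=(8,), ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) for each anchor at one FPN level.

    Assumption: ratio = height / width, and every anchor at a level
    has area (stride * scale) ** 2.
    """
    sizes = []
    for ratio in ratios:
        h_ratio = math.sqrt(ratio)
        w_ratio = 1.0 / h_ratio
        for scale in scales:
            side = stride * scale
            sizes.append((side * w_ratio, side * h_ratio))
    return sizes


if __name__ == "__main__":
    for stride in (4, 8, 16, 32, 64):
        rounded = [(round(w, 1), round(h, 1)) for w, h in base_anchor_sizes(stride)]
        print(stride, rounded)
```

Under these assumptions the square anchors have sides of 32, 64, 128, 256, and 512 pixels across the five levels, which is why a single scale of 8 is enough to cover the usual range of object sizes.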