Align Deep Features for Oriented Object Detection
Jiaming Han*, Jian Ding*, Jie Li, Gui-Song Xia†
arXiv preprint (arXiv:2008.09397) / TGRS (IEEE Xplore)
The repo is based on mmdetection.
Two versions are provided here: the original version and v20210104. We recommend using v20210104 (i.e., the master branch).
The past decade has witnessed significant progress on detecting objects in aerial images, which are often distributed with large scale variations and arbitrary orientations. However, most existing methods rely on heuristically defined anchors with different scales, angles, and aspect ratios, and usually suffer from severe misalignment between anchor boxes and axis-aligned convolutional features, which leads to a common inconsistency between the classification score and localization accuracy. To address this issue, we propose a Single-shot Alignment Network (S2A-Net) consisting of two modules: a Feature Alignment Module (FAM) and an Oriented Detection Module (ODM). The FAM generates high-quality anchors with an Anchor Refinement Network and adaptively aligns the convolutional features with the corresponding anchor boxes via a novel Alignment Convolution. The ODM first adopts active rotating filters to encode the orientation information, and then produces orientation-sensitive and orientation-invariant features to alleviate the inconsistency between classification score and localization accuracy. Besides, we further explore an approach to detecting objects in large-size images, which leads to a better speed-accuracy trade-off. Extensive experiments demonstrate that our method achieves state-of-the-art performance on two commonly used aerial object datasets (i.e., DOTA and HRSC2016) while maintaining high efficiency.
- 2021-06-03. Docker support added. See install.md.
- 2021-04-29. Third-party implementation with PaddleDetection.
- 2021-04-10. Rotated IoU Loss added, which further boosts performance.
- 2021-03-13. Our paper is available at IEEE Xplore.
- 2021-02-06. Accepted to IEEE Transactions on Geoscience and Remote Sensing (TGRS).
- 2021-01-01. Big changes! Following mmdetection v2, we made extensive changes to our code. The original code contained many unnecessary functions and inappropriate modifications, so we reworked the related code, e.g., dataset preprocessing and loading, unified function names, the IoU calculator between OBBs, and evaluation. We also implemented a Cascade S2A-Net. Compared with previous versions, the updated version is more straightforward and easier to understand.
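The changelog above mentions a unified IoU calculator between OBBs (oriented bounding boxes). As a rough illustration of what such a calculator computes — a minimal pure-Python sketch, not this repository's implementation — the IoU of two rotated boxes can be obtained by clipping one box's polygon against the other (Sutherland–Hodgman) and dividing the intersection area by the union:

```python
import math

def obb_corners(cx, cy, w, h, angle):
    """Corner points of an oriented box (centre, size, rotation in radians), CCW."""
    c, s = math.cos(angle), math.sin(angle)
    pts = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + c * x - s * y, cy + s * x + c * y) for x, y in pts]

def cross(a, b, p):
    """Cross product (b - a) x (p - a); >= 0 means p lies on/left of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def line_intersect(a, b, p, q):
    """Intersection point of the infinite lines through a-b and p-q."""
    a1, b1 = b[1] - a[1], a[0] - b[0]
    c1 = a1 * a[0] + b1 * a[1]
    a2, b2 = q[1] - p[1], p[0] - q[0]
    c2 = a2 * p[0] + b2 * p[1]
    det = a1 * b2 - a2 * b1
    return ((b2 * c1 - b1 * c2) / det, (a1 * c2 - a2 * c1) / det)

def clip_poly(subject, clipper):
    """Sutherland-Hodgman: clip polygon `subject` by convex CCW polygon `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        if not inp:
            break
        for j, cur in enumerate(inp):
            prev = inp[j - 1]
            if cross(a, b, cur) >= 0:          # cur inside this half-plane
                if cross(a, b, prev) < 0:      # edge enters: add crossing point
                    output.append(line_intersect(a, b, prev, cur))
                output.append(cur)
            elif cross(a, b, prev) >= 0:       # edge leaves: add crossing point
                output.append(line_intersect(a, b, prev, cur))
    return output

def poly_area(poly):
    """Shoelace area of an ordered polygon."""
    n = len(poly)
    if n < 3:
        return 0.0
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                   for i in range(n))) / 2.0

def obb_iou(box1, box2):
    """IoU of two oriented boxes, each given as (cx, cy, w, h, angle)."""
    p1, p2 = obb_corners(*box1), obb_corners(*box2)
    inter = poly_area(clip_poly(p1, p2))
    union = poly_area(p1) + poly_area(p2) - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2×2 squares whose centres are one unit apart overlap in a 1×2 strip, giving IoU = 2 / (4 + 4 − 2) = 1/3. The actual repo computes this on the GPU in batch; this sketch only shows the geometry.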
- Original implementation on DOTA
Model | Backbone | MS | Rotate | Lr schd | Inf time (fps) | box AP (ori./now) | Download |
---|---|---|---|---|---|---|---|
RetinaNet | R-50-FPN | - | - | 1x | 16.0 | 68.05/68.40 | model |
S2A-Net | R-50-FPN | - | - | 1x | 16.0 | 74.12/73.99 | model |
S2A-Net | R-50-FPN | ✓ | ✓ | 1x | 16.0 | 79.42 | model |
S2A-Net | R-101-FPN | ✓ | ✓ | 1x | 12.7 | 79.15 | model |
*Note:* the mAP reported here differs slightly from the original paper. All results are reported on the DOTA-v1.0 test set. All checkpoints here were trained with the original version and are not compatible with the updated version.
- 20210104 updated version
Model | Data | Backbone | MS | Rotate | Lr schd | box AP | Download |
---|---|---|---|---|---|---|---|
RetinaNet | HRSC2016 | R-50-FPN | - | ✓ | 6x | 81.63 | cfg model log |
CS2A-Net-1s | HRSC2016 | R-50-FPN | - | ✓ | 4x | 84.58 | cfg model log |
CS2A-Net-2s | HRSC2016 | R-50-FPN | - | ✓ | 3x | 89.96 | cfg model log |
S2A-Net | HRSC2016 | R-101-FPN | - | ✓ | 3x | 90.00 | cfg model |
CS2A-Net-1s | DOTA | R-50-FPN | - | - | 1x | 69.06 | cfg model log |
CS2A-Net-2s | DOTA | R-50-FPN | - | - | 1x | 73.67 | cfg model log |
S2A-Net | DOTA | R-50-FPN | - | - | 1x | 74.04 | cfg model |
CS2A-Net-2s-IoU | DOTA | R-50-FPN | - | - | 1x | 74.58 | cfg model log |
Note:
- All models are trained on 4 GPUs with an initial learning rate of 0.01. If you train with fewer/more GPUs, remember to scale the learning rate linearly with the number of GPUs (0.0025 per GPU), e.g., 0.0025 for 1 GPU, 0.01 for 4 GPUs, 0.02 for 8 GPUs.
- CS2A-Net-ns denotes Cascade S2A-Net with n stages. For more information, please refer to CASCADE_S2ANET.md.
- IoU denotes the IoU loss used for bbox regression.
- The checkpoints of S2A-Net are converted from the original version.
- If you cannot access Google Drive, a BaiduYun download link can be found here with extraction code ABCD.
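The linear learning-rate scaling rule from the first note can be sanity-checked with a tiny helper (`scaled_lr` is a hypothetical name for illustration, not part of this repo; the base rate of 0.0025 per GPU is taken from the note above):

```python
BASE_LR_PER_GPU = 0.0025  # implied by the note: total lr 0.01 on 4 GPUs

def scaled_lr(num_gpus: int, base_lr_per_gpu: float = BASE_LR_PER_GPU) -> float:
    """Linear scaling rule: the total learning rate grows with the GPU count."""
    return base_lr_per_gpu * num_gpus

for gpus in (1, 4, 8):
    print(gpus, "GPU(s) ->", scaled_lr(gpus))  # 0.0025, 0.01, 0.02
```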
Please refer to install.md for installation and dataset preparation.
Please see getting_started.md for the basic usage of MMDetection.
@article{han2021align,
author={J. {Han} and J. {Ding} and J. {Li} and G.-S. {Xia}},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Align Deep Features for Oriented Object Detection},
year={2021},
pages={1-11},
doi={10.1109/TGRS.2021.3062048}}
@inproceedings{xia2018dota,
title={DOTA: A large-scale dataset for object detection in aerial images},
author={Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={3974--3983},
year={2018}
}
@InProceedings{Ding_2019_CVPR,
author = {Ding, Jian and Xue, Nan and Long, Yang and Xia, Gui-Song and Lu, Qikai},
title = {Learning RoI Transformer for Oriented Object Detection in Aerial Images},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
@article{chen2019mmdetection,
title={MMDetection: Open mmlab detection toolbox and benchmark},
author={Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and Liu, Ziwei and Xu, Jiarui and others},
journal={arXiv preprint arXiv:1906.07155},
year={2019}
}