- To detect focal objects of interest in satellite imagery
- To improve detection performance through model architecture and framework choices
- To analyze the results by object size and type
Name | Role | Task |
---|---|---|
차수연 | Team member | EDA, dataset format conversion, model training, result analysis, presentation slides. GCP management. |
김하늘 | Team leader | Managing project objectives. Model training and result analysis. |
유상민 | Team member | Model training and result analysis. Server management. |
황동호 | Team member | Large-image training and result analysis. Issue management. |
- Dataset Download: AI Hub
- Data Info
  - Images: PNG + TIFF
  - Annotations: JSON
  - Geographic context: KML
  - Patch Size: 1,024 x 1,024
- Large Image
  - Type: TIFF
  - Pixel Size: 12,362 x 11,344
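Scenes this large have to be tiled into 1,024 x 1,024 patches before training. A minimal sketch of such tiling with Pillow (the file name and output directory are illustrative assumptions, not the project's actual pipeline):

```python
from pathlib import Path
from PIL import Image

PATCH = 1024

def tile_image(src_path, out_dir):
    # Tile a large scene (e.g. 12,362 x 11,344 px) into PATCH x PATCH crops.
    img = Image.open(src_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    w, h = img.size
    for top in range(0, h, PATCH):
        for left in range(0, w, PATCH):
            # Edge tiles are clipped to the image boundary.
            box = (left, top, min(left + PATCH, w), min(top + PATCH, h))
            img.crop(box).save(out / f"tile_{top}_{left}.png")

tile_image("large_scene.tif", "patches/")
```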
RetinaNet
- Backbone: ResNet
- Feature extractor: FPN (Feature Pyramid Network), built on top of the backbone
- Loss function: Focal Loss
Detectron2
- Detectron2 is Facebook AI Research's next-generation library that provides state-of-the-art detection and segmentation algorithms.
- We trained a detection model from an existing model pre-trained on the COCO dataset, available in Detectron2's model zoo.
- We chose the R101 configuration from the RetinaNet baselines.
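A minimal sketch of setting up that baseline in Detectron2 (the dataset name, file paths, and class count are illustrative assumptions):

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Hypothetical dataset registration: COCO-format annotations plus image root.
register_coco_instances("satellite_train", {}, "annotations/train.json", "images/train")

cfg = get_cfg()
# RetinaNet R101 baseline config and its COCO-pre-trained weights.
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_101_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("satellite_train",)
cfg.DATASETS.TEST = ()
cfg.MODEL.RETINANET.NUM_CLASSES = 21  # 21 categories in the AI Hub dataset

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```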
Challenging Tasks in Aerial Images
Object detection in aerial images is challenging due to massive variations in scale, rotation, and aspect ratio, and to densely arranged targets.
More importantly, the lack of large-scale benchmarks has been a major obstacle to progress in aerial object detection.
The dataset contains 424,750 object instances across 21 categories, annotated with oriented bounding boxes and collected from 1,748 aerial images.
Based on this large-scale, well-annotated dataset from AI Hub, we built baselines covering various state-of-the-art detection and segmentation algorithms with the Detectron2 framework, and evaluated the speed and accuracy of each model.
Problems and Solutions
Speed up model training
- Chose RetinaNet, which matches the speed of previous one-stage detectors while surpassing the accuracy of existing state-of-the-art two-stage detectors.
Address Class Imbalance
- RetinaNet addresses this with the Focal Loss (see the sketch below).
- It reshapes the standard cross-entropy loss so that the loss assigned to well-classified examples is down-weighted.
- This focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
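A minimal PyTorch sketch of the focal loss, mirroring the paper's form FL(p_t) = -α_t (1 - p_t)^γ log(p_t); tensor shapes and defaults are assumptions, and Detectron2 ships its own `sigmoid_focal_loss`:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Sigmoid focal loss; logits and targets share shape (N, num_classes).
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    # p_t: the model's estimated probability for the ground-truth label.
    p_t = p * targets + (1 - p) * (1 - targets)
    # (1 - p_t)^gamma down-weights easy, well-classified examples.
    loss = ce * (1 - p_t) ** gamma
    # alpha_t balances positive vs. negative examples.
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * loss).sum()
```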
Data Imbalance
- Separated the largest class from the others.
- Trained, evaluated, and tested the two datasets separately (a split sketch follows).
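A minimal sketch of such a split on a COCO-format annotation file (the output file names and the `dominant_cat_id` value are hypothetical):

```python
import json

def split_coco(ann_path, dominant_cat_id):
    # Split a COCO annotation file in two: the dominant category vs. the rest,
    # so the resulting datasets can be trained and evaluated separately.
    with open(ann_path) as f:
        coco = json.load(f)
    for name, keep in [("major.json", True), ("minor.json", False)]:
        anns = [a for a in coco["annotations"]
                if (a["category_id"] == dominant_cat_id) == keep]
        with open(name, "w") as f:
            json.dump({**coco, "annotations": anns}, f)

split_coco("annotations/train.json", dominant_cat_id=1)
```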
Oriented Bounding Box
The variation in object orientation is caused by the bird's-eye view of aerial images, so we tried to predict rotated bounding boxes, but did not succeed.
Causes of failure
- We converted the DOTA dataset format to COCO to use Detectron2's baselines. In this process, the produced annotations only included axis-aligned geometric annotations and failed to produce angle annotations (a sketch of the lossy conversion follows).
- We focused on speed and accuracy rather than on accurately predicting rotated bounding boxes. (Detectron2 provides various backbone networks, so we could customize datasets for our task.)
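A minimal sketch of where the angle is lost (the function name is hypothetical): DOTA stores an oriented box as four corner points, while a COCO bbox is the axis-aligned [x, y, width, height] envelope, so the rotation cannot be recovered after conversion.

```python
def dota_poly_to_coco_bbox(poly):
    # poly: DOTA corner list [x1, y1, x2, y2, x3, y3, x4, y4] of a rotated box.
    xs, ys = poly[0::2], poly[1::2]
    x_min, y_min = min(xs), min(ys)
    # The axis-aligned envelope discards the box's rotation angle.
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]
```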
- Metrics: Average Precision
- The COCO Object Detection challenge uses average precision as its principal detection metric (it also reports average recall), so we evaluated our detectors with average precision. Common variants are AP (averaged over IoU thresholds, i.e. AP@[0.5:0.95]), AP50, AP75, and per-class mAP.
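A minimal sketch of computing these metrics with Detectron2's COCO-style evaluator, continuing from the training sketch above (the validation dataset name and output directory are assumptions):

```python
from detectron2.data import build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

# Assumes "satellite_val" was registered like the training set above.
evaluator = COCOEvaluator("satellite_val", output_dir="./output")
val_loader = build_detection_test_loader(cfg, "satellite_val")
print(inference_on_dataset(trainer.model, val_loader, evaluator))
```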
Training Steps | AP | AP50 | AP75 |
---|---|---|---|
10,000 | 8.104 | 16.200 | 8.642 |
25,000 | 11.280 | 22.660 | 10.910 |
50,000 | 12.321 | 23.320 | 11.770 |
150,000 | 13.140 | 24.475 | 13.262 |