A list of papers on object detection using deep learning. I compiled this page with reference to this survey paper and through extensive searching.
Last updated: 2019/07/01
2018/9/18 - update all recent papers and make a diagram of the history of object detection using deep learning.
2018/9/26 - update code links for papers (official and unofficial).
2018/october - update 5 papers and performance table.
2018/november - update 9 papers.
2018/december - update 8 papers and the performance table, and add a new diagram (2019 version!!).
2019/january - update 4 papers and add commonly used datasets.
2019/february - update 3 papers.
2019/march - update figure and code links.
2019/april - remove author's names and update ICLR 2019 & CVPR 2019 papers.
2019/may - update CVPR 2019 papers.
2019/june - update CVPR 2019 papers and dataset paper.
Papers highlighted in red are the ones I consider "must-read". However, this is my personal opinion and the other papers are important too, so I recommend reading them if you have time.
The FPS (speed) figures depend on the hardware spec (e.g. CPU, GPU, RAM), so a fair comparison is difficult. The solution would be to measure all models on hardware with equivalent specifications, but that is very difficult and time-consuming.
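
To make the hardware dependence concrete, here is a minimal sketch of how an FPS number is typically obtained. The names `measure_fps` and `dummy_detector` are hypothetical and only for illustration; they do not come from any of the papers below. The same code gives different numbers on different machines, which is why FPS values reported in separate papers are hard to compare.

```python
import time


def measure_fps(infer, inputs, warmup=5):
    """Return how many inputs per second `infer` processes on this machine."""
    # Warm-up calls are excluded from timing (caches, lazy initialization).
    for x in inputs[:warmup]:
        infer(x)
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return len(inputs) / max(elapsed, 1e-12)


def dummy_detector(image):
    # Hypothetical stand-in for a real detector's forward pass.
    return sum(image)


if __name__ == "__main__":
    frames = [list(range(1000)) for _ in range(200)]
    print(f"{measure_fps(dummy_detector, frames):.1f} FPS")
```

For a real GPU model, the timing would also need to wait for the device to finish its work before reading the clock. That step is framework-specific and is left out of this sketch.
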
Detector | VOC07 (mAP@IoU=0.5) | VOC12 (mAP@IoU=0.5) | COCO (mAP@IoU=0.5:0.95) | Published In |
---|---|---|---|---|
R-CNN | 58.5 | - | - | CVPR'14 |
SPP-Net | 59.2 | - | - | ECCV'14 |
MR-CNN | 78.2 (07+12) | 73.9 (07+12) | - | ICCV'15 |
Fast R-CNN | 70.0 (07+12) | 68.4 (07++12) | 19.7 | ICCV'15 |
Faster R-CNN | 73.2 (07+12) | 70.4 (07++12) | 21.9 | NIPS'15 |
YOLO v1 | 66.4 (07+12) | 57.9 (07++12) | - | CVPR'16 |
G-CNN | 66.8 | 66.4 (07+12) | - | CVPR'16 |
AZNet | 70.4 | - | 22.3 | CVPR'16 |
ION | 80.1 | 77.9 | 33.1 | CVPR'16 |
HyperNet | 76.3 (07+12) | 71.4 (07++12) | - | CVPR'16 |
OHEM | 78.9 (07+12) | 76.3 (07++12) | 22.4 | CVPR'16 |
MPN | - | - | 33.2 | BMVC'16 |
SSD | 76.8 (07+12) | 74.9 (07++12) | 31.2 | ECCV'16 |
GBDNet | 77.2 (07+12) | - | 27.0 | ECCV'16 |
CPF | 76.4 (07+12) | 72.6 (07++12) | - | ECCV'16 |
R-FCN | 79.5 (07+12) | 77.6 (07++12) | 29.9 | NIPS'16 |
DeepID-Net | 69.0 | - | - | TPAMI'16 |
NoC | 71.6 (07+12) | 68.8 (07+12) | 27.2 | TPAMI'16 |
DSSD | 81.5 (07+12) | 80.0 (07++12) | 33.2 | arXiv'17 |
TDM | - | - | 37.3 | CVPR'17 |
FPN | - | - | 36.2 | CVPR'17 |
YOLO v2 | 78.6 (07+12) | 73.4 (07++12) | - | CVPR'17 |
RON | 77.6 (07+12) | 75.4 (07++12) | 27.4 | CVPR'17 |
DeNet | 77.1 (07+12) | 73.9 (07++12) | 33.8 | ICCV'17 |
CoupleNet | 82.7 (07+12) | 80.4 (07++12) | 34.4 | ICCV'17 |
RetinaNet | - | - | 39.1 | ICCV'17 |
DSOD | 77.7 (07+12) | 76.3 (07++12) | - | ICCV'17 |
SMN | 70.0 | - | - | ICCV'17 |
Light-Head R-CNN | - | - | 41.5 | arXiv'17 |
YOLO v3 | - | - | 33.0 | arXiv'18 |
SIN | 76.0 (07+12) | 73.1 (07++12) | 23.2 | CVPR'18 |
STDN | 80.9 (07+12) | - | - | CVPR'18 |
RefineDet | 83.8 (07+12) | 83.5 (07++12) | 41.8 | CVPR'18 |
SNIP | - | - | 45.7 | CVPR'18 |
Relation-Network | - | - | 32.5 | CVPR'18 |
Cascade R-CNN | - | - | 42.8 | CVPR'18 |
MLKP | 80.6 (07+12) | 77.2 (07++12) | 28.6 | CVPR'18 |
Fitness-NMS | - | - | 41.8 | CVPR'18 |
RFBNet | 82.2 (07+12) | - | - | ECCV'18 |
CornerNet | - | - | 42.1 | ECCV'18 |
PFPNet | 84.1 (07+12) | 83.7 (07++12) | 39.4 | ECCV'18 |
Pelee | 70.9 (07+12) | - | - | NIPS'18 |
HKRM | 78.8 (07+12) | - | 37.8 | NIPS'18 |
M2Det | - | - | 44.2 | AAAI'19 |
R-DAD | 81.2 (07++12) | 82.0 (07++12) | 43.1 | AAAI'19 |
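
The column headers above refer to IoU thresholds. As a rough sketch (not the official evaluation code of either benchmark), the snippet below computes the IoU of two boxes and lists which COCO-style thresholds a single prediction would pass. The VOC columns use the single threshold 0.5, while the COCO column averages AP over the thresholds 0.50, 0.55, ..., 0.95. The `(x1, y1, x2, y2)` box format and the helper name `thresholds_passed` are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
COCO_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Thresholds at which the prediction would count as a correct match."""
    value = iou(pred_box, gt_box)
    return [t for t in COCO_THRESHOLDS if value >= t]


if __name__ == "__main__":
    print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
    print(thresholds_passed((0, 0, 10, 10), (1, 0, 11, 10)))
```

This only shows the matching criterion. A full mAP computation also ranks detections by confidence and integrates precision over recall for each class, which is not shown here.
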
-
[R-CNN] Rich feature hierarchies for accurate object detection and semantic segmentation | [CVPR' 14] |
[pdf]
[official code - caffe]
-
[OverFeat] OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks | [ICLR' 14] |
[pdf]
[official code - torch]
-
[MultiBox] Scalable Object Detection using Deep Neural Networks | [CVPR' 14] |
[pdf]
-
[SPP-Net] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition | [ECCV' 14] |
[pdf]
[official code - caffe]
[unofficial code - keras]
[unofficial code - tensorflow]
-
Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction | [CVPR' 15] |
[pdf]
[official code - matlab]
-
[MR-CNN] Object detection via a multi-region & semantic segmentation-aware CNN model | [ICCV' 15] |
[pdf]
[official code - caffe]
-
[DeepBox] DeepBox: Learning Objectness with Convolutional Networks | [ICCV' 15] |
[pdf]
[official code - caffe]
-
[AttentionNet] AttentionNet: Aggregating Weak Directions for Accurate Object Detection | [ICCV' 15] |
[pdf]
-
[Fast R-CNN] Fast R-CNN | [ICCV' 15] |
[pdf]
[official code - caffe]
-
[DeepProposal] DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers | [ICCV' 15] |
[pdf]
[official code - matconvnet]
-
[Faster R-CNN, RPN] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | [NIPS' 15] |
[pdf]
[official code - caffe]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[YOLO v1] You Only Look Once: Unified, Real-Time Object Detection | [CVPR' 16] |
[pdf]
[official code - c]
-
[G-CNN] G-CNN: an Iterative Grid Based Object Detector | [CVPR' 16] |
[pdf]
-
[AZNet] Adaptive Object Detection Using Adjacency and Zoom Prediction | [CVPR' 16] |
[pdf]
-
[ION] Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks | [CVPR' 16] |
[pdf]
-
[HyperNet] HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection | [CVPR' 16] |
[pdf]
-
[OHEM] Training Region-based Object Detectors with Online Hard Example Mining | [CVPR' 16] |
[pdf]
[official code - caffe]
-
[CRAFT] CRAFT Objects from Images | [CVPR' 16] |
[pdf]
[official code - caffe]
-
[MPN] A MultiPath Network for Object Detection | [BMVC' 16] |
[pdf]
[official code - torch]
-
[SSD] SSD: Single Shot MultiBox Detector | [ECCV' 16] |
[pdf]
[official code - caffe]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[GBDNet] Crafting GBD-Net for Object Detection | [ECCV' 16] |
[pdf]
[official code - caffe]
-
[CPF] Contextual Priming and Feedback for Faster R-CNN | [ECCV' 16] |
[pdf]
-
[MS-CNN] A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection | [ECCV' 16] |
[pdf]
[official code - caffe]
-
[R-FCN] R-FCN: Object Detection via Region-based Fully Convolutional Networks | [NIPS' 16] |
[pdf]
[official code - caffe]
[unofficial code - caffe]
-
[PVANET] PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection | [NIPSW' 16] |
[pdf]
[official code - caffe]
-
[DeepID-Net] DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection | [TPAMI' 16] |
[pdf]
-
[NoC] Object Detection Networks on Convolutional Feature Maps | [TPAMI' 16] |
[pdf]
-
[DSSD] DSSD : Deconvolutional Single Shot Detector | [arXiv' 17] |
[pdf]
[official code - caffe]
-
[TDM] Beyond Skip Connections: Top-Down Modulation for Object Detection | [CVPR' 17] |
[pdf]
-
[FPN] Feature Pyramid Networks for Object Detection | [CVPR' 17] |
[pdf]
[unofficial code - caffe]
-
[YOLO v2] YOLO9000: Better, Faster, Stronger | [CVPR' 17] |
[pdf]
[official code - c]
[unofficial code - caffe]
[unofficial code - tensorflow]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[RON] RON: Reverse Connection with Objectness Prior Networks for Object Detection | [CVPR' 17] |
[pdf]
[official code - caffe]
[unofficial code - tensorflow]
-
[RSA] Recurrent Scale Approximation for Object Detection in CNN | [ICCV' 17] |
[pdf]
[official code - caffe]
-
[DCN] Deformable Convolutional Networks | [ICCV' 17] |
[pdf]
[official code - mxnet]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[DeNet] DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling | [ICCV' 17] |
[pdf]
[official code - theano]
-
[CoupleNet] CoupleNet: Coupling Global Structure with Local Parts for Object Detection | [ICCV' 17] |
[pdf]
[official code - caffe]
-
[RetinaNet] Focal Loss for Dense Object Detection | [ICCV' 17] |
[pdf]
[official code - keras]
[unofficial code - pytorch]
[unofficial code - mxnet]
[unofficial code - tensorflow]
-
[Mask R-CNN] Mask R-CNN | [ICCV' 17] |
[pdf]
[official code - caffe2]
[unofficial code - tensorflow]
[unofficial code - tensorflow]
[unofficial code - pytorch]
-
[DSOD] DSOD: Learning Deeply Supervised Object Detectors from Scratch | [ICCV' 17] |
[pdf]
[official code - caffe]
[unofficial code - pytorch]
-
[SMN] Spatial Memory for Context Reasoning in Object Detection | [ICCV' 17] |
[pdf]
-
[Light-Head R-CNN] Light-Head R-CNN: In Defense of Two-Stage Object Detector | [arXiv' 17] |
[pdf]
[official code - tensorflow]
-
[Soft-NMS] Improving Object Detection With One Line of Code | [ICCV' 17] |
[pdf]
[official code - caffe]
-
[YOLO v3] YOLOv3: An Incremental Improvement | [arXiv' 18] |
[pdf]
[official code - c]
[unofficial code - pytorch]
[unofficial code - pytorch]
[unofficial code - keras]
[unofficial code - tensorflow]
-
[ZIP] Zoom Out-and-In Network with Recursive Training for Object Proposal | [IJCV' 18] |
[pdf]
[official code - caffe]
-
[SIN] Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships | [CVPR' 18] |
[pdf]
[official code - tensorflow]
-
[STDN] Scale-Transferrable Object Detection | [CVPR' 18] |
[pdf]
-
[RefineDet] Single-Shot Refinement Neural Network for Object Detection | [CVPR' 18] |
[pdf]
[official code - caffe]
[unofficial code - chainer]
[unofficial code - pytorch]
-
[MegDet] MegDet: A Large Mini-Batch Object Detector | [CVPR' 18] |
[pdf]
-
[DA Faster R-CNN] Domain Adaptive Faster R-CNN for Object Detection in the Wild | [CVPR' 18] |
[pdf]
[official code - caffe]
-
[SNIP] An Analysis of Scale Invariance in Object Detection – SNIP | [CVPR' 18] |
[pdf]
-
[Relation-Network] Relation Networks for Object Detection | [CVPR' 18] |
[pdf]
[official code - mxnet]
-
[Cascade R-CNN] Cascade R-CNN: Delving into High Quality Object Detection | [CVPR' 18] |
[pdf]
[official code - caffe]
-
Finding Tiny Faces in the Wild with Generative Adversarial Network | [CVPR' 18] |
[pdf]
-
[MLKP] Multi-scale Location-aware Kernel Representation for Object Detection | [CVPR' 18] |
[pdf]
[official code - caffe]
-
Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation | [CVPR' 18] |
[pdf]
[official code - chainer]
-
[Fitness NMS] Improving Object Localization with Fitness NMS and Bounded IoU Loss | [CVPR' 18] |
[pdf]
-
[STDnet] STDnet: A ConvNet for Small Target Detection | [BMVC' 18] |
[pdf]
-
[RFBNet] Receptive Field Block Net for Accurate and Fast Object Detection | [ECCV' 18] |
[pdf]
[official code - pytorch]
-
Zero-Annotation Object Detection with Web Knowledge Transfer | [ECCV' 18] |
[pdf]
-
[CornerNet] CornerNet: Detecting Objects as Paired Keypoints | [ECCV' 18] |
[pdf]
[official code - pytorch]
-
[PFPNet] Parallel Feature Pyramid Network for Object Detection | [ECCV' 18] |
[pdf]
-
[Softer-NMS] Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection | [arXiv' 18] |
[pdf]
-
[ShapeShifter] ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector | [ECML-PKDD' 18] |
[pdf]
[official code - tensorflow]
-
[Pelee] Pelee: A Real-Time Object Detection System on Mobile Devices | [NIPS' 18] |
[pdf]
[official code - caffe]
-
[HKRM] Hybrid Knowledge Routed Modules for Large-scale Object Detection | [NIPS' 18] |
[pdf]
-
[MetaAnchor] MetaAnchor: Learning to Detect Objects with Customized Anchors | [NIPS' 18] |
[pdf]
-
[SNIPER] SNIPER: Efficient Multi-Scale Training | [NIPS' 18] |
[pdf]
-
[M2Det] M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | [AAAI' 19] |
[pdf]
[official code - pytorch]
-
[R-DAD] Object Detection based on Region Decomposition and Assembly | [AAAI' 19] |
[pdf]
-
[CAMOU] CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild | [ICLR' 19] |
[pdf]
-
Feature Intertwiner for Object Detection | [ICLR' 19] |
[pdf]
-
[GIoU] Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression | [CVPR' 19] |
[pdf]
-
Automatic adaptation of object detectors to new domains using self-training | [CVPR' 19] |
[pdf]
-
[Libra R-CNN] Libra R-CNN: Balanced Learning for Object Detection | [CVPR' 19] |
[pdf]
-
Feature Selective Anchor-Free Module for Single-Shot Object Detection | [CVPR' 19] |
[pdf]
-
[ExtremeNet] Bottom-up Object Detection by Grouping Extreme and Center Points | [CVPR' 19] |
[pdf]
[official code - pytorch]
-
[C-MIL] C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection | [CVPR' 19] |
[pdf]
[official code - torch]
-
[ScratchDet] ScratchDet: Training Single-Shot Object Detectors from Scratch | [CVPR' 19] |
[pdf]
-
Bounding Box Regression with Uncertainty for Accurate Object Detection | [CVPR' 19] |
[pdf]
[official code - caffe2]
-
Activity Driven Weakly Supervised Object Detection | [CVPR' 19] |
[pdf]
-
Towards Accurate One-Stage Object Detection with AP-Loss | [CVPR' 19] |
[pdf]
-
Strong-Weak Distribution Alignment for Adaptive Object Detection | [CVPR' 19] |
[pdf]
[official code - pytorch]
-
[NAS-FPN] NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection | [CVPR' 19] |
[pdf]
-
[Adaptive NMS] Adaptive NMS: Refining Pedestrian Detection in a Crowd | [CVPR' 19] |
[pdf]
-
Point in, Box out: Beyond Counting Persons in Crowds | [CVPR' 19] |
[pdf]
-
Locating Objects Without Bounding Boxes | [CVPR' 19] |
[pdf]
-
Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects | [CVPR' 19] |
[pdf]
-
Towards Universal Object Detection by Domain Attention | [CVPR' 19] |
[pdf]
-
Exploring the Bounds of the Utility of Context for Object Detection | [CVPR' 19] |
[pdf]
-
What Object Should I Use? - Task Driven Object Detection | [CVPR' 19] |
[pdf]
-
Dissimilarity Coefficient based Weakly Supervised Object Detection | [CVPR' 19] |
[pdf]
-
Adapting Object Detectors via Selective Cross-Domain Alignment | [CVPR' 19] |
[pdf]
-
Fully Quantized Network for Object Detection | [CVPR' 19] |
[pdf]
-
Distilling Object Detectors with Fine-grained Feature Imitation | [CVPR' 19] |
[pdf]
-
Multi-task Self-Supervised Object Detection via Recycling of Bounding Box Annotations | [CVPR' 19] |
[pdf]
-
[Reasoning-RCNN] Reasoning-RCNN: Unifying Adaptive Global Reasoning into Large-scale Object Detection | [CVPR' 19] |
[pdf]
-
Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation | [CVPR' 19] |
[pdf]
-
Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors | [CVPR' 19] |
[pdf]
-
Spatial-aware Graph Relation Network for Large-scale Object Detection | [CVPR' 19] |
[pdf]
-
[MaxpoolNMS] MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors | [CVPR' 19] |
[pdf]
-
You reap what you sow: Generating High Precision Object Proposals for Weakly-supervised Object Detection | [CVPR' 19] |
[pdf]
-
Object detection with location-aware deformable convolution and backward attention filtering | [CVPR' 19] |
[pdf]
-
Diversify and Match: A Domain Adaptive Representation Learning Paradigm for Object Detection | [CVPR' 19] |
[pdf]
Statistics of commonly used object detection datasets. The table is taken from this survey paper.
Challenge | Object Classes | Number of Images (Train) | Number of Images (Val) | Number of Images (Test) | Number of Annotated Images (Train) | Number of Annotated Images (Val) |
---|---|---|---|---|---|---|
PASCAL VOC Object Detection Challenge | ||||||
VOC07 | 20 | 2,501 | 2,510 | 4,952 | 6,301 (7,844) | 6,307 (7,818) |
VOC08 | 20 | 2,111 | 2,221 | 4,133 | 5,082 (6,337) | 5,281 (6,347) |
VOC09 | 20 | 3,473 | 3,581 | 6,650 | 8,505 (9,760) | 8,713 (9,779) |
VOC10 | 20 | 4,998 | 5,105 | 9,637 | 11,577 (13,339) | 11,797 (13,352) |
VOC11 | 20 | 5,717 | 5,823 | 10,994 | 13,609 (15,774) | 13,841 (15,787) |
VOC12 | 20 | 5,717 | 5,823 | 10,991 | 13,609 (15,774) | 13,841 (15,787) |
ILSVRC Object Detection Challenge | ||||||
ILSVRC13 | 200 | 395,909 | 20,121 | 40,152 | 345,854 | 55,502 |
ILSVRC14 | 200 | 456,567 | 20,121 | 40,152 | 478,807 | 55,502 |
ILSVRC15 | 200 | 456,567 | 20,121 | 51,294 | 478,807 | 55,502 |
ILSVRC16 | 200 | 456,567 | 20,121 | 60,000 | 478,807 | 55,502 |
ILSVRC17 | 200 | 456,567 | 20,121 | 65,500 | 478,807 | 55,502 |
MS COCO Object Detection Challenge | ||||||
MS COCO15 | 80 | 82,783 | 40,504 | 81,434 | 604,907 | 291,875 |
MS COCO16 | 80 | 82,783 | 40,504 | 81,434 | 604,907 | 291,875 |
MS COCO17 | 80 | 118,287 | 5,000 | 40,670 | 860,001 | 36,781 |
MS COCO18 | 80 | 118,287 | 5,000 | 40,670 | 860,001 | 36,781 |
Open Images Object Detection Challenge | ||||||
OID18 | 500 | 1,743,042 | 41,620 | 125,436 | 12,195,144 | ― |
Papers on the datasets mainly used in object detection are listed below.
-
[PASCAL VOC] The PASCAL Visual Object Classes (VOC) Challenge | [IJCV' 10] |
[pdf]
-
[PASCAL VOC] The PASCAL Visual Object Classes Challenge: A Retrospective | [IJCV' 15] |
[pdf]
[link]
-
[ImageNet] ImageNet: A Large-Scale Hierarchical Image Database | [CVPR' 09] |
[pdf]
-
[ImageNet] ImageNet Large Scale Visual Recognition Challenge | [IJCV' 15] |
[pdf]
[link]
-
[COCO] Microsoft COCO: Common Objects in Context | [ECCV' 14] |
[pdf]
[link]
-
[Open Images] The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale | [arXiv' 18] |
[pdf]
[link]
-
[DOTA] DOTA: A Large-scale Dataset for Object Detection in Aerial Images | [CVPR' 18] |
[pdf]
[link]
If you have any suggestions about papers, feel free to mail me :)