contributors: @GitYCC
- Following YOLO9000, our system predicts bounding boxes using dimension clusters as anchor boxes.
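- The network predicts 4 coordinates for each bounding box, $t_x$, $t_y$, $t_w$, $t_h$. If the cell is offset from the top-left corner of the image by $(c_x, c_y)$ and the bounding box prior has width and height $p_w$, $p_h$, the predictions correspond to:
  $b_x = \sigma(t_x) + c_x$, $b_y = \sigma(t_y) + c_y$, $b_w = p_w e^{t_w}$, $b_h = p_h e^{t_h}$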
- During training, we use a sum of squared error loss on these box coordinates.
- Each box predicts the classes the bounding box may contain using multilabel classification.
- We simply use independent logistic classifiers. During training we use binary cross-entropy loss for the class predictions.
- YOLOv3 predicts boxes at 3 different scales. Our system extracts features from those scales using a similar concept to feature pyramid networks (FPN).
- Non-max suppression is used to choose among the boxes predicted at the 3 different scales.
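
The box equations in the list above can be sketched in plain Python. This is a minimal illustration, not the Darknet implementation: the names `sigmoid` and `decode_box` and the tuple layouts are assumptions made for the example, and all quantities are assumed to be in grid-cell units.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def decode_box(t, cell, prior):
    """Map raw outputs (t_x, t_y, t_w, t_h) to a box (b_x, b_y, b_w, b_h).

    cell  = (c_x, c_y): offset of the grid cell from the top-left corner
    prior = (p_w, p_h): width and height of the anchor box (dimension cluster)
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell
    p_w, p_h = prior
    b_x = sigmoid(t_x) + c_x   # center stays inside the cell: sigmoid is in (0, 1)
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)  # prior width scaled by a positive factor
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h

# All-zero outputs put the center in the middle of the cell and keep the prior size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), prior=(2.0, 5.0)))  # (3.5, 4.5, 2.0, 5.0)
```

Because the sigmoid bounds the center offsets to the cell, a box can only be assigned a center within the cell that predicts it, while the exponential keeps width and height positive.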
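
The multilabel class prediction can be illustrated the same way. The sketch below applies an independent logistic classifier to each class logit and sums the binary cross-entropy terms for one box. The helper name `class_bce_loss`, the sum reduction, and the clamping constant `eps` are assumptions for the example, not details given in these notes.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def class_bce_loss(logits, targets, eps=1e-12):
    """Sum of per-class binary cross-entropy terms for a single box.

    logits  : raw class scores, one per class
    targets : 0/1 labels, one per class (several may be 1 at once)
    """
    loss = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)                    # each class gets its own probability
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss

# Two classes can both be likely: the probabilities are not forced to sum to 1.
probs = [sigmoid(z) for z in (2.0, 2.0)]
loss = class_bce_loss([2.0, 2.0], [1, 1])
```

Unlike a softmax, this lets overlapping labels (for example a general and a more specific class) be active on the same box.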
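
A greedy form of non-max suppression over boxes pooled from the three scales might look like the following. This is a simplified, class-agnostic sketch: the `(score, (x1, y1, x2, y2))` layout, the 0.5 IoU threshold, and the sample detections are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (score, box) pairs."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest-scoring box still in play
        kept.append(best)
        # Drop every remaining box that overlaps the kept one too strongly.
        remaining = [d for d in remaining if iou(best[1], d[1]) <= iou_threshold]
    return kept

# Hypothetical detections, one from each scale, pooled before suppression.
scale_1 = [(0.90, (10, 10, 50, 50))]
scale_2 = [(0.80, (12, 12, 52, 52))]    # near-duplicate of the first box
scale_3 = [(0.70, (100, 100, 140, 140))]
kept = nms(scale_1 + scale_2 + scale_3)
```

Here the second box overlaps the first with an IoU of about 0.82 and is suppressed, so `kept` holds the 0.90 and 0.70 detections.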
Our new network Darknet-53 is a hybrid approach between the network used in YOLOv2, Darknet-19, and that newfangled residual network stuff. Our network uses successive 3 × 3 and 1 × 1 convolutional layers but now has some shortcut connections as well and is significantly larger.
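
As a rough sketch of how the alternating 1 × 1 and 3 × 3 layers and shortcut connections stack up, the snippet below enumerates the convolutions of Darknet-53. The stage sizes (1, 2, 8, 8, 4 residual blocks) and filter counts follow the architecture table in the YOLOv3 paper rather than anything stated above, and `darknet53_conv_layers` is a hypothetical helper name.

```python
def darknet53_conv_layers():
    """Return (filters, kernel_size, stride) for each convolution, in order."""
    layers = [(32, 3, 1)]  # initial 3x3 convolution
    # (output filters, number of residual blocks) for each stage
    stages = [(64, 1), (128, 2), (256, 8), (512, 8), (1024, 4)]
    for filters, num_blocks in stages:
        layers.append((filters, 3, 2))  # stride-2 3x3 conv halves the resolution
        for _ in range(num_blocks):
            # One residual block: 1x1 conv, then 3x3 conv; a shortcut
            # connection adds the block input to the 3x3 output.
            layers.append((filters // 2, 1, 1))
            layers.append((filters, 3, 1))
    return layers

layers = darknet53_conv_layers()
print(len(layers))  # 52 convolutions; the final connected layer brings the count to 53
```

Counting this way explains the name: 52 convolutional layers plus one connected layer, compared with the 19 convolutional layers of Darknet-19.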