Can it convert yolo's version 2 (.weights) files to caffemodel files? #15

AlexeyAB opened this issue Dec 4, 2016 · 30 comments

@AlexeyAB

AlexeyAB commented Dec 4, 2016

YOLO v2 is now available at: http://pjreddie.com/darknet/yolo/
The old YOLO v1 is here: http://pjreddie.com/darknet/yolov1/
Can caffe-yolo convert YOLO v2 (.weights) files to caffemodel files using create_yolo_caffemodel.py?
Or will it be possible later?

@xingwangsfu
Owner

This should be doable, but I think some layers are not supported by Caffe, e.g., [route] and [reorg]. You may need to dig into the YOLO source code and try to implement these layers in Caffe. Currently, I don't have the time to do this. You are welcome to contribute.

@Jumabek

Jumabek commented Mar 29, 2017

The author of YOLO v2 says route + reorg only adds a 1% improvement.
I think the main game changer is the region layer.
If you could add this layer, that would be awesome.
I will try it myself too, but I'm not sure about the result.

@tk2github

Hi,
We are also stuck on the same issue. Were you able to try this out? If so, can you please help us with that layer?

@Jumabek

Jumabek commented Apr 11, 2017

Hey,

To use YOLOv2 on Caffe,
we need a region layer (implemented as a Python layer). Because its computation is small, it doesn't need CUDA; CPU is fine for the region layer. That's what the author has done as well.

However, that layer is not simple.
Even though the theory is simple, its implementation looks very complicated to me: https://github.com/Jumabek/darknet/blob/master/src/region_layer.c

Unfortunately, I gave up halfway, because right now I'm working on another project and really enjoying using YOLO for my projects.
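
For anyone picking this up, here is a minimal NumPy sketch of the inference-time decoding only (not the training loss). It assumes the last conv output is laid out per anchor as [tx, ty, tw, th, objectness, class scores] and that the anchors (the "biases") are given in grid-cell units. The function name `decode_region` and its arguments are illustrative and are not taken from darknet or from any repo in this thread.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_region(feat, anchors, num_classes, thresh=0.24):
    """Decode a raw conv output of shape (num_anchors * (5 + num_classes), H, W).

    anchors: list of (w, h) pairs in grid-cell units (assumed layout).
    Returns (cx, cy, w, h, score, class_id) tuples with coordinates
    relative to the image size (0..1).
    """
    num = len(anchors)
    _, h, w = feat.shape
    feat = feat.reshape(num, 5 + num_classes, h, w)

    col = np.arange(w).reshape(1, 1, w)   # x index of each grid cell
    row = np.arange(h).reshape(1, h, 1)   # y index of each grid cell
    aw = np.array([a[0] for a in anchors]).reshape(num, 1, 1)
    ah = np.array([a[1] for a in anchors]).reshape(num, 1, 1)

    cx = (sigmoid(feat[:, 0]) + col) / w  # box centre inside its cell
    cy = (sigmoid(feat[:, 1]) + row) / h
    bw = np.exp(feat[:, 2]) * aw / w      # size relative to the anchor
    bh = np.exp(feat[:, 3]) * ah / h
    obj = sigmoid(feat[:, 4])
    cls = softmax(feat[:, 5:], axis=1)    # (num, classes, H, W)

    scores = obj[:, None] * cls
    class_id = scores.argmax(axis=1)
    best = scores.max(axis=1)
    keep = np.nonzero(best > thresh)
    return [(float(cx[i]), float(cy[i]), float(bw[i]), float(bh[i]),
             float(best[i]), int(class_id[i])) for i in zip(*keep)]
```

The decoding itself is short; most of the complexity in region_layer.c is in the loss computation, which this sketch does not cover.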

@SHaiHosh

You have to compile the released version at:
https://github.com/gklz1982/caffe-yolov2
There is also an error in the released prototxt file;
see the corrected prototxt:

name: "YOLONET"
layer {
name: "data"
type: "Input"
top: "data"
top: "label"
input_param { shape: { dim: 1 dim: 3 dim: 416 dim: 416 }
shape: { dim: 1 dim: 1 dim: 30 dim: 5 } }
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 32
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn1"
type: "BatchNorm"
bottom: "conv1"
top: "bn1"
}
layer {
name: "scale1"
type: "Scale"
bottom: "bn1"
top: "scale1"
scale_param {
bias_term: true
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "scale1"
top: "scale1"
relu_param{
negative_slope: 0.1
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "scale1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer{
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
convolution_param {
num_output: 64
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn2"
type: "BatchNorm"
bottom: "conv2"
top: "bn2"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale2"
type: "Scale"
bottom: "bn2"
top: "scale2"
scale_param {
bias_term: true
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "scale2"
top: "scale2"
relu_param{
negative_slope: 0.1
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "scale2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}

layer{
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn3"
type: "BatchNorm"
bottom: "conv3"
top: "bn3"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale3"
type: "Scale"
bottom: "bn3"
top: "scale3"
scale_param {
bias_term: true
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "scale3"
top: "scale3"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv4"
type: "Convolution"
bottom: "scale3"
top: "conv4"
convolution_param {
num_output: 64
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "bn4"
type: "BatchNorm"
bottom: "conv4"
top: "bn4"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale4"
type: "Scale"
bottom: "bn4"
top: "scale4"
scale_param {
bias_term: true
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "scale4"
top: "scale4"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv5"
type: "Convolution"
bottom: "scale4"
top: "conv5"
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn5"
type: "BatchNorm"
bottom: "conv5"
top: "bn5"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale5"
type: "Scale"
bottom: "bn5"
top: "scale5"
scale_param {
bias_term: true
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "scale5"
top: "scale5"
relu_param{
negative_slope: 0.1
}
}
layer {
name: "pool5"
type: "Pooling"
bottom: "scale5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}

layer{
name: "conv6"
type: "Convolution"
bottom: "pool5"
top: "conv6"
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn6"
type: "BatchNorm"
bottom: "conv6"
top: "bn6"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale6"
type: "Scale"
bottom: "bn6"
top: "scale6"
scale_param {
bias_term: true
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "scale6"
top: "scale6"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv7"
type: "Convolution"
bottom: "scale6"
top: "conv7"
convolution_param {
num_output: 128
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "bn7"
type: "BatchNorm"
bottom: "conv7"
top: "bn7"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale7"
type: "Scale"
bottom: "bn7"
top: "scale7"
scale_param {
bias_term: true
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "scale7"
top: "scale7"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv8"
type: "Convolution"
bottom: "scale7"
top: "conv8"
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn8"
type: "BatchNorm"
bottom: "conv8"
top: "bn8"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale8"
type: "Scale"
bottom: "bn8"
top: "scale8"
scale_param {
bias_term: true
}
}
layer {
name: "relu8"
type: "ReLU"
bottom: "scale8"
top: "scale8"
relu_param{
negative_slope: 0.1
}
}
layer {
name: "pool8"
type: "Pooling"
bottom: "scale8"
top: "pool8"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}

layer{
name: "conv9"
type: "Convolution"
bottom: "pool8"
top: "conv9"
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn9"
type: "BatchNorm"
bottom: "conv9"
top: "bn9"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale9"
type: "Scale"
bottom: "bn9"
top: "scale9"
scale_param {
bias_term: true
}
}
layer {
name: "relu9"
type: "ReLU"
bottom: "scale9"
top: "scale9"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv10"
type: "Convolution"
bottom: "scale9"
top: "conv10"
convolution_param {
num_output: 256
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "bn10"
type: "BatchNorm"
bottom: "conv10"
top: "bn10"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale10"
type: "Scale"
bottom: "bn10"
top: "scale10"
scale_param {
bias_term: true
}
}
layer {
name: "relu10"
type: "ReLU"
bottom: "scale10"
top: "scale10"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv11"
type: "Convolution"
bottom: "scale10"
top: "conv11"
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn11"
type: "BatchNorm"
bottom: "conv11"
top: "bn11"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale11"
type: "Scale"
bottom: "bn11"
top: "scale11"
scale_param {
bias_term: true
}
}
layer {
name: "relu11"
type: "ReLU"
bottom: "scale11"
top: "scale11"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv12"
type: "Convolution"
bottom: "scale11"
top: "conv12"
convolution_param {
num_output: 256
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "bn12"
type: "BatchNorm"
bottom: "conv12"
top: "bn12"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale12"
type: "Scale"
bottom: "bn12"
top: "scale12"
scale_param {
bias_term: true
}
}
layer {
name: "relu12"
type: "ReLU"
bottom: "scale12"
top: "scale12"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv13"
type: "Convolution"
bottom: "scale12"
top: "conv13"
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn13"
type: "BatchNorm"
bottom: "conv13"
top: "bn13"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale13"
type: "Scale"
bottom: "bn13"
top: "scale13"
scale_param {
bias_term: true
}
}
layer {
name: "relu13"
type: "ReLU"
bottom: "scale13"
top: "scale13"
relu_param{
negative_slope: 0.1
}
}
layer {
name: "pool13"
type: "Pooling"
bottom: "scale13"
top: "pool13"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}

layer{
name: "conv14"
type: "Convolution"
bottom: "pool13"
top: "conv14"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn14"
type: "BatchNorm"
bottom: "conv14"
top: "bn14"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale14"
type: "Scale"
bottom: "bn14"
top: "scale14"
scale_param {
bias_term: true
}
}
layer {
name: "relu14"
type: "ReLU"
bottom: "scale14"
top: "scale14"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv15"
type: "Convolution"
bottom: "scale14"
top: "conv15"
convolution_param {
num_output: 512
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "bn15"
type: "BatchNorm"
bottom: "conv15"
top: "bn15"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale15"
type: "Scale"
bottom: "bn15"
top: "scale15"
scale_param {
bias_term: true
}
}
layer {
name: "relu15"
type: "ReLU"
bottom: "scale15"
top: "scale15"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv16"
type: "Convolution"
bottom: "scale15"
top: "conv16"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn16"
type: "BatchNorm"
bottom: "conv16"
top: "bn16"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale16"
type: "Scale"
bottom: "bn16"
top: "scale16"
scale_param {
bias_term: true
}
}
layer {
name: "relu16"
type: "ReLU"
bottom: "scale16"
top: "scale16"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv17"
type: "Convolution"
bottom: "scale16"
top: "conv17"
convolution_param {
num_output: 512
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "bn17"
type: "BatchNorm"
bottom: "conv17"
top: "bn17"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale17"
type: "Scale"
bottom: "bn17"
top: "scale17"
scale_param {
bias_term: true
}
}
layer {
name: "relu17"
type: "ReLU"
bottom: "scale17"
top: "scale17"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv18"
type: "Convolution"
bottom: "scale17"
top: "conv18"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn18"
type: "BatchNorm"
bottom: "conv18"
top: "bn18"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale18"
type: "Scale"
bottom: "bn18"
top: "scale18"
scale_param {
bias_term: true
}
}
layer {
name: "relu18"
type: "ReLU"
bottom: "scale18"
top: "scale18"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv19"
type: "Convolution"
bottom: "scale18"
top: "conv19"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn19"
type: "BatchNorm"
bottom: "conv19"
top: "bn19"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale19"
type: "Scale"
bottom: "bn19"
top: "scale19"
scale_param {
bias_term: true
}
}
layer {
name: "relu19"
type: "ReLU"
bottom: "scale19"
top: "scale19"
relu_param{
negative_slope: 0.1
}
}

layer{
name: "conv20"
type: "Convolution"
bottom: "scale19"
top: "conv20"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}

}
layer {
name: "bn20"
type: "BatchNorm"
bottom: "conv20"
top: "bn20"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale20"
type: "Scale"
bottom: "bn20"
top: "scale20"
scale_param {
bias_term: true
}
}
layer {
name: "relu20"
type: "ReLU"
bottom: "scale20"
top: "scale20"
relu_param {
negative_slope: 0.1
}
}

layer {
name: "concat1"
type: "Concat"
bottom: "scale13"
top: "concat1"
}

layer {
name: "conv21"
type: "Convolution"
bottom: "concat1"
top: "conv21"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
bias_term: false
}
}
layer {
name: "bn21"
type: "BatchNorm"
bottom: "conv21"
top: "bn21"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "reorg1"
type: "Reorg"
bottom: "bn21"
top: "reorg1"
reorg_param {
stride: 2
}
}

layer {
name: "concat2"
type: "Concat"
bottom: "reorg1"
bottom: "scale20"
top: "concat2"
}

layer{
name: "conv22"
type: "Convolution"
bottom: "concat2"
top: "conv22"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "bn22"
type: "BatchNorm"
bottom: "conv22"
top: "bn22"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "scale22"
type: "Scale"
bottom: "bn22"
top: "scale22"
scale_param {
bias_term: true
}
}
layer {
name: "relu22"
type: "ReLU"
bottom: "scale22"
top: "scale22"
relu_param{
negative_slope: 0.1
}
}

layer {
name: "conv23"
type: "Convolution"
bottom: "scale22"
top: "conv23"
convolution_param {
num_output: 125
kernel_size: 1
stride: 1
pad: 0
}
}
layer {
name: "relu23"
type: "ReLU"
bottom: "conv23"
top: "conv23"
}
layer {
name: "region1"
type: "RegionLoss"
bottom: "conv23"
bottom: "label"
top: "region1"
region_loss_param {
side: 13
num_class: 20
coords: 4
num: 5
}
}

layer {
name: "detection_out"
type: "DetectionOutput"
bottom: "conv23"
top: "detection_out"
include {
phase: TEST
}
detection_output_param {
num_classes: 9
coords: 4
confidence_threshold: 0.01
biases: 0.738768
biases: 0.874946
biases: 2.42204
biases: 2.65704
biases: 4.30971
biases: 7.04493
biases: 10.246
biases: 4.59428
biases: 12.6868
biases: 11.8741
}
}
layer {
name: "detection_eval"
type: "DetectionEvaluate"
bottom: "detection_out"
bottom: "label"
top: "detection_eval"
include {
phase: TEST
}
detection_evaluate_param {
num_classes: 9
overlap_threshold: 0.5
}
}

#http://ethereon.github.io/netscope/#/gist/9640ecb59a75f230446e7c70d2f8bcf3

@xzhangxa

@SHaiHosh Hi, thanks for your prototxt! Using the script that project provides, it can convert yolo-voc.weights to a caffemodel; I'll try to make it work for yolo.prototxt accordingly. BTW, have you tried training YOLOv2 on Caffe with that project? Does it work fine or not?

@SHaiHosh

Yes, it works.
The only issue is the normalization:
the released code scales the image values to between -128 and 127.
After I changed it to the range 0-1 there were no problems.
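
A minimal sketch of that kind of preprocessing, assuming the image is already loaded as an 8-bit H x W x 3 array and the net takes a 1 x 3 x 416 x 416 blob as in the prototxt above. The function name `to_network_input` and the nearest-neighbour resize are illustrative; this is not the code from the released repo, and the channel order is left untouched.

```python
import numpy as np

def to_network_input(img_u8, size=416):
    """Scale an 8-bit HxWx3 image to [0, 1] and return a 1x3xSIZExSIZE float32 blob."""
    img = img_u8.astype(np.float32) / 255.0   # 0..255 -> 0..1, no mean subtraction
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size          # nearest-neighbour row indices
    xs = np.arange(size) * w // size          # nearest-neighbour column indices
    resized = img[ys][:, xs]                  # (size, size, 3)
    return resized.transpose(2, 0, 1)[np.newaxis]   # HWC -> 1xCxHxW
```

The returned array can be copied into the "data" blob before calling net.forward().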

@xzhangxa

@SHaiHosh Thanks! I'll try it.

@xyxxyx

xyxxyx commented Nov 9, 2017

@SHaiHosh Hi, SHaiHosh. After fixing the normalization, have you tested the Caffe version of YOLOv2 on any benchmark datasets? How does it perform?

@SHaiHosh

SHaiHosh commented Nov 9, 2017

Yes, I have tested the YOLOv2 Caffe version;
there was no difference from the reported results.
Then I adapted the network for my own purposes, so I don't use the original model anymore.

@duangenquan

The region layer is re-implemented in this repo in C/C++, along with a Python wrapper. Have fun!

@skyw

skyw commented Jan 24, 2018

@SHaiHosh I can convert yolo-voc.weights with the prototxt you shared, thanks.
But how do you test it? How do you calculate mAP? I tried test_yolo_v2.py but it doesn't seem to give correct boxes. Do you have your own code to run the test and calculate mAP? If so, could you share it?

@ysh329

ysh329 commented Jan 25, 2018

Convert darknet yolov2 model to caffe · Issue #24 · ysh329/deep-learning-model-convertor
ysh329/deep-learning-model-convertor#24

@appusom

appusom commented Feb 6, 2018

@SHaiHosh Thank you for the prototxt; I was able to create the caffemodel using it. I then modified test_yolo_v2.py to point to your prototxt and the caffemodel and tried running it, but I am now getting a segmentation fault. Did the script run for you?
I0206 03:38:46.925810 11535 net.cpp:261] This network produces output detection_eval
I0206 03:38:46.925814 11535 net.cpp:261] This network produces output region1
I0206 03:38:46.925858 11535 net.cpp:274] Network initialization done.
1
/usr/local/lib/python2.7/dist-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
Segmentation fault (core dumped)

@SHaiHosh

SHaiHosh commented Feb 6, 2018

Which GPU do you have?
In any case, try rebooting your system.

@appusom

appusom commented Feb 7, 2018

@SHaiHosh I am using a CPU-only system. Interestingly, I tried the train_lenet.sh script and it also ended with a core dump. Do you have any idea what might be happening?
F0207 00:48:04.760711 16018 solver.cpp:374] Check failed: result[j]->width() == 5 (1 vs. 5)
*** Check failure stack trace: ***
@ 0x7f4c86c0d5cd google::LogMessage::Fail()
@ 0x7f4c86c0f433 google::LogMessage::SendToLog()
@ 0x7f4c86c0d15b google::LogMessage::Flush()
@ 0x7f4c86c0fe1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f4c87031d94 caffe::Solver<>::Test()
@ 0x7f4c87032ade caffe::Solver<>::TestAll()
@ 0x7f4c87032bfc caffe::Solver<>::Step()
@ 0x7f4c8703377e caffe::Solver<>::Solve()
@ 0x40cf26 train()
@ 0x4081fd main
@ 0x7f4c856c1830 __libc_start_main
@ 0x408a79 _start
@ (nil) (unknown)
Aborted (core dumped)

@dedoogong

dedoogong commented Mar 6, 2018

That error occurs because your prototxt doesn't have biases, or because you take the result from a layer before the detection output, such as conv22.

@SHaiHosh I could run test_yolo_v2.py and test_out_put/test_eval.py with the converted model.
I see that in test_yolo_v2.py the author implemented the get_region_boxes and NMS parts himself in Python, and that's why it takes the result from the last conv layer instead of the detection output layer.
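
For reference, the NMS part of such a script can be sketched as plain greedy per-class suppression. This is a minimal sketch, not the code from test_yolo_v2.py; the names `iou` and `nms`, the (cx, cy, w, h, score, class_id) tuple layout and the 0.45 default threshold are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (cx, cy, w, h, ...)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(dets, iou_thresh=0.45):
    """Greedy non-maximum suppression.

    dets: list of (cx, cy, w, h, score, class_id).
    Visits boxes from highest to lowest score and keeps a box only if it
    does not overlap an already kept box of the same class too much.
    """
    keep = []
    for det in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(det[5] != k[5] or iou(det, k) <= iou_thresh for k in keep):
            keep.append(det)
    return keep
```

If many overlapping boxes survive a step like this, the problem is more likely in the scores feeding it than in the suppression itself.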

In test_out_put/test_eval.py, it just interprets the detection results.

But in all cases, I failed to get correct bounding boxes (too many boxes, spread all over the image).
The output values of net.forward() look quite weird:
output['detection_out'][0] -> min == 0.0, max == 2.8e+14!!! Super big!

But in the case of darknet's yolov2 (coco-608x608), the min/max values of the outputs of net->predict() are MIN: -2.505956, MAX: 2.043313. This shows the converted model works completely wrongly.

I will keep tracing the reason for this wrong result from the converted yolov2...

Did you use your own code, or just modify that code? A hint would be really helpful and save me a lot of time. Thank you very much!!

@dedoogong

I found a bug in this repo regarding the scale/bias for conv21, and I'm trying to fix it. After fixing it manually (hardcoding some values), I still couldn't get good detection results, though they are better than before (which was really terrible...). I'm seeing more and more light... I really wonder how others can run this repo successfully without modification; there are many bugs that must hit everyone!

@dedoogong

dedoogong commented Mar 8, 2018

Well, test_yolo_v2.py finally works well, but it is a little slower than the original darknet-yolov2.
caffe-yolov2 takes around 0.07 to 0.08 s per image (I tested it with the person.jpg example image).
darknet-yolov2 takes around 0.06 to 0.07 s per image.

But with Caffe, I can extend YOLOv2 with other DCNNs and optimize it much more easily, so it's wonderful.

[Images: dog_results, eagle_results, girraffe_results, horses_results, person_results]

after NMS

[Images: dog_results, eagle_results, girraffe_results, horses_results, person_results]

@ChriswooTalent
Copy link

Hi:
@dedoogong I have converted the yolo.weight file to yolo.caffemodel successfully, and I changed the prototxt to the deploy version to test a single image. I hit the same problem: the YOLOv2 net predicts too many boxes after NMS. How did you solve this problem when you first encountered it? Thank you, I need your help!

@ChriswooTalent

@dedoogong How did you solve the problem of the net detecting too many boxes?

@ChriswooTalent

Finally, I have solved the problem. In the prototxt file provided by @SHaiHosh there are two bugs. First, there is no Scale layer following the layer "bn21"; I added a Scale layer and a ReLU layer. Second, the last layer, conv23, should not be followed by a ReLU layer. Once you have fixed these two bugs, use the new prototxt to generate yolo.caffemodel from yolo-voc.weights, and then you can use yolo.caffemodel for testing. This is my result:
[Image: yolotestresult]
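
The two prototxt problems described above (a BatchNorm with no Scale after it, and a ReLU on the final conv) can also be spotted mechanically. Below is a rough sketch that assumes the flat `layer { ... }` style used in this thread. The helpers `parse_layers`, `batchnorm_without_scale` and `relu_after` are made up for this example and are not part of Caffe or of any repo mentioned here; they ignore comments and anything beyond name, type, bottom and top.

```python
import re

def parse_layers(prototxt):
    """Tiny prototxt reader: one dict per layer with name, type, bottoms, tops."""
    layers = []
    for m in re.finditer(r'\blayer\s*\{', prototxt):
        depth, i = 1, m.end()
        while depth and i < len(prototxt):      # walk to the matching closing brace
            depth += {'{': 1, '}': -1}.get(prototxt[i], 0)
            i += 1
        body = prototxt[m.end():i - 1]

        def field(key):
            return re.findall(r'\b%s:\s*"([^"]+)"' % key, body)

        layers.append({'name': field('name')[0], 'type': field('type')[0],
                       'bottoms': field('bottom'), 'tops': field('top')})
    return layers

def batchnorm_without_scale(layers):
    """Names of BatchNorm layers whose output is not read by any Scale layer."""
    scaled = {b for l in layers if l['type'] == 'Scale' for b in l['bottoms']}
    return [l['name'] for l in layers
            if l['type'] == 'BatchNorm' and not any(t in scaled for t in l['tops'])]

def relu_after(layers, layer_name):
    """Names of ReLU layers that read the output of the given layer."""
    tops = next(l['tops'] for l in layers if l['name'] == layer_name)
    return [l['name'] for l in layers
            if l['type'] == 'ReLU' and any(b in tops for b in l['bottoms'])]
```

On the prototxt posted earlier, a check like this should report "bn21" from the first function and "relu23" from `relu_after(layers, "conv23")`. Because it matches on blob names, it can miss cases where several layers are chained in place on the same blob.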

@META-DREAMER

@ChriswooTalent Could you share your fixed version of the prototxt file?

@ChriswooTalent

@HammadJ I am tidying up my code and files now, and I will share them on my GitHub this week!

@ghost

ghost commented Apr 4, 2018

@ChriswooTalent / @dedoogong it would be nice if you could share the code for those of us who'd just like to try it out. Thanks in advance!!

@bharathbv

@ChriswooTalent / @dedoogong, I am trying caffe-yolov2 and have not been successful.

  1. After copying @SHaiHosh's modified prototxt, I still have issues converting the YOLO v2 weights to a caffemodel.
    examples/indoor/convert# python convert_weights_to_caffemodel.py
    [libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 1015:15: Message type "caffe.LayerParameter" has no field named "reorg_param".

Fix: Using darknet2Caffe.py (https://github.com/marvis/pytorch-caffe-darknet-convert) I got a converted prototxt and caffemodel.

name: "yolov2"

layer {
name: "data"
type: "Input"
top: "data"
input_param {
shape {
dim: 1
dim: 3
dim: 416
dim: 416
}
}
}

layer {
name: "layer1_conv"
type: "Convolution"
bottom: "data"
top: "layer1_conv"
convolution_param {
num_output: 32
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer1_bn"
type: "BatchNorm"
bottom: "layer1_conv"
top: "layer1_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer1_scale"
type: "Scale"
bottom: "layer1_conv"
top: "layer1_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer1_act"
type: "ReLU"
bottom: "layer1_conv"
top: "layer1_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer2_maxpool"
type: "Pooling"
bottom: "layer1_conv"
top: "layer2_maxpool"
pooling_param {
kernel_size: 2
stride: 2
pool: MAX
}
}
layer {
name: "layer3_conv"
type: "Convolution"
bottom: "layer2_maxpool"
top: "layer3_conv"
convolution_param {
num_output: 64
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer3_bn"
type: "BatchNorm"
bottom: "layer3_conv"
top: "layer3_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer3_scale"
type: "Scale"
bottom: "layer3_conv"
top: "layer3_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer3_act"
type: "ReLU"
bottom: "layer3_conv"
top: "layer3_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer4_maxpool"
type: "Pooling"
bottom: "layer3_conv"
top: "layer4_maxpool"
pooling_param {
kernel_size: 2
stride: 2
pool: MAX
}
}
layer {
name: "layer5_conv"
type: "Convolution"
bottom: "layer4_maxpool"
top: "layer5_conv"
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer5_bn"
type: "BatchNorm"
bottom: "layer5_conv"
top: "layer5_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer5_scale"
type: "Scale"
bottom: "layer5_conv"
top: "layer5_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer5_act"
type: "ReLU"
bottom: "layer5_conv"
top: "layer5_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer6_conv"
type: "Convolution"
bottom: "layer5_conv"
top: "layer6_conv"
convolution_param {
num_output: 64
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "layer6_bn"
type: "BatchNorm"
bottom: "layer6_conv"
top: "layer6_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer6_scale"
type: "Scale"
bottom: "layer6_conv"
top: "layer6_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer6_act"
type: "ReLU"
bottom: "layer6_conv"
top: "layer6_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer7_conv"
type: "Convolution"
bottom: "layer6_conv"
top: "layer7_conv"
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer7_bn"
type: "BatchNorm"
bottom: "layer7_conv"
top: "layer7_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer7_scale"
type: "Scale"
bottom: "layer7_conv"
top: "layer7_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer7_act"
type: "ReLU"
bottom: "layer7_conv"
top: "layer7_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer8_maxpool"
type: "Pooling"
bottom: "layer7_conv"
top: "layer8_maxpool"
pooling_param {
kernel_size: 2
stride: 2
pool: MAX
}
}
layer {
name: "layer9_conv"
type: "Convolution"
bottom: "layer8_maxpool"
top: "layer9_conv"
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer9_bn"
type: "BatchNorm"
bottom: "layer9_conv"
top: "layer9_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer9_scale"
type: "Scale"
bottom: "layer9_conv"
top: "layer9_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer9_act"
type: "ReLU"
bottom: "layer9_conv"
top: "layer9_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer10_conv"
type: "Convolution"
bottom: "layer9_conv"
top: "layer10_conv"
convolution_param {
num_output: 128
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "layer10_bn"
type: "BatchNorm"
bottom: "layer10_conv"
top: "layer10_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer10_scale"
type: "Scale"
bottom: "layer10_conv"
top: "layer10_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer10_act"
type: "ReLU"
bottom: "layer10_conv"
top: "layer10_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer11_conv"
type: "Convolution"
bottom: "layer10_conv"
top: "layer11_conv"
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer11_bn"
type: "BatchNorm"
bottom: "layer11_conv"
top: "layer11_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer11_scale"
type: "Scale"
bottom: "layer11_conv"
top: "layer11_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer11_act"
type: "ReLU"
bottom: "layer11_conv"
top: "layer11_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer12_maxpool"
type: "Pooling"
bottom: "layer11_conv"
top: "layer12_maxpool"
pooling_param {
kernel_size: 2
stride: 2
pool: MAX
}
}
layer {
name: "layer13_conv"
type: "Convolution"
bottom: "layer12_maxpool"
top: "layer13_conv"
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer13_bn"
type: "BatchNorm"
bottom: "layer13_conv"
top: "layer13_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer13_scale"
type: "Scale"
bottom: "layer13_conv"
top: "layer13_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer13_act"
type: "ReLU"
bottom: "layer13_conv"
top: "layer13_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer14_conv"
type: "Convolution"
bottom: "layer13_conv"
top: "layer14_conv"
convolution_param {
num_output: 256
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "layer14_bn"
type: "BatchNorm"
bottom: "layer14_conv"
top: "layer14_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer14_scale"
type: "Scale"
bottom: "layer14_conv"
top: "layer14_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer14_act"
type: "ReLU"
bottom: "layer14_conv"
top: "layer14_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer15_conv"
type: "Convolution"
bottom: "layer14_conv"
top: "layer15_conv"
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer15_bn"
type: "BatchNorm"
bottom: "layer15_conv"
top: "layer15_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer15_scale"
type: "Scale"
bottom: "layer15_conv"
top: "layer15_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer15_act"
type: "ReLU"
bottom: "layer15_conv"
top: "layer15_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer16_conv"
type: "Convolution"
bottom: "layer15_conv"
top: "layer16_conv"
convolution_param {
num_output: 256
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "layer16_bn"
type: "BatchNorm"
bottom: "layer16_conv"
top: "layer16_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer16_scale"
type: "Scale"
bottom: "layer16_conv"
top: "layer16_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer16_act"
type: "ReLU"
bottom: "layer16_conv"
top: "layer16_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer17_conv"
type: "Convolution"
bottom: "layer16_conv"
top: "layer17_conv"
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer17_bn"
type: "BatchNorm"
bottom: "layer17_conv"
top: "layer17_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer17_scale"
type: "Scale"
bottom: "layer17_conv"
top: "layer17_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer17_act"
type: "ReLU"
bottom: "layer17_conv"
top: "layer17_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer18_maxpool"
type: "Pooling"
bottom: "layer17_conv"
top: "layer18_maxpool"
pooling_param {
kernel_size: 2
stride: 2
pool: MAX
}
}
layer {
name: "layer19_conv"
type: "Convolution"
bottom: "layer18_maxpool"
top: "layer19_conv"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer19_bn"
type: "BatchNorm"
bottom: "layer19_conv"
top: "layer19_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer19_scale"
type: "Scale"
bottom: "layer19_conv"
top: "layer19_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer19_act"
type: "ReLU"
bottom: "layer19_conv"
top: "layer19_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer20_conv"
type: "Convolution"
bottom: "layer19_conv"
top: "layer20_conv"
convolution_param {
num_output: 512
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "layer20_bn"
type: "BatchNorm"
bottom: "layer20_conv"
top: "layer20_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer20_scale"
type: "Scale"
bottom: "layer20_conv"
top: "layer20_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer20_act"
type: "ReLU"
bottom: "layer20_conv"
top: "layer20_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer21_conv"
type: "Convolution"
bottom: "layer20_conv"
top: "layer21_conv"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer21_bn"
type: "BatchNorm"
bottom: "layer21_conv"
top: "layer21_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer21_scale"
type: "Scale"
bottom: "layer21_conv"
top: "layer21_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer21_act"
type: "ReLU"
bottom: "layer21_conv"
top: "layer21_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer22_conv"
type: "Convolution"
bottom: "layer21_conv"
top: "layer22_conv"
convolution_param {
num_output: 512
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "layer22_bn"
type: "BatchNorm"
bottom: "layer22_conv"
top: "layer22_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer22_scale"
type: "Scale"
bottom: "layer22_conv"
top: "layer22_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer22_act"
type: "ReLU"
bottom: "layer22_conv"
top: "layer22_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer23_conv"
type: "Convolution"
bottom: "layer22_conv"
top: "layer23_conv"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer23_bn"
type: "BatchNorm"
bottom: "layer23_conv"
top: "layer23_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer23_scale"
type: "Scale"
bottom: "layer23_conv"
top: "layer23_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer23_act"
type: "ReLU"
bottom: "layer23_conv"
top: "layer23_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer24_conv"
type: "Convolution"
bottom: "layer23_conv"
top: "layer24_conv"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer24_bn"
type: "BatchNorm"
bottom: "layer24_conv"
top: "layer24_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer24_scale"
type: "Scale"
bottom: "layer24_conv"
top: "layer24_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer24_act"
type: "ReLU"
bottom: "layer24_conv"
top: "layer24_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer25_conv"
type: "Convolution"
bottom: "layer24_conv"
top: "layer25_conv"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer25_bn"
type: "BatchNorm"
bottom: "layer25_conv"
top: "layer25_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer25_scale"
type: "Scale"
bottom: "layer25_conv"
top: "layer25_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer25_act"
type: "ReLU"
bottom: "layer25_conv"
top: "layer25_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer27_conv"
type: "Convolution"
bottom: "layer17_conv"
top: "layer27_conv"
convolution_param {
num_output: 64
kernel_size: 1
pad: 0
stride: 1
bias_term: false
}
}
layer {
name: "layer27_bn"
type: "BatchNorm"
bottom: "layer27_conv"
top: "layer27_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer27_scale"
type: "Scale"
bottom: "layer27_conv"
top: "layer27_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer27_act"
type: "ReLU"
bottom: "layer27_conv"
top: "layer27_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer28_reorg"
type: "Reshape"
bottom: "layer27_conv"
top: "layer28_reorg"
reshape_param {
shape {
dim: 1
dim: 256
dim: 13
dim: 13
}
}
}
layer {
name: "layer29_concat"
type: "Concat"
bottom: "layer28_reorg"
bottom: "layer25_conv"
top: "layer29_concat"
}
layer {
name: "layer30_conv"
type: "Convolution"
bottom: "layer29_concat"
top: "layer30_conv"
convolution_param {
num_output: 1024
kernel_size: 3
pad: 1
stride: 1
bias_term: false
}
}
layer {
name: "layer30_bn"
type: "BatchNorm"
bottom: "layer30_conv"
top: "layer30_conv"
batch_norm_param {
use_global_stats: true
}
}
layer {
name: "layer30_scale"
type: "Scale"
bottom: "layer30_conv"
top: "layer30_conv"
scale_param {
bias_term: true
}
}
layer {
name: "layer30_act"
type: "ReLU"
bottom: "layer30_conv"
top: "layer30_conv"
relu_param {
negative_slope: 0.1
}
}
layer {
name: "layer31_conv"
type: "Convolution"
bottom: "layer30_conv"
top: "layer31_conv"
convolution_param {
num_output: 425
kernel_size: 1
pad: 0
stride: 1
bias_term: true
}
}
layer {
name: "layer32_region"
type: "Region"
bottom: "layer31_conv"
top: "layer32_region"
region_param {
anchors: "0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828"
classes: 80
bias_match: 1
coords: 4
num: 5
softmax: 1
jitter: .3
rescore: 1
object_scale: 5
noobject_scale: 1
class_scale: 1
coord_scale: 1
absolute: 1
thresh: .6
random: 1
nms_thresh: 0.3
background: 0
tree_thresh: 0.5
relative: 1
box_thresh: 0.24
}
}
2. Since the "Region" layer is implemented in Python code, I removed the last "Region" layer from the above prototxt, modified test_eval.py to make "layer31_conv" the output layer, and ran
python test_eval.py ~/yolo_images/dog.jpg. The result is just bounding boxes scattered all over the image. I am guessing it's similar to the issue described by @dedoogong.

Appreciate any guidance on this.
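
For reference, here is a rough sketch of the decoding that the "Region" layer normally applies to the raw "layer31_conv" output before boxes can be drawn. It is not the code from test_eval.py. It assumes the 425 channels of each grid cell are ordered as 5 anchor blocks of [tx, ty, tw, th, objectness, 80 class scores], that the grid is 13x13, and it reuses the anchors and box_thresh from the region_param block above. The function name decode_cell is made up for illustration.

```python
import math

# Anchors and class count copied from the region_param block above.
ANCHORS = [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434,
           7.88282, 3.52778, 9.77052, 9.16828]
NUM_CLASSES = 80
NUM_ANCHORS = 5


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    # Subtract the maximum first for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decode_cell(channels, row, col, grid_w=13, grid_h=13, thresh=0.24):
    """Decode the 425 raw values of one grid cell (hypothetical helper).

    Assumes anchor-major ordering: for each anchor,
    [tx, ty, tw, th, objectness, class scores...].
    Returns (x, y, w, h, score, class_id) tuples; x and y are box centres,
    and all box values are relative to the image size (0..1).
    """
    stride = 5 + NUM_CLASSES
    boxes = []
    for n in range(NUM_ANCHORS):
        tx, ty, tw, th, to = channels[n * stride:n * stride + 5]
        class_scores = channels[n * stride + 5:(n + 1) * stride]
        # Centre offsets are squashed into the cell; sizes scale the anchor.
        x = (col + sigmoid(tx)) / grid_w
        y = (row + sigmoid(ty)) / grid_h
        w = math.exp(tw) * ANCHORS[2 * n] / grid_w
        h = math.exp(th) * ANCHORS[2 * n + 1] / grid_h
        probs = softmax(class_scores)
        class_id = max(range(NUM_CLASSES), key=probs.__getitem__)
        score = sigmoid(to) * probs[class_id]
        if score >= thresh:
            boxes.append((x, y, w, h, score, class_id))
    return boxes
```

Running this over all 13x13 cells and then applying non-maximum suppression should give boxes comparable to darknet's, provided the channel ordering assumption holds. Separately, note that a Caffe "Reshape" layer only changes the blob's shape and does not move any data, whereas darknet's [reorg] rearranges elements, so "layer28_reorg" above may also contribute to wrong detections.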

@imbadh

imbadh commented Apr 24, 2018

@SHaiHosh I tried your prototxt, but the conversion still fails.
The output is:
conv23 (125, 1024, 1, 1) (125,)
count= 50675955
transFlag = False
(50983561,)
conv1(conv)
bn1(batchnorm)
scale1(scale)
conv2(conv)
bn2(batchnorm)
scale2(scale)
conv3(conv)
bn3(batchnorm)
scale3(scale)
conv4(conv)
bn4(batchnorm)
scale4(scale)
conv5(conv)
bn5(batchnorm)
scale5(scale)
conv6(conv)
bn6(batchnorm)
scale6(scale)
conv7(conv)
bn7(batchnorm)
scale7(scale)
conv8(conv)
bn8(batchnorm)
scale8(scale)
conv9(conv)
bn9(batchnorm)
scale9(scale)
conv10(conv)
bn10(batchnorm)
scale10(scale)
conv11(conv)
bn11(batchnorm)
scale11(scale)
conv12(conv)
bn12(batchnorm)
scale12(scale)
conv13(conv)
bn13(batchnorm)
scale13(scale)
conv14(conv)
bn14(batchnorm)
scale14(scale)
conv15(conv)
bn15(batchnorm)
scale15(scale)
conv16(conv)
bn16(batchnorm)
scale16(scale)
conv17(conv)
bn17(batchnorm)
scale17(scale)
conv18(conv)
bn18(batchnorm)
scale18(scale)
conv19(conv)
bn19(batchnorm)
scale19(scale)
conv20(conv)
bn20(batchnorm)
scale20(scale)
conv21(conv)
bn21(batchnorm)
conv22(conv)
bn22(batchnorm)
scale22(scale)
conv23(conv)
ERROR: size mismatch: 50676061

Do you know what's wrong?
Or could you share your YOLOv2 weights file?

@VivekMaran27

VivekMaran27 commented Jul 27, 2018

@imbadh: Could you please let me know how you fixed the following error?
size mismatch: 50676061

Update: I was trying to convert yolov2-voc.weights, which gave the error. When I used http://pjreddie.com/media/files/yolo-voc.weights instead, the conversion worked.
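
One thing that can differ between two darknet weights files for the same network is the file header, so it may be worth comparing the two files before blaming the prototxt. Below is a small sketch that reads the header and counts the float values that follow it. It assumes little-endian data and that files with version 0.2 or newer store the `seen` counter as 8 bytes instead of 4, which is how recent darknet writes it. I have not checked how create_yolo_caffemodel.py skips the header, and this may not be the cause of the mismatch reported here. The function name read_darknet_header is made up for illustration.

```python
import os
import struct


def read_darknet_header(path):
    """Return (major, minor, revision, seen, num_floats) for a .weights file.

    Assumes little-endian data and an 8-byte `seen` counter when the
    version is 0.2 or newer; older files use a 4-byte counter.
    """
    with open(path, "rb") as f:
        major, minor, revision = struct.unpack("<3i", f.read(12))
        if major * 10 + minor >= 2 and major < 1000 and minor < 1000:
            (seen,) = struct.unpack("<Q", f.read(8))
        else:
            (seen,) = struct.unpack("<i", f.read(4))
        header_bytes = f.tell()
    # Everything after the header is treated as 32-bit floats.
    num_floats = (os.path.getsize(path) - header_bytes) // 4
    return major, minor, revision, seen, num_floats
```

Comparing `num_floats` with the parameter count the converter prints (the `count=` line in the log above) shows whether the file actually holds the number of weights the prototxt expects.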

@VivekMaran27

VivekMaran27 commented Jul 27, 2018

@appusom Could you please let me know how you resolved the following issue?

I0206 03:38:46.925810 11535 net.cpp:261] This network produces output detection_eval
I0206 03:38:46.925814 11535 net.cpp:261] This network produces output region1
I0206 03:38:46.925858 11535 net.cpp:274] Network initialization done.
1
/usr/local/lib/python2.7/dist-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
Segmentation fault (core dumped)

When I took a backtrace, the crash appeared to be in the get_region_boxes function.
