Because my repo must run on industrial embedded devices with limited computing resources, I have to compress and accelerate the models step by step until the inference time meets our boss's requirement :(
The backbone network of my project is yolov3-lite and an optimized version of it.
While building this project I referenced several Git projects and CVPR papers; thanks to those authors.
I will keep updating it, so please stay tuned.
All acceleration switches can be found in the Makefile.
Set OPENMP := 1 in Makefile
If you have run multithreaded code on ARM or x86 chips, you probably know OpenMP.
The next picture shows how OpenMP runs. It uses many tricks to keep the threads working well together.
The result of using OpenMP in this project is:
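As a rough illustration of the kind of loop OpenMP speeds up, here is a minimal sketch of a row-parallel matrix multiply. The function name `gemm_rows_parallel` and the row-major layout are assumptions made for this example; it is not the code that the `OPENMP` switch enables in this repo.

```c
/* Hypothetical helper, not the repo's code: C += A * B for row-major
 * matrices (A is M x K, B is K x N, C is M x N). */
void gemm_rows_parallel(int M, int N, int K,
                        const float *A, const float *B, float *C)
{
    /* Each thread handles its own output rows, so no two threads
     * write the same element of C and no locking is needed. */
    #pragma omp parallel for
    for (int i = 0; i < M; ++i) {
        for (int k = 0; k < K; ++k) {
            float a = A[i * K + k];
            for (int j = 0; j < N; ++j) {
                C[i * N + j] += a * B[k * N + j];
            }
        }
    }
}
```

With GCC or Clang the pragma takes effect when compiling with `-fopenmp`; without that flag it is ignored and the loop simply runs on one thread.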
Set MASK := 1 in Makefile
It is a common method for reducing the computation of conv layers. The key point is how to decide which kernels are important and which kernels should be removed.
In this project, I followed the paper "Accelerating Convolutional Networks via Global & Dynamic Filter Pruning", a product of a Tencent lab.
The acceleration result of using the kernel mask in this project is:
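To make the idea of a kernel mask concrete, here is a minimal sketch. It scores each filter by the sum of its absolute weights, which is an assumption chosen for simplicity and not the saliency criterion from the paper or necessarily what `MASK` does in this repo. The names `filter_l1` and `build_filter_mask` are hypothetical.

```c
/* Sum of absolute weights of one filter (an assumed saliency score). */
float filter_l1(const float *w, int weights_per_filter)
{
    float s = 0.0f;
    for (int i = 0; i < weights_per_filter; ++i) {
        s += (w[i] < 0.0f) ? -w[i] : w[i];
    }
    return s;
}

/* Fill mask[f] with 1 (keep) or 0 (masked) for each of n filters and
 * return how many are kept. The weights themselves are left untouched,
 * so a masked filter can be switched back on in a later pass. */
int build_filter_mask(const float *weights, int n, int weights_per_filter,
                      float threshold, int *mask)
{
    int kept = 0;
    for (int f = 0; f < n; ++f) {
        float score = filter_l1(weights + f * weights_per_filter,
                                weights_per_filter);
        mask[f] = (score >= threshold) ? 1 : 0;
        kept += mask[f];
    }
    return kept;
}
```

Using one threshold for every layer compares filters across the whole network, and keeping the masked weights around is what lets a mask be revised during training instead of being a one-way deletion.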
Set PRUNE := 1 in Makefile
This method is very simple: you just set every weight below the threshold to 0, so I won't describe it further.
The acceleration result of using kernel mask and weight pruning together in this project is:
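The thresholding step can be sketched in a few lines. This version compares the magnitude of each weight against the threshold, which is an assumption about how "weights < threshold" is meant; `prune_weights` is a hypothetical name, not a function from this repo.

```c
/* Hypothetical helper: zero every weight whose magnitude is below
 * `threshold` and return how many weights fell below it. */
int prune_weights(float *weights, int n, float threshold)
{
    int pruned = 0;
    for (int i = 0; i < n; ++i) {
        float mag = (weights[i] < 0.0f) ? -weights[i] : weights[i];
        if (mag < threshold) {
            weights[i] = 0.0f;
            ++pruned;
        }
    }
    return pruned;
}
```

Zeroed weights only save time if the inference code skips them, for example by storing the layer in a sparse format or skipping zero multiplications.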
L1 regularization can be regarded as another way to reduce the number of kernels; the principle is similar to pruning kernels with BN parameters in other papers.
YOLO uses L2 regularization by default, so you need to change it to L1 in the code. This method has a disadvantage: you need to change the cfg files after every epoch ends (after one epoch of training you know how many kernels to keep in each conv layer). If you want to know more about L2 and L1 regularization in YOLO, you can go to my blog.
The acceleration result of using L1 regularization in this project is:
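The difference between the two penalties shows up in the decay term added to each weight update. The sketch below assumes the updates are later added to the weights, so the decay term is subtracted here; `add_weight_decay` and its `use_l1` flag are hypothetical and only illustrate the change, they are not the repo's actual code.

```c
/* Add the gradient of the regularization term to `updates`, in the
 * direction that shrinks the weights.
 *   use_l1 == 0: L2 penalty, gradient proportional to w
 *   use_l1 != 0: L1 penalty, gradient proportional to sign(w) */
void add_weight_decay(const float *weights, float *updates, int n,
                      float decay, int use_l1)
{
    for (int i = 0; i < n; ++i) {
        float w = weights[i];
        float g;
        if (use_l1) {
            g = (w > 0.0f) ? 1.0f : ((w < 0.0f) ? -1.0f : 0.0f);
        } else {
            g = w;
        }
        updates[i] -= decay * g;
    }
}
```

With L2 the pull toward zero fades as a weight gets small, while with L1 it stays constant, so unimportant weights tend to reach exactly zero. That is what makes L1 useful for finding kernels to drop.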
In network acceleration, quantization is always the most important trick. I have implemented two types of quantization, which can be switched in the Makefile.
Set QUANTIZATION := 1 in Makefile
This module was imported from AlexeyAB's GitHub repo.
As he describes, this quantization method is based on the theory behind NVIDIA's TensorRT.
But when I tested this module it did not work well, so I recently added code for Google's quantization method.
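For readers unfamiliar with the TensorRT-style approach, the core mapping is a symmetric scale onto int8. The sketch below assumes a saturation threshold has already been chosen by a calibration step that is not shown; the function names are hypothetical and this is not the imported module's code.

```c
#include <stdint.h>

/* Quantize one float to int8 over a symmetric range
 * [-threshold, threshold]. `threshold` is assumed to come from a
 * calibration step that is not shown here. */
int8_t quantize_symmetric(float x, float threshold)
{
    float scaled = x * 127.0f / threshold;
    if (scaled > 127.0f) scaled = 127.0f;    /* saturate large values */
    if (scaled < -127.0f) scaled = -127.0f;
    /* round half away from zero, then truncate */
    return (int8_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Map the int8 value back to an approximate float. */
float dequantize_symmetric(int8_t q, float threshold)
{
    return (float)q * threshold / 127.0f;
}
```

The accuracy of this scheme depends heavily on the threshold: too large wastes the int8 range on rare outliers, too small clips values that matter.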
Set QUANTIZATION_GOOGLE := 1 in Makefile
Paper: Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [C]// CVPR, 2018
The most novel idea is inserting fake quantization into the training process. You can then get the input quantization scale directly after model training, instead of running a calibration pass on a calibration dataset.
To deploy the project on embedded devices, I added Google's gemmlowp to darknet.
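Fake quantization can be sketched as a quantize-then-dequantize step that stays in floating point. The version below is a minimal illustration, assuming the range `[range_min, range_max]` is tracked elsewhere during training; `fake_quantize` and its parameter names are hypothetical and not taken from this repo.

```c
/* Simulate quantization in float: clamp x to [range_min, range_max],
 * snap it to the nearest of `levels` evenly spaced values (256 for
 * 8 bits), and return the result as a float. */
float fake_quantize(float x, float range_min, float range_max, int levels)
{
    float step = (range_max - range_min) / (float)(levels - 1);
    if (x < range_min) x = range_min;
    if (x > range_max) x = range_max;
    /* (x - range_min) is non-negative here, so adding 0.5 and
     * truncating rounds to the nearest level index. */
    int index = (int)((x - range_min) / step + 0.5f);
    return range_min + (float)index * step;
}
```

Because the network sees these rounded values during training, it learns weights that tolerate the rounding, and the range used here gives the scale for integer inference once training ends.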
This is the key point of MobileNet. It has already been merged into yolov3 by the author; I optimized the code so that `l.groups` can be used in every module.
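To see why groups save computation, it helps to count the weights. The sketch below assumes `l.groups` is the usual grouped-convolution group count, where each filter only sees `c / groups` input channels; `conv_weight_count` is a hypothetical helper, not a function from this repo.

```c
/* Number of weights in a conv layer with `c` input channels, `n`
 * filters, square kernels of side `size`, split into `groups` groups.
 * Each filter only sees c / groups input channels. */
int conv_weight_count(int c, int n, int size, int groups)
{
    return (c / groups) * n * size * size;
}
```

For example, a 3x3 layer with 32 input channels and 32 filters has 9216 weights with `groups = 1` and only 288 with `groups = 32`, which is the depthwise case MobileNet relies on.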
1. Analyze your original network and decide which modules you need to use
2. Edit the Makefile to enable those modules; for example, if you want to use image mask, you just need to set
3. Start training
./darknet detector train [data_file path] cfg/yolov3.cfg [pretrain weights file]
4. Start testing
set 'GPU=0'
./darknet detector test [data_file path] cfg/yolov3.cfg [weights file] [image file to detect]
I have put a pretrained model in backup; you can give it a try :)
1. Analyze your original network and decide which modules you need to use
2. Edit the Makefile to enable those modules; for example, if you want to use image mask, you just need to set
3. Normal test
./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg
4. Test with NVIDIA quantization
1). set QUANTIZATION := 1
2). ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg -quantized
5. Test with Google quantization
2). ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg
1. I added F1 score test code; the command is:
./darknet detector f1 [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup
1. I also have some other modules such as `Hash Compress` `Huffman Compress`, but I can't give all of them to you with other
1. When I tested all the methods on the tiny net (not on VGG), they decreased inference time by 30%~50% with very little F1 decrease,
and if you want it even faster, use quantization. It will surprise you!!!!!
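For reference, the F1 score mentioned above is conventionally computed from counts of true positives, false positives, and false negatives. The sketch below shows only that final formula; how detections are matched to ground truth in the `f1` command is not covered, and `f1_score` is a hypothetical helper.

```c
/* F1 score from detection counts; returns 0 when there is nothing
 * to score. Equivalent to 2 * P * R / (P + R). */
float f1_score(int true_pos, int false_pos, int false_neg)
{
    int denom = 2 * true_pos + false_pos + false_neg;
    if (denom == 0) return 0.0f;
    return 2.0f * (float)true_pos / (float)denom;
}
```

Comparing this number before and after enabling a module is a quick way to check how much accuracy an acceleration step costs.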
If you want to use my code, please let me know!!!!