
CPU 100% when finetuning SE-Net with nvcaffe 0.17.3 #567

Closed
whria78 opened this issue May 30, 2019 · 6 comments

whria78 commented May 30, 2019

Hello, thank you for maintaining NVCaffe. I have a problem when I use NVCaffe with SE-Net models.

CPU usage goes to 100% when fine-tuning SE-Net (or any SE-ResNeXt) with NVCaffe 0.17.3.

There was no problem when fine-tuning SE-Net with the old NVCaffe 0.17.2. However, if I train with the new NVCaffe 0.17.3, the CPU utilization of all cores jumps to 100% after the initial test iterations 0, 1, 2. I tested on two systems and got the same results (Intel Skylake, Ubuntu 16.04, CUDA 10.1, cuDNN 10.1, driver nvidia-driver-418).

Because of this problem, I could not train SENet, SE-ResNeXt-50, or SE-ResNeXt-100. However, there was no problem training a VGG model with the new NVCaffe 0.17.3.

whria78 changed the title from "Slow when deploying SE-NeXt-100" to "CPU 100% when finetuning SE-Net with nvcaffe 0.17.3" on May 30, 2019
@drnikolaev

Hi @whria78, could you attach the complete log here, please?


whria78 commented Jun 18, 2019

Thank you for the reply.

I have attached the log and the trainval.prototxt:

senext50_FP16_train.log

senext50_FP16.zip

@drnikolaev

@whria78 thanks.

I0618 10:02:14.258036 32105 net.cpp:1135] Ignoring source layer classifier
I0618 10:02:14.258038 32105 net.cpp:1135] Ignoring source layer classifier_classifier_0_split

Is this your custom layer? Is it CPU-only? If so, you might need to consider adding a GPU implementation to prevent the CPU overload.
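(For reference, this is roughly what adding a GPU path looks like in the upstream BVLC Caffe layer interface; NVCaffe's templated layer classes differ slightly, and "MyLayer" below is a hypothetical custom layer, so treat this only as a sketch.)

// my_layer.cu -- sketch in the BVLC Caffe style; "MyLayer" is hypothetical.
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layers/my_layer.hpp"

namespace caffe {

// Without Forward_gpu/Backward_gpu, the framework falls back to the *_cpu
// methods, so the layer's work runs on the host even during GPU training.
template <typename Dtype>
void MyLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
                                 const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->gpu_data();
  Dtype* top_data = top[0]->mutable_gpu_data();
  // Launch CUDA kernels (or cuBLAS/cuDNN calls) on bottom_data -> top_data.
}

template <typename Dtype>
void MyLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
                                  const vector<bool>& propagate_down,
                                  const vector<Blob<Dtype>*>& bottom) {
  // GPU gradient computation goes here.
}

INSTANTIATE_LAYER_GPU_FUNCS(MyLayer);

}  // namespace caffe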


whria78 commented Jun 18, 2019

Thank you @drnikolaev

I tried to fine-tune SE-ResNeXt-50. I changed the last classifier layer as follows.

From (https://github.com/hujie-frank/SENet; classes = 1000):

layer {
  name: "classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "classifier"
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "classifier"
  top: "prob"
}

To (classes = 178):

layer {
  name: "whria_classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "whria_classifier"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 178
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "whria_classifier"
  bottom: "label"
  top: "loss"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "whria_classifier"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}

To my knowledge, this modification is commonly used for fine-tuning; I do not think it should drive all 8 CPU cores to 100%.

This phenomenon does not occur if I switch back to the old NVCaffe 0.17.2.

Another fact: I had no such problem when I fine-tuned VGG-19 with the new NVCaffe 0.17.3.
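For completeness, the training is launched the usual way with the caffe tool, loading the pretrained weights (the file names below are placeholders, not my exact paths); because the classifier layer is renamed, its pretrained weights are skipped and the new layer is initialized from scratch, which is what the "Ignoring source layer classifier" lines in the log refer to:

caffe train --solver=solver.prototxt --weights=SE-ResNeXt-50.caffemodel --gpu=0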


whria78 commented Jun 18, 2019

As far as I have observed, the CPU overload problem occurs around "batch_transformer.cpp:51] Started BatchTransformer thread 32123".

Before that line, there is a "Waiting for datum" message.

I guess the CPU overload has already started at that stage, which results in the "Waiting for datum" message.

The CPU overload problem occurs before iteration 1.

I0618 10:02:14.725428 32119 internal_thread.cpp:78] Started internal thread 32119 on device 0, rank 0
I0618 10:02:14.725850 32121 internal_thread.cpp:78] Started internal thread 32121 on device 0, rank 0
I0618 10:02:14.726251 32120 internal_thread.cpp:78] Started internal thread 32120 on device 0, rank 0
I0618 10:02:14.731768 32122 internal_thread.cpp:78] Started internal thread 32122 on device 0, rank 0
I0618 10:02:14.733804 32121 blocking_queue.cpp:40] Waiting for datum
I0618 10:02:14.739902 32118 common.cpp:544] {0} NVML succeeded to set CPU affinity
I0618 10:02:14.741881 32123 common.cpp:544] {0} NVML succeeded to set CPU affinity
I0618 10:02:14.741904 32123 batch_transformer.cpp:51] Started BatchTransformer thread 32123
I0618 10:02:26.103688 32105 solver.cpp:342] [0.0] Iteration 1 (11.3895 s), loss = 5.17969
I0618 10:02:26.103915 32105 solver.cpp:358] [0.0] Train net output #0: loss = 5.17969 (* 1 = 5.17969 loss)
I0618 10:02:26.103986 32105 sgd_solver.cpp:180] [0.0] Iteration 0, lr = 0.001, m = 0.9, lrm = 0.01, wd = 1e-05, gs = 1

@drnikolaev

"waiting for datum" means that data reader (CPU-based) delivers data not fast enough for the solver. Therefore, it uses CPU up to 100%
