
CPU 100% when finetuning SE-Net with nvcaffe 0.17.3 #567

Closed
whria78 opened this issue May 30, 2019 · 6 comments

whria78 commented May 30, 2019

Hello, thank you for maintaining NVCaffe. I have a problem when I use NVCaffe with SE-Net models.

CPU usage goes to 100% when fine-tuning SE-Net (or any SE-ResNeXt) with NVCaffe 0.17.3.

There was no problem when fine-tuning SE-Net with the old NVCaffe 0.17.2. However, if I train with the new NVCaffe 0.17.3, the CPU utilization of all cores jumps to 100% after the initial test iterations 0, 1, 2. I tested on two systems and got the same results (Intel Skylake, Ubuntu 16.04, CUDA 10.1, cuDNN 10.1, driver nvidia-driver-418).

Because of this problem, I could not train SENet, SE-ResNeXt-50, or SE-ResNeXt-100. However, there was no problem training a VGG model with the new NVCaffe 0.17.3.

whria78 changed the title from "Slow when deploying SE-NeXt-100" to "CPU 100% when finetuning SE-Net with nvcaffe 0.17.3" on May 30, 2019
@drnikolaev

Hi @whria78, could you attach the complete log here, please?


whria78 commented Jun 18, 2019

Thank you for the reply.

I have attached the log and the trainval.prototxt:

senext50_FP16_train.log

senext50_FP16.zip

@drnikolaev

@whria78 thanks.

I0618 10:02:14.258036 32105 net.cpp:1135] Ignoring source layer classifier
I0618 10:02:14.258038 32105 net.cpp:1135] Ignoring source layer classifier_classifier_0_split

Is this your custom layer? Is it CPU-only? If so, you might need to consider adding a GPU implementation to prevent the CPU overload.
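(For reference, this is roughly what adding a GPU path looks like in the upstream BVLC Caffe layer interface; NVCaffe's templated layer classes differ slightly, and "MyLayer" below is a hypothetical custom layer, so treat this only as a sketch.)

// my_layer.cu -- sketch in the BVLC Caffe style; "MyLayer" is hypothetical.
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layers/my_layer.hpp"

namespace caffe {

// Without Forward_gpu/Backward_gpu, the framework falls back to the *_cpu
// methods, so the layer's work runs on the host even during GPU training.
template <typename Dtype>
void MyLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
                                 const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->gpu_data();
  Dtype* top_data = top[0]->mutable_gpu_data();
  // Launch CUDA kernels (or cuBLAS/cuDNN calls) on bottom_data -> top_data.
}

template <typename Dtype>
void MyLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
                                  const vector<bool>& propagate_down,
                                  const vector<Blob<Dtype>*>& bottom) {
  // GPU gradient computation goes here.
}

INSTANTIATE_LAYER_GPU_FUNCS(MyLayer);

}  // namespace caffe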


whria78 commented Jun 18, 2019

Thank you @drnikolaev

I tried to fine-tune SE-ResNeXt-50. I changed the last classifier layer as follows.

From (https://github.com/hujie-frank/SENet; classes = 1000):

layer {
  name: "classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "classifier"
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "classifier"
  top: "prob"
}

To (classes = 178):

layer {
  name: "whria_classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "whria_classifier"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 178
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "whria_classifier"
  bottom: "label"
  top: "loss"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "whria_classifier"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}

To my knowledge, this modification is commonly used for fine-tuning; I do not think it should drive all 8 CPU cores to 100%.

This phenomenon does not occur if I switch back to the old NVCaffe 0.17.2.

Another fact: I had no such problem when I fine-tuned VGG-19 with the new NVCaffe 0.17.3.
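For completeness, the training is launched the usual way with the caffe tool, loading the pretrained weights (the file names below are placeholders, not my exact paths); because the classifier layer is renamed, its pretrained weights are skipped and the new layer is initialized from scratch, which is what the "Ignoring source layer classifier" lines in the log refer to:

caffe train --solver=solver.prototxt --weights=SE-ResNeXt-50.caffemodel --gpu=0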


whria78 commented Jun 18, 2019

As far as I have observed, the CPU overload problem occurs around "batch_transformer.cpp:51] Started BatchTransformer thread 32123".

Before that line, there is a "Waiting for datum" message.

I guess the CPU overload has already started at that stage, which results in the "Waiting for datum" message.

The CPU overload problem occurs before iteration 1.

I0618 10:02:14.725428 32119 internal_thread.cpp:78] Started internal thread 32119 on device 0, rank 0
I0618 10:02:14.725850 32121 internal_thread.cpp:78] Started internal thread 32121 on device 0, rank 0
I0618 10:02:14.726251 32120 internal_thread.cpp:78] Started internal thread 32120 on device 0, rank 0
I0618 10:02:14.731768 32122 internal_thread.cpp:78] Started internal thread 32122 on device 0, rank 0
I0618 10:02:14.733804 32121 blocking_queue.cpp:40] Waiting for datum
I0618 10:02:14.739902 32118 common.cpp:544] {0} NVML succeeded to set CPU affinity
I0618 10:02:14.741881 32123 common.cpp:544] {0} NVML succeeded to set CPU affinity
I0618 10:02:14.741904 32123 batch_transformer.cpp:51] Started BatchTransformer thread 32123
I0618 10:02:26.103688 32105 solver.cpp:342] [0.0] Iteration 1 (11.3895 s), loss = 5.17969
I0618 10:02:26.103915 32105 solver.cpp:358] [0.0] Train net output #0: loss = 5.17969 (* 1 = 5.17969 loss)
I0618 10:02:26.103986 32105 sgd_solver.cpp:180] [0.0] Iteration 0, lr = 0.001, m = 0.9, lrm = 0.01, wd = 1e-05, gs = 1

@drnikolaev

"waiting for datum" means that data reader (CPU-based) delivers data not fast enough for the solver. Therefore, it uses CPU up to 100%
