
Training on GPU does not utilise GPU properly #836

Closed
mhmoudr opened this issue Aug 15, 2017 · 8 comments

@mhmoudr

mhmoudr commented Aug 15, 2017

The issue is mainly GPU utilisation. After building LightGBM for GPU following the described process and running it on a sample dataset, I monitored both CPU and GPU (please check the attached screenshot).
The process moves the training dataset into GPU memory and nvidia-smi recognises it as a running process, but during training GPU utilisation does not exceed 2-5%, while the CPU appears to be fully utilised.

I am not sure if this is a defect or some kind of incomplete implementation.

Environment info

Operating System: Ubuntu 16
CPU: 2 Xeon (total 48 Cores )
C++ version: latest; calling the CLI process

Error Message:

[Screenshot: CPU and GPU utilisation during training, 2017-08-16 09-38-27]

N/A

Steps to reproduce

Compile for GPU using the provided docs.
Use the following config:
data = "/path/to/libsvm/file"
num_iterations = 3000
learning_rate = 0.01
max_depth = 12
device = gpu
gpu_platform_id = 0
gpu_device_id = 0
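
(For reference, a config like this is passed to the CLI roughly as follows; the file name train.conf is a placeholder, not taken from this report:)

./lightgbm config=train.conf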

@fenqingr

Same thing here; most of the time GPU usage stays at 0%, with some spikes of ~10% to 20%. I thought my CPU (i5 2500K @ 4GHz) was bottlenecking my GPU performance, but from your case I believe that is not the case.

@guolinke
Collaborator

It is normal if your training data is small.
BTW, you are using max_depth = 12 but forgot to set num_leaves, so you are training a small model, which will also cause low GPU usage.
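
For example, adding an explicit num_leaves alongside max_depth lets the trees actually grow (the value below is purely illustrative, not a recommendation from this thread):

device = gpu
max_depth = 12
num_leaves = 255   # default is 31; without this, trees stay small no matter how large max_depth is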

@mhmoudr
Author

mhmoudr commented Aug 16, 2017

I am using a dataset that has ~14 million rows and ~1000 sparse features; the memory footprint, as you can see in nvidia-smi, was 649MB. As for num_leaves, I have set it to 12 as well and got exactly the same behavior.

@guolinke
Collaborator

@mhmoudr num_leaves = 2^max_depth.
@huanzhang12 I remember that sparse features cannot use the GPU to speed up, right?
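
To make that relation concrete (a worked example, not from the thread): with max_depth = 12 a tree can have at most 2^12 = 4096 leaves, so num_leaves = 12, or the default of 31, limits the model far more than the depth cap does.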

@mhmoudr
Author

mhmoudr commented Aug 16, 2017

As I am writing this I am running with num_leaves = 24 (depth still at 12), but GPU utilization is closer to 1% this time.
My question is: why are the CPUs in this scenario utilized at close to 80%? Doesn't this mean that all of the heavy lifting (calculation) that is supposed to happen on the GPU cores is actually happening on the CPU? In other packages, when I train on GPU, the CPUs seem to be idle (doing nearly nothing).

@guolinke
Collaborator

@mhmoudr refer to #768.
The LightGBM GPU version still needs to use the CPU for some calculations.
And sparse features cannot be sped up on the GPU at all, so the CPU usage is high in your case.

@huanzhang12
Contributor

@mhmoudr Sparse feature processing has too much irregularity and is currently not accelerated on the GPU. Try setting the sparse_threshold parameter to 1.0, or a number very close to 1.0, and see if there is any improvement. See the GPU performance tuning guide for more details.
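
A sketch of that suggestion applied to the original config (values other than sparse_threshold are carried over or illustrative):

data = "/path/to/libsvm/file"
device = gpu
gpu_platform_id = 0
gpu_device_id = 0
max_depth = 12
num_leaves = 255
sparse_threshold = 1.0   # treat (nearly) all features as dense so they can be processed on the GPU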

@Noxoomo

Noxoomo commented Oct 3, 2017

I had the same problem. Changing the num_threads option from cpu_cores (in my case 16) and reverting to revision 6cc1dd9 solved the problem. BTW, it is a bug and should be fixed.

UPD:
I have a 2-socket server with 2x8-thread Intel CPUs. 32 threads cause a significant slowdown compared to 16 threads for GPU-based learning.
LightGBM from the current trunk runs slower than revision 6cc1dd9.
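
A minimal sketch of the workaround described above (16 matches this machine; the right value elsewhere is an assumption on the reader's part, not maintainer guidance):

device = gpu
num_threads = 16   # using all 32 hardware threads slowed GPU training here; 16 was faster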

lock bot locked this issue as resolved and limited conversation to collaborators on Mar 17, 2020