Multiple GPU support #26
For your reference, training on the whole dataset takes me about 5 hours per epoch on an Nvidia GTX 980.
Thanks @jamesweb1 for that. I am more or less on the same timeline! Do you think using multiple GPUs would help bring the training time down considerably? Right now, with 50 epochs per training run and 5-6 hours per epoch, it's taking too long (~300 hours), making it difficult to try out and find the best parameters (batchSize, hidden layer size, dataset size) for the task. Do you have any recommendations here?
Yes, it takes a lot of time, so I train on a small subset only (perhaps dataset = 20000). From these experiments I can find good parameters, and then extend to the whole dataset. I'd like to train on multiple GPUs, but I don't have any other resources right now. :(
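For anyone wanting to reproduce that subset experiment: assuming the --dataset flag caps the number of training examples (the full-dataset command later in this thread passes --dataset 0), a 20000-example run would look something like the line below. The flag semantics here are an inference from this thread, not confirmed by the maintainers:

th train.lua --cuda --dataset 20000 --hiddenSize 1000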
Could we not pool our experiments and report results somewhere, so as to avoid duplicating work?
That would be great. Did anybody start collecting statistics/benchmarks already?
I am also looking at adding multiple GPU support. Has anyone made any progress yet?
@svenwoldt that's a great idea! However, we don't have a good metric for measuring the quality of the model yet. #38 adds a validation set; maybe adding a test set would be the best way to do this? Re: multiple GPU support, I'm not sure how this could be done, and I only have 1 GPU at my disposal, so I'd need help on this :)
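One possible direction, offered as an untested sketch rather than a confirmed approach for this codebase: Torch's cunn package ships nn.DataParallelTable, which replicates a module across GPUs and splits each mini-batch along a chosen dimension. Something like the snippet below could wrap the model built in train.lua (the model variable here is hypothetical); whether it cooperates with the recurrent layers used in this project is unverified:

require 'cunn'

-- Replicate the model across GPUs 1-4, splitting every input batch
-- along dimension 1 (the batch dimension). DataParallelTable scatters
-- the batch, runs the replicas in parallel, and gathers the outputs.
local dpt = nn.DataParallelTable(1)
dpt:add(model:cuda(), {1, 2, 3, 4})  -- 'model' stands in for whatever train.lua builds
model = dpt

DataParallelTable has known caveats with rnn-style modules, so treat this as a starting point for experimentation rather than a drop-in fix.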
Did anyone make any progress on this? I'm also looking for a multi-GPU solution.
Hi Friends,
Do we have support for running training on multiple GPUs to save time? I have a machine with 4 GPUs, but it looks like only one of them is being utilized.
Also, when I try to train on the complete dataset using the following command, it takes around 9 hours per epoch. Is this time expected, or am I doing something wrong here?
th train.lua --cuda --dataset 0 --hiddenSize 1000
Thanks
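A quick way to confirm that only one of the four devices is busy (assuming the standard NVIDIA driver tools are installed) is to poll utilization while train.lua runs:

# refresh GPU utilization every second; a single busy GPU confirms
# the job is not spreading work across the other three
watch -n 1 nvidia-smi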