multi gpu parallel computing #16

Open
ysm022 opened this issue Jun 12, 2020 · 4 comments

ysm022 commented Jun 12, 2020

Hello, I can run train.py with a very small dataset: 6 pictures as training input (3 real and 3 fake) and 2 pictures for validation. But when I train with a big dataset (total number of data: 11071 | pos: 5705, neg: 5366 for training; total number of data: 1231 | pos: 634, neg: 597 for validation), I get the following error:


Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 1.158715GB memory on GPU 0, available memory is only 199.500000MB.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.

at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)

I train on an NVIDIA 1080 Ti with about 11 GB of memory, and the error is ResourceExhaustedError.
I have four 1080 Ti cards, so how can I do multi-GPU parallel computing?
Thank you!

ZGSLZL (Contributor) commented Jun 12, 2020

Hi @ysm022, first you can reduce your batch size and train on a single GPU. Second, if you want to use multi-GPU training, you need to modify the following configuration (see the sketch after this list):

  1. Set multi_gpus=True in Runner().
  2. Modify the code in train.py so each trainer process is bound to its own device:
    place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
    with fluid.dygraph.guard(place):
  3. Launch training with python -m paddle.distributed.launch train.py.
    Please refer to the Paddle dygraph documentation for more details.
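
Putting items 1-3 together, a minimal sketch of a dygraph multi-GPU training loop might look like the code below (MyModel, train_reader and the optimizer settings are placeholders for illustration, not the actual code of this repo):

    import paddle.fluid as fluid

    # Bind each launched trainer process to its own GPU.
    place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
    with fluid.dygraph.guard(place):
        # Initialize the NCCL context shared by all trainer processes.
        strategy = fluid.dygraph.parallel.prepare_context()
        model = MyModel()  # placeholder for the actual network
        model = fluid.dygraph.parallel.DataParallel(model, strategy)
        optimizer = fluid.optimizer.AdamOptimizer(
            learning_rate=1e-4, parameter_list=model.parameters())

        for img, label in train_reader():  # placeholder reader yielding numpy batches
            img = fluid.dygraph.to_variable(img)
            label = fluid.dygraph.to_variable(label)
            loss = model(img, label)
            loss = model.scale_loss(loss)   # scale the loss for multi-GPU training
            loss.backward()
            model.apply_collective_grads()  # all-reduce gradients across GPUs
            optimizer.minimize(loss)
            model.clear_gradients()

The script is then started across the visible GPUs with python -m paddle.distributed.launch train.py as in step 3.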

@CoinCheung

Hi,
In the reduce operation, are the weight gradients computed on each GPU averaged or summed?
If I use the multi-GPU training mode, do I need to scale up the learning rate according to the linear rule?
By the way, could you tell us the positive/negative ratio of the original dataset, and will it be OK if our own dataset is significantly imbalanced (with positive:negative = 1:5)?
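
For reference, by the linear rule I mean something like the sketch below (the base values are made-up placeholders, not numbers from this repo):

    # Illustration of the linear scaling rule with placeholder values.
    base_lr = 1e-4           # learning rate tuned for a single-GPU run
    per_gpu_batch_size = 16  # batch size used on that single GPU
    num_gpus = 4

    # The effective (global) batch size grows by num_gpus, so the linear rule
    # scales the learning rate by the same factor.
    global_batch_size = num_gpus * per_gpu_batch_size
    scaled_lr = base_lr * global_batch_size / per_gpu_batch_size
    print(scaled_lr)  # 0.0004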

@silvercherry

Hi, have you solved this problem?
I also want to use multi-GPU training. I set multi_gpus=True and launched with python -m paddle.distributed.launch train.py, but it does not work, and I get:

Error Message Summary:

Error: Place CUDAPlace(0) is not supported, Please check that your paddle compiles with WITH_GPU option or check that your train process hold the correct gpu_id if you use Executor at (/paddle/paddle/fluid/platform/device_context.cc:67)

W0622 13:17:57.689676 13321 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.0, Runtime API Version: 9.0
W0622 13:17:57.693465 13321 device_context.cc:245] device: 0, cuDNN Version: 7.6.
2020-06-22 13:17:58,544-INFO: Loading pretrained model from ./pretrained/resnet18-torch
2020-06-22 13:17:58,728-ERROR: ABORT!!! Out of all 4 trainers, the trainer process with rank=[1, 2, 3] was aborted. Please check its log.
ERROR 2020-06-22 13:17:58,728 launch.py:284] ABORT!!! Out of all 4 trainers, the trainer process with rank=[1, 2, 3] was aborted. Please check its log.
W0622 13:17:58.734313 13321 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
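
To narrow this down, one check I can think of (a minimal sketch assuming a standard Paddle 1.x install, not code from this repo) is whether the build actually has CUDA support and which device id each launched trainer gets:

    # Diagnostic sketch: verify the Paddle build and the per-trainer device id.
    import paddle.fluid as fluid

    print(fluid.is_compiled_with_cuda())        # should print True for a GPU build
    print(fluid.core.get_cuda_device_count())   # number of visible CUDA devices
    print(fluid.dygraph.parallel.Env().dev_id)  # device id assigned to this trainer

Running this under python -m paddle.distributed.launch would show whether every trainer process sees a valid GPU, which is what the "Place CUDAPlace(0) is not supported" message complains about.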

@Lucien7786

I went to the Paddle GitHub looking for a multi-GPU sample. I tested the MNIST code linked here:
PaddlePaddle/Paddle#18205 (comment)
with export CUDA_VISIBLE_DEVICES=0,1,2,3; python test.py
and it works with 4 GPUs. Here is the log:

W0623 22:59:04.695741 9690 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 9.0
W0623 22:59:04.766938 9690 device_context.cc:245] device: 0, cuDNN Version: 7.6.
I0623 22:59:07.364761 9690 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 4 cards are used, so 4 programs are executed in parallel.
W0623 22:59:16.499653 9690 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 8. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 4.
I0623 22:59:16.500174 9690 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0623 22:59:16.506099 9690 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0623 22:59:16.509552 9690 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
######Pass 0, Epoch 0, Cost [array([4.4588094, 5.480756 , 5.1428175, 5.9708376], dtype=float32), array([0.0625, 0.0625, 0.0625, 0. ], dtype=float32)]
######Pass 100, Epoch 0, Cost [array([0.02101828, 0.1083214 , 0.23341596, 0.61708015], dtype=float32), array([1. , 1. , 0.9375, 0.8125], dtype=float32)]
######Pass 200, Epoch 0, Cost [array([0.35945463, 0.30678317, 0.19145483, 0.24440879], dtype=float32), array([0.875 , 0.9375, 0.875 , 0.9375], dtype=float32)]
######Pass 300, Epoch 0, Cost [array([0.01953058, 0.14692771, 0.09672394, 0.47347093], dtype=float32), array([1. , 0.9375, 0.9375, 0.9375], dtype=float32)]
######Pass 400, Epoch 0, Cost [array([0.00153226, 0.05326104, 0.04533184, 0.03504385], dtype=float32), array([1., 1., 1., 1.], dtype=float32)]
######Pass 500, Epoch 0, Cost [array([0.28838432, 0.21555214, 0.03272529, 0.32204518], dtype=float32), array([0.9375, 0.9375, 1. , 0.875 ], dtype=float32)]
######Pass 600, Epoch 0, Cost [array([0.12778899, 0.01811488, 0.01242642, 0.16692397], dtype=float32), array([0.9375, 1. , 1. , 0.9375], dtype=float32)]
######Pass 700, Epoch 0, Cost [array([0.10428553, 0.05949949, 0.02604522, 0.00989265], dtype=float32), array([0.9375, 1. , 1. , 1. ], dtype=float32)]
######Pass 800, Epoch 0, Cost [array([0.4574466 , 0.0150936 , 0.00482975, 0.04338158], dtype=float32), array([0.875, 1. , 1. , 1. ], dtype=float32)]
######Pass 900, Epoch 0, Cost [array([0.01204062, 0.01237518, 0.02565481, 0.47837988], dtype=float32), array([1. , 1. , 1. , 0.9375], dtype=float32)]
Test with Epoch 0, avg_cost: 0.08308441504291267, acc: 0.9728304140127388

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2407      C   ./darknet                                   3203MiB |
|    0      9690      C   ...iniconda3/envs/paddle-gpu/bin/python3.7   365MiB |
|    1      2407      C   ./darknet                                   3203MiB |
|    1      9690      C   ...iniconda3/envs/paddle-gpu/bin/python3.7   359MiB |
|    2      2407      C   ./darknet                                   3203MiB |
|    2      9690      C   ...iniconda3/envs/paddle-gpu/bin/python3.7   359MiB |
|    3      2407      C   ./darknet                                   3203MiB |
|    3      9690      C   ...iniconda3/envs/paddle-gpu/bin/python3.7    33MiB |
+-----------------------------------------------------------------------------+

So I guess something is set up wrong in the multi-GPU mode of this repo; could you please check and give us some tips?
