I'm trying to train my dataset on a single GPU (GTX 1060, 6 GB), and it always runs out of memory at the third epoch. If you have any suggestions on how to fix it, I'd be very grateful.
2018-06-21 08:53:49.853249: I tensorflow/core/common_runtime/bfc_allocator.cc:686] Stats:
Limit: 5856854016
InUse: 5832717824
MaxInUse: 5845060608
NumAllocs: 2163
MaxAllocSize: 1121255424
2018-06-21 08:53:49.853344: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2018-06-21 08:53:49.853378: W tensorflow/core/framework/op_kernel.cc:1198] Resource exhausted: OOM when allocating tensor with shape[2,50,50,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_call
return fn(*args)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
status, run_metadata)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,100,100,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv3/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv2/Relu, resnet_v1_101/block2/unit_4/bottleneck_v1/conv3/weights/read/_1533)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 265, in
train(args)
File "train.py", line 213, in train
sess_ret = sess.run(sess2run, feed_dict=feed_dict)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
feed_dict_tensor, options, run_metadata)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
options, run_metadata)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,100,100,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv3/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv2/Relu, resnet_v1_101/block2/unit_4/bottleneck_v1/conv3/weights/read/_1533)]]
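Following the hint in the log above, here is a minimal sketch (TensorFlow 1.x, with a standalone stand-in fetch rather than the real sess2run from train.py) of how report_tensor_allocations_upon_oom is passed through RunOptions, combined with the usual single-GPU workarounds of allowing memory growth or capping the per-process memory fraction; reducing the per-tower batch size is the other common fix:

import tensorflow as tf

# Let the allocator grow on demand instead of grabbing the whole 6 GB up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = 0.9  # alternative: hard cap

# Per the hint above: dump the live tensor allocations if an OOM occurs.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

x = tf.random_normal([2, 100, 100, 512])  # stand-in for the real training fetches
with tf.Session(config=config) as sess:
    result = sess.run(x, options=run_options)
    # In train.py this would become:
    # sess_ret = sess.run(sess2run, feed_dict=feed_dict, options=run_options)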
How are you getting tower_5 when you're only running with one GPU?
Can you post your train.py command? Or are you making changes in your training file?
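For reference, the tower_5 prefix in the failing node name means the graph was built with at least six replica towers, so if the training configuration still requests multiple GPUs, every tower gets placed on the single GTX 1060 and the activations eventually exhaust the 6 GB. A minimal sanity-check sketch, assuming nothing about this repo's flags (the GPU-count argument in train.py is repo-specific, so the flag name mentioned in the comments is hypothetical):

import os

# Expose only the one physical GPU to TensorFlow; must be set before importing it.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from tensorflow.python.client import device_lib

# Sanity check: how many GPUs does TensorFlow actually see?
gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
print(gpus)  # a single GTX 1060 should show exactly one GPU entry

# If train.py builds one tower per configured GPU, the GPU count it is given
# should match len(gpus) == 1; check the script for the actual argument name
# (e.g. a --num_gpus flag or a config entry) rather than assuming one.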