I am getting a Resource exhausted: OOM error while training.
Part of the error output is below.
2021-03-09 22:25:32.317605: W tensorflow/core/common_runtime/bfc_allocator.cc:424] **************************************____________________________________________
2021-03-09 22:25:32.317641: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:500 : Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model_0/vgg_16/conv1/conv1_2/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[model_0/Mean/_117]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model_0/vgg_16/conv1/conv1_2/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main/train.py", line 117, in <module>
tf.app.run()
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "main/train.py", line 96, in main
input_im_info: data[2]})
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model_0/vgg_16/conv1/conv1_2/Conv2D (defined at /home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[model_0/Mean/_117]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model_0/vgg_16/conv1/conv1_2/Conv2D (defined at /home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Original stack trace for 'model_0/vgg_16/conv1/conv1_2/Conv2D':
File "main/train.py", line 117, in <module>
tf.app.run()
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "main/train.py", line 48, in main
bbox_pred, cls_pred, cls_prob = model.model(input_image)
File "/apps/holmes-share/Ftpdata/ctpn/ctpn/text-detection-ctpn-banjin-dev/nets/model_train.py", line 68, in model
conv5_3 = vgg.vgg_16(image)
File "/apps/holmes-share/Ftpdata/ctpn/ctpn/text-detection-ctpn-banjin-dev/nets/vgg.py", line 18, in vgg_16
net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py", line 2619, in repeat
outputs = layer(outputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py", line 1159, in convolution2d
conv_dims=2)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py", line 1057, in convolution
outputs = layer.apply(inputs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1700, in apply
return self.__call__(inputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
return converted_call(f, options, args, kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
return _call_unconverted(f, args, kwargs, options)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
return f(*args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/keras/layers/convolutional.py", line 197, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 1134, in __call__
return self.conv_op(inp, filter)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 639, in __call__
return self.call(inp, filter)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 238, in __call__
name=self.name)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 2010, in conv2d
name=name)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1071, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
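
For reference, a rough back-of-the-envelope check of the allocation that fails. The shape `[1,64,5333,4000]` and `type float` come from the log above. Everything else is an assumption: I am reading the shape as batch, channels, height, width (so the batch size would already be 1 and the input image roughly 5333x4000), and `scaled_size` and the 1200-pixel cap are hypothetical, chosen only for illustration and not taken from the repository. Under those assumptions this single conv1_2 activation needs 5,460,992,000 bytes, so image size rather than batch size looks like the likely cause.

```python
from functools import reduce
from operator import mul

# Shape copied from the OOM message; "type float" is float32, 4 bytes each.
OOM_SHAPE = (1, 64, 5333, 4000)
FLOAT32_BYTES = 4


def tensor_bytes(shape, bytes_per_element=FLOAT32_BYTES):
    """Return the memory needed for one dense tensor of the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element


def scaled_size(height, width, max_side):
    """Hypothetical helper: shrink (height, width) so the longer side is at
    most max_side, keeping the aspect ratio. Never upscales."""
    scale = min(1.0, max_side / float(max(height, width)))
    return max(1, int(round(height * scale))), max(1, int(round(width * scale)))


if __name__ == "__main__":
    full = tensor_bytes(OOM_SHAPE)
    print("full-size activation: %d bytes (%.2f GiB)" % (full, full / 2.0 ** 30))

    # 1200 is an illustrative cap, not a value taken from the repository.
    h, w = scaled_size(5333, 4000, max_side=1200)
    small = tensor_bytes((1, 64, h, w))
    print("resized to %dx%d: %d bytes (%.2f MiB)" % (h, w, small, small / 2.0 ** 20))
```

This only counts one activation tensor. Training also keeps the other layers' activations and gradients in memory, so the real footprint would be larger than either figure.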
GadaaDhaariGeek changed the title from "what is the batch size we are using by default ?" to "Resource Exhausted: OOM issue" on Mar 10, 2021