Resource Exhausted: OOM issue #477

GadaaDhaariGeek · 2021-03-09T13:25:02Z

I am getting a "Resource exhausted: OOM" error while training.
The relevant part of the error output is below.

2021-03-09 22:25:32.317605: W tensorflow/core/common_runtime/bfc_allocator.cc:424] **************************************____________________________________________
2021-03-09 22:25:32.317641: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:500 : Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model_0/vgg_16/conv1/conv1_2/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[model_0/Mean/_117]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model_0/vgg_16/conv1/conv1_2/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main/train.py", line 117, in <module>
tf.app.run()
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "main/train.py", line 96, in main
input_im_info: data[2]})
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model_0/vgg_16/conv1/conv1_2/Conv2D (defined at /home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[model_0/Mean/_117]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model_0/vgg_16/conv1/conv1_2/Conv2D (defined at /home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Original stack trace for 'model_0/vgg_16/conv1/conv1_2/Conv2D':
File "main/train.py", line 117, in <module>
tf.app.run()
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "main/train.py", line 48, in main
bbox_pred, cls_pred, cls_prob = model.model(input_image)
File "/apps/holmes-share/Ftpdata/ctpn/ctpn/text-detection-ctpn-banjin-dev/nets/model_train.py", line 68, in model
conv5_3 = vgg.vgg_16(image)
File "/apps/holmes-share/Ftpdata/ctpn/ctpn/text-detection-ctpn-banjin-dev/nets/vgg.py", line 18, in vgg_16
net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py", line 2619, in repeat
outputs = layer(outputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py", line 1159, in convolution2d
conv_dims=2)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py", line 1057, in convolution
outputs = layer.apply(inputs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1700, in apply
return self.__call__(inputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
return converted_call(f, options, args, kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
return _call_unconverted(f, args, kwargs, options)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
return f(*args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/keras/layers/convolutional.py", line 197, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 1134, in __call__
return self.conv_op(inp, filter)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 639, in __call__
return self.call(inp, filter)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 238, in __call__
name=self.name)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 2010, in conv2d
name=name)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1071, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
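
For scale, the size of the tensor named in the error can be worked out from its shape. The sketch below is only illustrative arithmetic, not part of the training code. It assumes "type float" means 4-byte float32 and that the logged shape is channels-first, i.e. [batch, channels, height, width]. The helper names tensor_bytes and capped_size, and the 1200-pixel cap, are hypothetical and chosen only to show how much a smaller input would reduce the allocation.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense tensor (float32 = 4 bytes per element)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def capped_size(height, width, max_long_side):
    """Scale (height, width) down so the longer side is at most max_long_side.

    Hypothetical helper: the cap is an illustrative choice, not a value
    taken from the project. Images already within the cap are unchanged.
    """
    long_side = max(height, width)
    if long_side <= max_long_side:
        return height, width
    scale = float(max_long_side) / long_side
    return int(round(height * scale)), int(round(width * scale))


GIB = 2.0 ** 30

# Shape reported in the OOM message: [1, 64, 5333, 4000], type float.
full = tensor_bytes((1, 64, 5333, 4000))
print("conv1_2 output at full size: {:.2f} GiB".format(full / GIB))

# Same layer if the input were first scaled to a 1200-pixel long side.
h, w = capped_size(5333, 4000, max_long_side=1200)
small = tensor_bytes((1, 64, h, w))
print("conv1_2 output at {}x{}: {:.2f} GiB".format(h, w, small / GIB))
```

Under these assumptions the single conv1_2 output is about 5 GiB before any other activations or gradients are counted, so the input image size rather than the batch size (already 1 in the logged shape) looks like the dominant factor.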

GadaaDhaariGeek changed the title ~~what is the batch size we are using by default ?~~ Resource Exhausted: OOM issue Mar 10, 2021