Resource Exhausted: OOM issue #477

Open
GadaaDhaariGeek opened this issue Mar 9, 2021 · 0 comments

GadaaDhaariGeek commented Mar 9, 2021

I am getting a "Resource exhausted: OOM" error while training.
Below is part of the error output.

2021-03-09 22:25:32.317605: W tensorflow/core/common_runtime/bfc_allocator.cc:424] **************************************____________________________________________
2021-03-09 22:25:32.317641: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:500 : Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model_0/vgg_16/conv1/conv1_2/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[model_0/Mean/_117]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model_0/vgg_16/conv1/conv1_2/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main/train.py", line 117, in <module>
tf.app.run()
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "main/train.py", line 96, in main
input_im_info: data[2]})
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model_0/vgg_16/conv1/conv1_2/Conv2D (defined at /home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[model_0/Mean/_117]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[1,64,5333,4000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model_0/vgg_16/conv1/conv1_2/Conv2D (defined at /home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.
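
For scale, the size of the single tensor named in the error can be worked out from its shape. This is a minimal sketch, assuming "type float" means 32-bit floats (4 bytes per element); the helper name `tensor_bytes` is hypothetical and not part of TensorFlow.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Return the memory needed for one dense tensor of the given shape."""
    # Multiply all dimensions together, then scale by the element size.
    return reduce(mul, shape, 1) * bytes_per_element


# Shape reported in the OOM message for model_0/vgg_16/conv1/conv1_2/Conv2D.
shape = (1, 64, 5333, 4000)
size = tensor_bytes(shape)

print(size)                      # bytes
print(round(size / 2**30, 2))    # GiB
```

Under that assumption the one activation needs 5,460,992,000 bytes, about 5.09 GiB, before counting any other layer or the gradients. The leading dimension is 1, which suggests the batch already holds a single image, so the spatial size (5333 x 4000) is what drives the allocation.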

Original stack trace for 'model_0/vgg_16/conv1/conv1_2/Conv2D':
File "main/train.py", line 117, in <module>
tf.app.run()
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "main/train.py", line 48, in main
bbox_pred, cls_pred, cls_prob = model.model(input_image)
File "/apps/holmes-share/Ftpdata/ctpn/ctpn/text-detection-ctpn-banjin-dev/nets/model_train.py", line 68, in model
conv5_3 = vgg.vgg_16(image)
File "/apps/holmes-share/Ftpdata/ctpn/ctpn/text-detection-ctpn-banjin-dev/nets/vgg.py", line 18, in vgg_16
net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py", line 2619, in repeat
outputs = layer(outputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py", line 1159, in convolution2d
conv_dims=2)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py", line 1057, in convolution
outputs = layer.apply(inputs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1700, in apply
return self.__call__(inputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
return converted_call(f, options, args, kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
return _call_unconverted(f, args, kwargs, options)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
return f(*args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/keras/layers/convolutional.py", line 197, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 1134, in __call__
return self.conv_op(inp, filter)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 639, in __call__
return self.call(inp, filter)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 238, in __call__
name=self.name)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/nn_ops.py", line 2010, in conv2d
name=name)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1071, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/a1034615/venvs/py35/lib/python3.5/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
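
Since the batch dimension in the failing tensor is already 1, one possible mitigation is to downscale very large images before they are fed to the network. The following is only a sketch under stated assumptions: the 1200-pixel cap on the longer side and the helper name `capped_size` are illustrative and are not taken from this repository's training code.

```python
def capped_size(height, width, max_long_side=1200):
    """Return (new_height, new_width) scaled so the longer side fits the cap.

    The aspect ratio is preserved, and images already within the cap are
    returned unchanged. max_long_side is an assumed, tunable limit.
    """
    long_side = max(height, width)
    if long_side <= max_long_side:
        return height, width
    scale = max_long_side / long_side
    # Round to whole pixels and never go below 1 pixel per side.
    return max(1, round(height * scale)), max(1, round(width * scale))


# Spatial size taken from the shape in the OOM message.
new_h, new_w = capped_size(5333, 4000)
print(new_h, new_w)
```

With these assumed numbers the image becomes 1200 x 900, and the same 64-channel float32 activation drops to 64 * 1200 * 900 * 4 = 276,480,000 bytes. If images are resized this way, the ground-truth box coordinates would need to be multiplied by the same scale factor.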

GadaaDhaariGeek changed the title from "what is the batch size we are using by default ?" to "Resource Exhausted: OOM issue" on Mar 10, 2021