CUDA_ERROR_OUT_OF_MEMORY #8

ahmedshingaly · 2020-05-22T01:56:45Z

Thank you a lot for this repository and tutorial.

I am facing a CUDA out-of-memory error.

My log is:
`dnnlib: Running training.training_loop.training_loop() on localhost...
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Streaming data using training.dataset.TFRecordDataset...
Dataset shape = [3, 1024, 1024]
Dynamic range = [0, 255]
Label size = 0
Loading networks from "results\00001-pretrained\network-snapshot-10000.pkl"...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.

G Params OutputShape WeightShape

latents_in - (?, 512) -
labels_in - (?, 0) -
lod - () -
dlatent_avg - (512,) -
G_mapping/latents_in - (?, 512) -
G_mapping/labels_in - (?, 0) -
G_mapping/Normalize - (?, 512) -
G_mapping/Dense0 262656 (?, 512) (512, 512)
G_mapping/Dense1 262656 (?, 512) (512, 512)
G_mapping/Dense2 262656 (?, 512) (512, 512)
G_mapping/Dense3 262656 (?, 512) (512, 512)
G_mapping/Dense4 262656 (?, 512) (512, 512)
G_mapping/Dense5 262656 (?, 512) (512, 512)
G_mapping/Dense6 262656 (?, 512) (512, 512)
G_mapping/Dense7 262656 (?, 512) (512, 512)
G_mapping/Broadcast - (?, 18, 512) -
G_mapping/dlatents_out - (?, 18, 512) -
Truncation/Lerp - (?, 18, 512) -
G_synthesis/dlatents_in - (?, 18, 512) -
G_synthesis/4x4/Const 8192 (?, 512, 4, 4) (1, 512, 4, 4)
G_synthesis/4x4/Conv 2622465 (?, 512, 4, 4) (3, 3, 512, 512)
G_synthesis/4x4/ToRGB 264195 (?, 3, 4, 4) (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Conv1 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Upsample - (?, 3, 8, 8) -
G_synthesis/8x8/ToRGB 264195 (?, 3, 8, 8) (1, 1, 512, 3)
G_synthesis/16x16/Conv0_up 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Conv1 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Upsample - (?, 3, 16, 16) -
G_synthesis/16x16/ToRGB 264195 (?, 3, 16, 16) (1, 1, 512, 3)
G_synthesis/32x32/Conv0_up 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Conv1 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Upsample - (?, 3, 32, 32) -
G_synthesis/32x32/ToRGB 264195 (?, 3, 32, 32) (1, 1, 512, 3)
G_synthesis/64x64/Conv0_up 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Conv1 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Upsample - (?, 3, 64, 64) -
G_synthesis/64x64/ToRGB 264195 (?, 3, 64, 64) (1, 1, 512, 3)
G_synthesis/128x128/Conv0_up 1442561 (?, 256, 128, 128) (3, 3, 512, 256)
G_synthesis/128x128/Conv1 721409 (?, 256, 128, 128) (3, 3, 256, 256)
G_synthesis/128x128/Upsample - (?, 3, 128, 128) -
G_synthesis/128x128/ToRGB 132099 (?, 3, 128, 128) (1, 1, 256, 3)
G_synthesis/256x256/Conv0_up 426369 (?, 128, 256, 256) (3, 3, 256, 128)
G_synthesis/256x256/Conv1 213249 (?, 128, 256, 256) (3, 3, 128, 128)
G_synthesis/256x256/Upsample - (?, 3, 256, 256) -
G_synthesis/256x256/ToRGB 66051 (?, 3, 256, 256) (1, 1, 128, 3)
G_synthesis/512x512/Conv0_up 139457 (?, 64, 512, 512) (3, 3, 128, 64)
G_synthesis/512x512/Conv1 69761 (?, 64, 512, 512) (3, 3, 64, 64)
G_synthesis/512x512/Upsample - (?, 3, 512, 512) -
G_synthesis/512x512/ToRGB 33027 (?, 3, 512, 512) (1, 1, 64, 3)
G_synthesis/1024x1024/Conv0_up 51297 (?, 32, 1024, 1024) (3, 3, 64, 32)
G_synthesis/1024x1024/Conv1 25665 (?, 32, 1024, 1024) (3, 3, 32, 32)
G_synthesis/1024x1024/Upsample - (?, 3, 1024, 1024) -
G_synthesis/1024x1024/ToRGB 16515 (?, 3, 1024, 1024) (1, 1, 32, 3)
G_synthesis/images_out - (?, 3, 1024, 1024) -
G_synthesis/noise0 - (1, 1, 4, 4) -
G_synthesis/noise1 - (1, 1, 8, 8) -
G_synthesis/noise2 - (1, 1, 8, 8) -
G_synthesis/noise3 - (1, 1, 16, 16) -
G_synthesis/noise4 - (1, 1, 16, 16) -
G_synthesis/noise5 - (1, 1, 32, 32) -
G_synthesis/noise6 - (1, 1, 32, 32) -
G_synthesis/noise7 - (1, 1, 64, 64) -
G_synthesis/noise8 - (1, 1, 64, 64) -
G_synthesis/noise9 - (1, 1, 128, 128) -
G_synthesis/noise10 - (1, 1, 128, 128) -
G_synthesis/noise11 - (1, 1, 256, 256) -
G_synthesis/noise12 - (1, 1, 256, 256) -
G_synthesis/noise13 - (1, 1, 512, 512) -
G_synthesis/noise14 - (1, 1, 512, 512) -
G_synthesis/noise15 - (1, 1, 1024, 1024) -
G_synthesis/noise16 - (1, 1, 1024, 1024) -
images_out - (?, 3, 1024, 1024) -

Total 30370060

D Params OutputShape WeightShape

images_in - (?, 3, 1024, 1024) -
labels_in - (?, 0) -
1024x1024/FromRGB 128 (?, 32, 1024, 1024) (1, 1, 3, 32)
1024x1024/Conv0 9248 (?, 32, 1024, 1024) (3, 3, 32, 32)
1024x1024/Conv1_down 18496 (?, 64, 512, 512) (3, 3, 32, 64)
1024x1024/Skip 2048 (?, 64, 512, 512) (1, 1, 32, 64)
512x512/Conv0 36928 (?, 64, 512, 512) (3, 3, 64, 64)
512x512/Conv1_down 73856 (?, 128, 256, 256) (3, 3, 64, 128)
512x512/Skip 8192 (?, 128, 256, 256) (1, 1, 64, 128)
256x256/Conv0 147584 (?, 128, 256, 256) (3, 3, 128, 128)
256x256/Conv1_down 295168 (?, 256, 128, 128) (3, 3, 128, 256)
256x256/Skip 32768 (?, 256, 128, 128) (1, 1, 128, 256)
128x128/Conv0 590080 (?, 256, 128, 128) (3, 3, 256, 256)
128x128/Conv1_down 1180160 (?, 512, 64, 64) (3, 3, 256, 512)
128x128/Skip 131072 (?, 512, 64, 64) (1, 1, 256, 512)
64x64/Conv0 2359808 (?, 512, 64, 64) (3, 3, 512, 512)
64x64/Conv1_down 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
64x64/Skip 262144 (?, 512, 32, 32) (1, 1, 512, 512)
32x32/Conv0 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
32x32/Conv1_down 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
32x32/Skip 262144 (?, 512, 16, 16) (1, 1, 512, 512)
16x16/Conv0 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
16x16/Conv1_down 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
16x16/Skip 262144 (?, 512, 8, 8) (1, 1, 512, 512)
8x8/Conv0 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
8x8/Conv1_down 2359808 (?, 512, 4, 4) (3, 3, 512, 512)
8x8/Skip 262144 (?, 512, 4, 4) (1, 1, 512, 512)
4x4/MinibatchStddev - (?, 513, 4, 4) -
4x4/Conv 2364416 (?, 512, 4, 4) (3, 3, 513, 512)
4x4/Dense0 4194816 (?, 512) (8192, 512)
Output 513 (?, 1) (512, 1)
scores_out - (?, 1) -

Total 29012513

Building TensorFlow graph...
Initializing logs...
Training for 25000 kimg...

Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
return fn(*args)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,3,3,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run_training.py", line 201, in
main()
File "run_training.py", line 196, in main
run(**vars(args))
File "run_training.py", line 127, in run
dnnlib.submit_run(**kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\internal\local.py", line 22, in submit
return run_wrapper(submit_config)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\training_loop.py", line 302, in training_loop
tflib.run(G_train_op, feed_dict)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\tfutil.py", line 31, in run
return tf.get_default_session().run(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
run_metadata_ptr)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
run_metadata)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,3,3,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square (defined at :104) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Errors may have originated from an input operation.
Input Source operations connected to node GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square:
GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/mul_3 (defined at :100)

Original stack trace for 'GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square':
File "run_training.py", line 201, in
main()
File "run_training.py", line 196, in main
run(**vars(args))
File "run_training.py", line 127, in run
dnnlib.submit_run(**kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\internal\local.py", line 22, in submit
return run_wrapper(submit_config)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\training_loop.py", line 223, in training_loop
G_loss, G_reg = dnnlib.util.call_func_by_name(G=G_gpu, D=D_gpu, opt=G_opt, training_set=training_set, minibatch_size=minibatch_gpu_in, **G_loss_args)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\util.py", line 256, in call_func_by_name
return func_obj(*args, **kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\loss.py", line 152, in G_logistic_ns_pathreg
fake_images_out, fake_dlatents_out = G.get_output_for(latents, labels, is_training=True, return_dlatents=True)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\network.py", line 221, in get_output_for
out_expr = self._build_func(*final_inputs, **build_kwargs)
File "", line 238, in G_main
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\network.py", line 221, in get_output_for
out_expr = self._build_func(*final_inputs, **build_kwargs)
File "", line 498, in G_synthesis_stylegan2
File "", line 468, in block
File "", line 455, in layer
File "", line 104, in modulated_conv2d_layer
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 10698, in square
"Square", x=x, name=name)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
op_def=op_def)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()

`
My TFRecord size is:

Any idea how to solve this problem?
Thank you in advance.

ahmedshingaly · 2020-05-22T02:02:16Z

Here is the GPU information for my second computer. I had no luck running your repository on either of them. You replied to my YouTube comment saying that I should have a GPU with 16GB of memory. Are the specifications below not enough?

dvschultz · 2020-05-27T19:59:26Z

11GB is probably too small depending on what you’re training, especially if you’re running additional processes on it.
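
For context on the log above, the allocation that failed can be sized with simple arithmetic. This is a rough sketch, assuming 4 bytes per element for the `float` type named in the error message; the helper name `tensor_bytes` is hypothetical and not part of the repository.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor with the given shape.

    bytes_per_element=4 is an assumption matching 32-bit floats.
    """
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape copied from the ResourceExhaustedError message in the log.
failed_shape = (2, 3, 3, 512, 512)
size = tensor_bytes(failed_shape)
print(f"{size} bytes = {size / 2**20:.0f} MiB")
```

Under that assumption the failing request is only about 18 MiB, which suggests the GPU memory was already almost fully in use when this op ran, rather than this single tensor being too large on its own.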

ahmedshingaly · 2020-05-29T07:29:13Z

I see, thank you very much @dvschultz. I have another question.

How can I create a StyleGAN model with a latent shape of (1, 18, 512)?

My StyleGAN model produces a shape of (1, 12, 512), and I cannot find the latent space with the encoder developed by Puzer because of the shape difference.

In more detail:
My model produces a shape of (1, 12, 512) using https://github.com/NVlabs/stylegan,
but when I use the StyleGAN encoder (https://github.com/Puzer/stylegan-encoder) to find the latent space, it requires (1, 18, 512). Do you have any idea how I can produce a model with shape (1, 18, 512) instead of (1, 12, 512)?

dvschultz · 2020-06-09T20:40:56Z

What size output is your model? As I recall, only 1024 does (1,18,512). Smaller resolutions will generate smaller shapes. Many of the encoders are only set up to work with FFHQ and its 1024^2 resolution.
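
The relationship between output resolution and latent shape can be sketched as follows. This assumes the usual StyleGAN layout of two style inputs per resolution block, starting at 4x4, which matches the (?, 18, 512) shape shown for the 1024x1024 generator in the log above. The function name `num_style_layers` is hypothetical.

```python
import math


def num_style_layers(resolution):
    """Number of per-layer latents for a square, power-of-two resolution.

    Assumes two style inputs per resolution block from 4x4 upward,
    i.e. 2 * log2(resolution) - 2.
    """
    res_log2 = int(round(math.log2(resolution)))
    if resolution < 4 or 2 ** res_log2 != resolution:
        raise ValueError("resolution must be a power of two, at least 4")
    return 2 * res_log2 - 2


for res in (128, 256, 512, 1024):
    print(res, (1, num_style_layers(res), 512))
```

Under this assumption a (1, 12, 512) latent would correspond to a 128x128 model, so getting (1, 18, 512) would mean training at 1024x1024 rather than changing a shape setting.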
