-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUDA_ERRIR_OUT_OF_MEMORY #8
Comments
11GB is probably too small depending on what you’re training, especially is you’re running additional processes on it. |
I see, thank you very much @dvschultz I have another question how can I create stylegan model with (1, 18, 512) my stylegan model is creating shape (1, 12, 512) and I cannot find the latent space developed by Puzer because of shape difference in more details: |
what size output is your model? As I recall only 1024 does (1,18,512). Smaller resolutions will generate smaller shapes. Many of the encoders are only set up to with with FFHQ and its 1024^2 resolutions. |
thank you alot for this repository and tutorial
I am facing CUDA_OUT_OF_MEMORY
my log is
`dnnlib: Running training.training_loop.training_loop() on localhost...
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Streaming data using training.dataset.TFRecordDataset...
Dataset shape = [3, 1024, 1024]
Dynamic range = [0, 255]
Label size = 0
Loading networks from "results\00001-pretrained\network-snapshot-10000.pkl"...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.
G Params OutputShape WeightShape
latents_in - (?, 512) -
labels_in - (?, 0) -
lod - () -
dlatent_avg - (512,) -
G_mapping/latents_in - (?, 512) -
G_mapping/labels_in - (?, 0) -
G_mapping/Normalize - (?, 512) -
G_mapping/Dense0 262656 (?, 512) (512, 512)
G_mapping/Dense1 262656 (?, 512) (512, 512)
G_mapping/Dense2 262656 (?, 512) (512, 512)
G_mapping/Dense3 262656 (?, 512) (512, 512)
G_mapping/Dense4 262656 (?, 512) (512, 512)
G_mapping/Dense5 262656 (?, 512) (512, 512)
G_mapping/Dense6 262656 (?, 512) (512, 512)
G_mapping/Dense7 262656 (?, 512) (512, 512)
G_mapping/Broadcast - (?, 18, 512) -
G_mapping/dlatents_out - (?, 18, 512) -
Truncation/Lerp - (?, 18, 512) -
G_synthesis/dlatents_in - (?, 18, 512) -
G_synthesis/4x4/Const 8192 (?, 512, 4, 4) (1, 512, 4, 4)
G_synthesis/4x4/Conv 2622465 (?, 512, 4, 4) (3, 3, 512, 512)
G_synthesis/4x4/ToRGB 264195 (?, 3, 4, 4) (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Conv1 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Upsample - (?, 3, 8, 8) -
G_synthesis/8x8/ToRGB 264195 (?, 3, 8, 8) (1, 1, 512, 3)
G_synthesis/16x16/Conv0_up 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Conv1 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Upsample - (?, 3, 16, 16) -
G_synthesis/16x16/ToRGB 264195 (?, 3, 16, 16) (1, 1, 512, 3)
G_synthesis/32x32/Conv0_up 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Conv1 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Upsample - (?, 3, 32, 32) -
G_synthesis/32x32/ToRGB 264195 (?, 3, 32, 32) (1, 1, 512, 3)
G_synthesis/64x64/Conv0_up 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Conv1 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Upsample - (?, 3, 64, 64) -
G_synthesis/64x64/ToRGB 264195 (?, 3, 64, 64) (1, 1, 512, 3)
G_synthesis/128x128/Conv0_up 1442561 (?, 256, 128, 128) (3, 3, 512, 256)
G_synthesis/128x128/Conv1 721409 (?, 256, 128, 128) (3, 3, 256, 256)
G_synthesis/128x128/Upsample - (?, 3, 128, 128) -
G_synthesis/128x128/ToRGB 132099 (?, 3, 128, 128) (1, 1, 256, 3)
G_synthesis/256x256/Conv0_up 426369 (?, 128, 256, 256) (3, 3, 256, 128)
G_synthesis/256x256/Conv1 213249 (?, 128, 256, 256) (3, 3, 128, 128)
G_synthesis/256x256/Upsample - (?, 3, 256, 256) -
G_synthesis/256x256/ToRGB 66051 (?, 3, 256, 256) (1, 1, 128, 3)
G_synthesis/512x512/Conv0_up 139457 (?, 64, 512, 512) (3, 3, 128, 64)
G_synthesis/512x512/Conv1 69761 (?, 64, 512, 512) (3, 3, 64, 64)
G_synthesis/512x512/Upsample - (?, 3, 512, 512) -
G_synthesis/512x512/ToRGB 33027 (?, 3, 512, 512) (1, 1, 64, 3)
G_synthesis/1024x1024/Conv0_up 51297 (?, 32, 1024, 1024) (3, 3, 64, 32)
G_synthesis/1024x1024/Conv1 25665 (?, 32, 1024, 1024) (3, 3, 32, 32)
G_synthesis/1024x1024/Upsample - (?, 3, 1024, 1024) -
G_synthesis/1024x1024/ToRGB 16515 (?, 3, 1024, 1024) (1, 1, 32, 3)
G_synthesis/images_out - (?, 3, 1024, 1024) -
G_synthesis/noise0 - (1, 1, 4, 4) -
G_synthesis/noise1 - (1, 1, 8, 8) -
G_synthesis/noise2 - (1, 1, 8, 8) -
G_synthesis/noise3 - (1, 1, 16, 16) -
G_synthesis/noise4 - (1, 1, 16, 16) -
G_synthesis/noise5 - (1, 1, 32, 32) -
G_synthesis/noise6 - (1, 1, 32, 32) -
G_synthesis/noise7 - (1, 1, 64, 64) -
G_synthesis/noise8 - (1, 1, 64, 64) -
G_synthesis/noise9 - (1, 1, 128, 128) -
G_synthesis/noise10 - (1, 1, 128, 128) -
G_synthesis/noise11 - (1, 1, 256, 256) -
G_synthesis/noise12 - (1, 1, 256, 256) -
G_synthesis/noise13 - (1, 1, 512, 512) -
G_synthesis/noise14 - (1, 1, 512, 512) -
G_synthesis/noise15 - (1, 1, 1024, 1024) -
G_synthesis/noise16 - (1, 1, 1024, 1024) -
images_out - (?, 3, 1024, 1024) -
Total 30370060
D Params OutputShape WeightShape
images_in - (?, 3, 1024, 1024) -
labels_in - (?, 0) -
1024x1024/FromRGB 128 (?, 32, 1024, 1024) (1, 1, 3, 32)
1024x1024/Conv0 9248 (?, 32, 1024, 1024) (3, 3, 32, 32)
1024x1024/Conv1_down 18496 (?, 64, 512, 512) (3, 3, 32, 64)
1024x1024/Skip 2048 (?, 64, 512, 512) (1, 1, 32, 64)
512x512/Conv0 36928 (?, 64, 512, 512) (3, 3, 64, 64)
512x512/Conv1_down 73856 (?, 128, 256, 256) (3, 3, 64, 128)
512x512/Skip 8192 (?, 128, 256, 256) (1, 1, 64, 128)
256x256/Conv0 147584 (?, 128, 256, 256) (3, 3, 128, 128)
256x256/Conv1_down 295168 (?, 256, 128, 128) (3, 3, 128, 256)
256x256/Skip 32768 (?, 256, 128, 128) (1, 1, 128, 256)
128x128/Conv0 590080 (?, 256, 128, 128) (3, 3, 256, 256)
128x128/Conv1_down 1180160 (?, 512, 64, 64) (3, 3, 256, 512)
128x128/Skip 131072 (?, 512, 64, 64) (1, 1, 256, 512)
64x64/Conv0 2359808 (?, 512, 64, 64) (3, 3, 512, 512)
64x64/Conv1_down 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
64x64/Skip 262144 (?, 512, 32, 32) (1, 1, 512, 512)
32x32/Conv0 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
32x32/Conv1_down 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
32x32/Skip 262144 (?, 512, 16, 16) (1, 1, 512, 512)
16x16/Conv0 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
16x16/Conv1_down 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
16x16/Skip 262144 (?, 512, 8, 8) (1, 1, 512, 512)
8x8/Conv0 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
8x8/Conv1_down 2359808 (?, 512, 4, 4) (3, 3, 512, 512)
8x8/Skip 262144 (?, 512, 4, 4) (1, 1, 512, 512)
4x4/MinibatchStddev - (?, 513, 4, 4) -
4x4/Conv 2364416 (?, 512, 4, 4) (3, 3, 513, 512)
4x4/Dense0 4194816 (?, 512) (8192, 512)
Output 513 (?, 1) (512, 1)
scores_out - (?, 1) -
Total 29012513
Building TensorFlow graph...
Initializing logs...
Training for 25000 kimg...
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
return fn(*args)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,3,3,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run_training.py", line 201, in
main()
File "run_training.py", line 196, in main
run(**vars(args))
File "run_training.py", line 127, in run
dnnlib.submit_run(**kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\internal\local.py", line 22, in submit
return run_wrapper(submit_config)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\training_loop.py", line 302, in training_loop
tflib.run(G_train_op, feed_dict)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\tfutil.py", line 31, in run
return tf.get_default_session().run(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
run_metadata_ptr)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
run_metadata)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,3,3,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square (defined at :104) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Errors may have originated from an input operation.
Input Source operations connected to node GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square:
GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/mul_3 (defined at :100)
Original stack trace for 'GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square':
File "run_training.py", line 201, in
main()
File "run_training.py", line 196, in main
run(**vars(args))
File "run_training.py", line 127, in run
dnnlib.submit_run(**kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\internal\local.py", line 22, in submit
return run_wrapper(submit_config)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\training_loop.py", line 223, in training_loop
G_loss, G_reg = dnnlib.util.call_func_by_name(G=G_gpu, D=D_gpu, opt=G_opt, training_set=training_set, minibatch_size=minibatch_gpu_in, **G_loss_args)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\util.py", line 256, in call_func_by_name
return func_obj(*args, **kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\loss.py", line 152, in G_logistic_ns_pathreg
fake_images_out, fake_dlatents_out = G.get_output_for(latents, labels, is_training=True, return_dlatents=True)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\network.py", line 221, in get_output_for
out_expr = self._build_func(*final_inputs, **build_kwargs)
File "", line 238, in G_main
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\network.py", line 221, in get_output_for
out_expr = self._build_func(*final_inputs, **build_kwargs)
File "", line 498, in G_synthesis_stylegan2
File "", line 468, in block
File "", line 455, in layer
File "", line 104, in modulated_conv2d_layer
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 10698, in square
"Square", x=x, name=name)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
op_def=op_def)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()
`
my tfrecordsize is
any idea how to solve this problem
thank you in advance
The text was updated successfully, but these errors were encountered: