Problems with multi-GPU #118

nick-torenvliet · 2020-10-08T15:57:25Z

The code runs fine on the CPU, so my Docker container is working.
To run on multiple GPUs, I had to pass batch_size at line 120 of capsulenet-multi-gpu.py, because the script raised an error when it was not passed there.

Now when I run the code...

python capsulenet-multi-gpu.py --gpus 4 --batch_size 300

I get warnings such as:
200/200 [==============================] - ETA: 0s - loss: 0.8408 - capsnet_loss: 0.8094 - decoder_loss: 0.0801
WARNING:tensorflow:Model was constructed with shape (300, 28, 28, 1) for input Tensor("input_1:0", shape=(300, 28, 28, 1), dtype=float32), but it was called on an input with incompatible shape (75, 28, 28, 1).
WARNING:tensorflow:Model was constructed with shape (300, 28, 28, 1) for input Tensor("input_1:0", shape=(300, 28, 28, 1), dtype=float32), but it was called on an input with incompatible shape (75, 28, 28, 1).
WARNING:tensorflow:Model was constructed with shape (300, 28, 28, 1) for input Tensor("input_1:0", shape=(300, 28, 28, 1), dtype=float32), but it was called on an input with incompatible shape (75, 28, 28, 1).
WARNING:tensorflow:Model was constructed with shape (300, 28, 28, 1) for input Tensor("input_1:0", shape=(300, 28, 28, 1), dtype=float32), but it was called on an input with incompatible shape (75, 28, 28, 1).
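
For reference, the 75 in these warnings matches the per-GPU share of the batch: 300 samples split evenly across 4 GPUs. A minimal sketch of that arithmetic in plain Python follows; the helper name `per_replica_batch` is hypothetical and is not part of capsulenet-multi-gpu.py.

```python
def per_replica_batch(global_batch_size: int, num_gpus: int) -> int:
    """Hypothetical helper: rows each GPU receives when a batch is split evenly."""
    if global_batch_size % num_gpus != 0:
        raise ValueError("batch size is not divisible by the number of GPUs")
    return global_batch_size // num_gpus

# Values taken from the command line above: --gpus 4 --batch_size 300
print(per_replica_batch(300, 4))  # 75
```

So the model, built with a fixed batch dimension of 300, is being called on 75-row slices, which is what the warnings report.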

And a final error:
Traceback (most recent call last):
File "capsulenet-multi-gpu.py", line 131, in <module>
train(model=multi_model, data=((x_train, y_train), (x_test, y_test)), args=args)
File "capsulenet-multi-gpu.py", line 67, in train
callbacks=[log, tb, lr_decay])
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1479, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
return method(self, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 872, in fit
return_dict=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
return method(self, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1081, in evaluate
tmp_logs = test_function(iterator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
result = self._call(*args, **kwds)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 650, in _call
return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds) # pylint: disable=protected-access
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
self.captured_inputs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598, in call
ctx=ctx)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Expected size[0] in [0, 32], but got 75
[[node model_3/lambda/Slice (defined at capsulenet-multi-gpu.py:67) ]]
[[model_3/model_2/digitcaps/map/while/LoopCond/_75/_132]]
(1) Invalid argument: Expected size[0] in [0, 32], but got 75
[[node model_3/lambda/Slice (defined at capsulenet-multi-gpu.py:67) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_test_function_12079]

Function call stack:
test_function -> test_function

2020-10-08 15:37:53.123350: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
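
One possible reading of the InvalidArgumentError, offered as an assumption rather than a confirmed diagnosis: the traceback passes through evaluate, and the Slice op is asked for 75 rows (300 / 4) from a batch that only has 32 rows. The 32 is assumed here to be the evaluation batch size; that is not confirmed by the log. The sketch below imitates the bounds check described by the error message in plain Python; `check_slice` is a hypothetical helper, not a TensorFlow function.

```python
def check_slice(dim: int, begin: int, size: int) -> tuple:
    """Hypothetical helper that imitates the bounds check named in the error."""
    upper = dim - begin  # largest slice that still fits in this dimension
    if not 0 <= size <= upper:
        raise ValueError(f"Expected size[0] in [0, {upper}], but got {size}")
    return begin, begin + size

# Training batch of 300 rows: a 75-row slice starting at 0 fits.
print(check_slice(300, 0, 75))  # (0, 75)

# Assumed evaluation batch of 32 rows: the same 75-row slice does not fit.
try:
    check_slice(32, 0, 75)
except ValueError as err:
    print(err)  # Expected size[0] in [0, 32], but got 75
```

If that reading holds, feeding the validation data in batches of the same size as the training batches would be one thing to try; this is untested here.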

Is there a quick fix for this?
