
[deeplab] Unable to Evaluate on ADE20k #4089

Closed
pedropgusmao opened this issue Apr 25, 2018 · 8 comments
Labels: stat:awaiting model gardener (Waiting on input from TensorFlow model gardener)

pedropgusmao commented Apr 25, 2018

System information

  • What is the top-level directory of the model you are using: ?
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): The stock example is not explicitly available
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 1.7.0
  • Bazel version (if compiling from source): 0.11
  • CUDA/cuDNN version: 7.1.3
  • GPU model and memory: NVIDIA Titan V 12GB
  • Exact command to reproduce:
python deeplab/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --eval_crop_size=513 \
    --eval_crop_size=513 \
    --dataset="ade20k" \
    --checkpoint_dir='deeplab/datasets/ADE20K/exp/train_on_train_set/train' \
    --eval_logdir='deeplab/datasets/ADE20K/exp/train_on_train_set/eval' \
    --dataset_dir='deeplab/datasets/ADE20K/tfrecord' 

Describe the problem

It seems that the evaluation script is not correctly cropping (or padding) the images before they are fed to the model.

Source code / logs

2018-04-25 17:26:15.635672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:09:00.0 totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-04-25 17:26:15.635730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-25 17:26:16.432618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-25 17:26:16.432667: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-04-25 17:26:16.432678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-04-25 17:26:16.433049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10989 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:09:00.0, compute capability: 7.0)
INFO:tensorflow:Restoring parameters from deeplab/datasets/ADE20K/exp/train_on_train_set/train/model.ckpt-50000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2018-04-25 17:26:19.373963: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at queue_ops.cc:105 : Invalid argument: Shape mismatch in tuple component 1. Expected [513,513,3], got [513,683,3]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Shape mismatch in tuple component 1. Expected [513,513,3], got [513,683,3]
	 [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT64, DT_FLOAT, DT_STRING, DT_INT32, DT_UINT8, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, ParseSingleExample/ParseSingleExample:3, add_2/_4681, ParseSingleExample/ParseSingleExample:1, add_3/_4683, batch/packed, ParseSingleExample/ParseSingleExample:6)]]
INFO:tensorflow:Starting evaluation at 2018-04-25-17:26:20
INFO:tensorflow:Finished evaluation at 2018-04-25-17:26:20
miou_1.0[0]
Traceback (most recent call last):
  File "deeplab/eval.py", line 176, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "deeplab/eval.py", line 169, in main
    eval_interval_secs=FLAGS.eval_interval_secs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/evaluation.py", line 301, in evaluation_loop
    timeout=timeout)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/training/python/training/evaluation.py", line 455, in evaluate_repeatedly
    '%Y-%m-%d-%H:%M:%S', time.gmtime()))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 658, in __exit__
    self._close_internal(exception_type)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 695, in _close_internal
    self._sess.close()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 943, in close
    self._sess.close()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py", line 1087, in close
    ignore_live_threads=True)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
    enqueue_callable()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1249, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 1. Expected [513,513,3], got [513,683,3]
	 [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT64, DT_FLOAT, DT_STRING, DT_INT32, DT_UINT8, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, ParseSingleExample/ParseSingleExample:3, add_2/_4681, ParseSingleExample/ParseSingleExample:1, add_3/_4683, batch/packed, ParseSingleExample/ParseSingleExample:6)]]

k-w-w assigned YknZhu and unassigned k-w-w Apr 26, 2018
k-w-w added the stat:awaiting model gardener (Waiting on input from TensorFlow model gardener) label Apr 26, 2018
RomRoc commented Apr 30, 2018

See here for the solution: #3886

In the ADE20K validation dataset the largest image is 2100 x 2100, so I set eval_crop_size=2113.

Anyway, when I run eval.py in Google Colab I get this error:


tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [150]
	 [[Node: mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch/_4751, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_1/_4753, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_2/_4755)]]
	 [[Node: mean_iou/confusion_matrix/SparseTensorDenseAdd/_4769 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2155_mean_iou/confusion_matrix/SparseTensorDenseAdd", tensor_type=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
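
For what it is worth, the assertion says the mean-IoU computation only accepts class ids strictly below 150, so it is worth checking what range of ids the ground-truth annotations actually contain. A minimal sketch for doing that (assuming Pillow and NumPy are installed and the zip has been extracted to ./ADEChallengeData2016; this is not part of the DeepLab scripts):

import glob
import numpy as np
from PIL import Image

# Hypothetical local path; point it at the extracted ADEChallengeData2016 annotations.
ANNOTATION_GLOB = './ADEChallengeData2016/annotations/validation/*.png'

min_label, max_label = 255, 0
for path in glob.glob(ANNOTATION_GLOB):
    labels = np.array(Image.open(path))
    min_label = min(min_label, int(labels.min()))
    max_label = max(max_label, int(labels.max()))

# The scene-parsing annotations use 0 for unlabeled pixels and 1..150 for the
# 150 classes, so a metric configured for exactly 150 classes can see id 150.
print('label ids range from', min_label, 'to', max_label)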


pedropgusmao (Author) commented

@RomRoc, sorry, but it is still not clear to me what needs to be done. Should I use different values for --eval_crop_size? I tried increasing the value of k and setting the same value for both --eval_crop_size flags.
When I reach --eval_crop_size=961 (k = 60), I start to get

InvalidArgumentError (see above for traceback): assertion failed: [predictions out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [150]
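
For anyone else hitting this: the crop sizes used in this thread (513, 961, 2113) all have the form 16 * k + 1, and the crop also has to be at least as large as the biggest image in the split, otherwise the padding FIFO queue rejects the example as in the first traceback. A minimal sketch of that arithmetic (my reading of the convention, not code taken from deeplab/eval.py):

def valid_eval_crop_size(max_image_dim, output_stride=16):
    """Smallest value of the form k * output_stride + 1 that covers max_image_dim."""
    k = -(-max_image_dim // output_stride)  # ceiling division
    return k * output_stride + 1

# 2100 -> 2113, which matches the value suggested for ADE20K above.
print(valid_eval_crop_size(2100))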

RomRoc commented Apr 30, 2018

The biggest image in the ADE20K validation dataset is 2100 x 2100, so I run with these parameters:

python deeplab/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --eval_crop_size=2113 \
    --eval_crop_size=2113 \
    --dataset="ade20k" \
    --checkpoint_dir=${TRAIN_LOGDIR} \
    --eval_logdir=${EVAL_LOGDIR} \
    --dataset_dir=${ADE20K_DATASET}

But I get the error specified above.

shivpatri commented

I too am stuck on the same issue. Could anyone help?

haichaoyu (Contributor) commented

Hello, @RomRoc

Where did you download the ADE20K dataset whose largest validation image is 2100 x 2100? I also checked the sizes but got results different from yours. With one copy of the dataset, the largest image in both the training and validation splits is 3504 x 3888; with another, the largest training image is 2100 x 2100 and the largest validation image is 1600 x 1600.

Could you please provide some details? Thanks.

Haichao

RomRoc commented May 6, 2018

Hello @haichaoyu,
I used the download script provided here, which fetches http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip.
The 2100 x 2100 image is ADE_train_00006921.jpg, in the training dataset.
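
In case it helps settle the size question, here is a minimal sketch for finding the largest image in each split of that archive (assuming Pillow is installed and the zip is extracted to ./ADEChallengeData2016; not part of the DeepLab tooling):

import glob
from PIL import Image

for split in ('training', 'validation'):
    largest, largest_path = (0, 0), None
    for path in glob.glob('./ADEChallengeData2016/images/%s/*.jpg' % split):
        with Image.open(path) as img:
            w, h = img.size  # reads the header only, no pixel decode needed
        if w * h > largest[0] * largest[1]:
            largest, largest_path = (w, h), path
    print(split, largest, largest_path)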

kkahatapitiya commented May 8, 2018

@ShivakshiT try DrSleep's solution here; it worked for me for training on MS COCO-Stuff.
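
I cannot see exactly what the linked solution does, but a workaround commonly used in other ADE20K pipelines for this kind of out-of-bound assertion is to remap the raw annotations so that the 150 classes occupy ids 0..149 and the original 0 (unlabeled) becomes an explicit ignore value, with the dataset's ignore_label set accordingly. A rough sketch of the remapping step only (hypothetical preprocessing, not DrSleep's or DeepLab's actual code):

import numpy as np
from PIL import Image

IGNORE_LABEL = 255  # the eval config would need to treat this value as ignore_label

def remap_ade20k_annotation(src_path, dst_path):
    """Shift labels from {0: unlabeled, 1..150: classes} to {255: ignore, 0..149: classes}."""
    labels = np.array(Image.open(src_path), dtype=np.int32)
    labels = labels - 1              # classes 1..150 -> 0..149, unlabeled 0 -> -1
    labels[labels < 0] = IGNORE_LABEL
    Image.fromarray(labels.astype(np.uint8)).save(dst_path)

Adjusting num_classes in the dataset description is the other obvious route; I have not verified which of the two the linked fix takes.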

wt-huang commented Nov 3, 2018

Closing as this is resolved.

wt-huang closed this as completed Nov 3, 2018