This repository has been archived by the owner on Sep 16, 2024. It is now read-only.

Error on evaluate.py (Tensorflow 1.3) #107

Closed
MyVanitar opened this issue Jul 31, 2017 · 14 comments

Comments


MyVanitar commented Jul 31, 2017

Hi,

I can train the model flawlessly, but I get the following errors when I run evaluate.py . I use Tensorflow 1.3:

Restored model parameters from data/model.ckpt-900
2017-07-31 04:54:58.479363: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [2]
	 [[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_1111, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_1113, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_3, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_1115)]]
2017-07-31 04:54:58.479431: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [2]
	 [[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_1111, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_1113, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_3, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_1115)]]
Traceback (most recent call last):
  File "evaluate.py", line 127, in <module>
    main()
  File "evaluate.py", line 119, in main
    preds, _ = sess.run([pred, update_op])
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1118, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1315, in _do_run
    options, run_metadata)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [2]
	 [[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_1111, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_1113, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_3, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_1115)]]

Caused by op u'mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert', defined at:
  File "evaluate.py", line 127, in <module>
    main()
  File "evaluate.py", line 98, in main
    mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=args.num_classes, weights=weights)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/contrib/metrics/python/ops/metric_ops.py", line 2245, in streaming_mean_iou
    updates_collections=updates_collections, name=name)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/metrics_impl.py", line 915, in mean_iou
    num_classes, weights)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/metrics_impl.py", line 285, in _streaming_confusion_matrix
    labels, predictions, num_classes, weights=weights, dtype=cm_dtype)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/confusion_matrix.py", line 176, in confusion_matrix
    labels, num_classes_int64, message='`labels` out of bound')],
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/check_ops.py", line 401, in assert_less
    return control_flow_ops.Assert(condition, data, summarize=summarize)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 175, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 132, in Assert
    condition, no_op, true_assert, name="AssertGuard")
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 296, in new_func
    return func(*args, **kwargs)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1838, in cond
    orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1704, in BuildCondBranch
    original_result = fn()
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 130, in true_assert
    condition, data, summarize, name="Assert")
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 37, in _assert
    summarize=summarize, name=name)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2619, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1205, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [2]
	 [[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_1111, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_1113, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_3, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_1115)]]

MyVanitar (Author) commented Jul 31, 2017

It seems there are also some issues with TensorFlow 1.3 when running TensorBoard.

MyVanitar changed the title from "Error on evaluate.py" to "Error on evaluate.py (Tensorflow 1.3)" on Jul 31, 2017

xxxzhi commented Aug 30, 2017

I suffer from the same error. I found it is caused by the 255 (ignore) label.
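
For illustration, a minimal sketch (TF 1.x, made-up tensors, num_classes=2 as in the error above) of how a leftover 255 label trips the bound check:

import tensorflow as tf

# Hypothetical values: one ground-truth pixel still carries the 255 ignore label.
pred = tf.constant([0, 1, 1, 0], dtype=tf.int64)
gt = tf.constant([0, 1, 255, 0], dtype=tf.int64)  # 255 is not < num_classes

mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=2)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(update_op)  # raises InvalidArgumentError: `labels` out of bound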

@MyVanitar (Author)

@xxxzhi

You mean you solved it?


xxxzhi commented Aug 30, 2017

@VanitarNordic no, I just downgraded tensorflow to version 1.2.1. It is due to this commit: 6ac3efd42902d48d45d59128926110e6d5121a08

It adds an assert: assert_less(labels, num_classes)

I think it will be OK after reverting this commit.

DrSleep (Owner) commented Sep 16, 2017

TensorFlow 1.3 is not yet supported. Stay tuned.

DrSleep closed this as completed Sep 16, 2017
DrSleep (Owner) commented Oct 2, 2017

Looks like an explicit assertion has been added in TF 1.3 when computing mIoU that checks whether ground-truth labels are less than the number of classes. (The weights argument alone doesn't help here, because the check runs on the labels themselves before any weighting is applied.)
It can be overcome using the same strategy as in the train scripts:

pred = tf.reshape(pred, [-1,])
gt = tf.reshape(label_batch, [-1,])
indices = tf.squeeze(tf.where(tf.less_equal(gt, args.num_classes - 1)), 1)  # ignore all labels >= num_classes
gt = tf.cast(tf.gather(gt, indices), tf.int32)
pred = tf.gather(pred, indices)
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=args.num_classes)


amlarraz commented Nov 5, 2017

Thanks for the info, @DrSleep! I had the same problem and I've used your tip and everything works fine in evaluate.py. These are the changes I've made:

I've replaced lines 97 and 98:

weights = tf.cast(tf.less_equal(gt, args.num_classes - 1), tf.int32) # Ignoring all labels greater than or equal to n_classes.
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=args.num_classes, weights=weights)

for:

indices = tf.squeeze(tf.where(tf.less_equal(gt, num_classes - 1)), 1)  # ignore all labels >= num_classes
gt = tf.cast(tf.gather(gt, indices), tf.int32)
pred = tf.gather(pred, indices)
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=num_classes)

I hope this comment will be useful!


kirk86 commented Mar 23, 2018

The year is 2018 and yet this problem still exists, even in the latest DeepLab version in tensorflow/models. Thanks to @amlarraz I managed to overcome this issue.


AzizaZhanabatyrova commented Apr 3, 2018

For evaluate_msc.py I replaced

weights = tf.cast(tf.less_equal(gt, args.num_classes - 1), tf.int32) # Ignoring all labels greater than or equal to n_classes.
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=args.num_classes, weights=weights)

with

indices = tf.squeeze(tf.where(tf.less_equal(gt, args.num_classes - 1)), 1)  # ignore all labels >= num_classes
gt = tf.cast(tf.gather(gt, indices), tf.int32)
pred = tf.gather(pred, indices)
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=args.num_classes)


endluo commented Jul 4, 2018

I am using TF 1.8, thanks for the answer. There is a small problem with the code: before the indices = tf.squeeze(...) line you should add num_classes = args.num_classes, and then it will work.
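
Putting the pieces together, a sketch of the replacement block with that extra line (assuming the pred, gt and args variables already defined in evaluate.py):

num_classes = args.num_classes  # added per the note above so the lines below resolve
indices = tf.squeeze(tf.where(tf.less_equal(gt, num_classes - 1)), 1)  # ignore all labels >= num_classes
gt = tf.cast(tf.gather(gt, indices), tf.int32)
pred = tf.gather(pred, indices)
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=num_classes)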

@violet17

(quoting @amlarraz's fix above)

It fails when I use it in https://github.com/warmspringwinds/tf-image-segmentation/blob/master/tf_image_segmentation/recipes/pascal_voc/FCNs/fcn_32s_test_pascal.ipynb.
Could you please help me?
My TF version is 1.12. Thanks a lot!


violet17 commented Mar 19, 2019

My Python is 2.7 with TensorFlow 1.12.
I added this code before calculating mIoU:

indices = tf.cast(tf.less_equal(annotation_batch_tensor, number_of_classes - 1), tf.uint8)
annotation_batch_tensor = tf.multiply(annotation_batch_tensor, indices)
Because later versions of TF have added this assert (per the comments of @xxxzhi in tensorflow/models#2239):

labels = control_flow_ops.with_dependencies(
    [check_ops.assert_less(
        labels, num_classes_int64, message='`labels` out of bound')],
    labels)
predictions = control_flow_ops.with_dependencies(
    [check_ops.assert_less(
        predictions, num_classes_int64,
        message='`predictions` out of bound')],
    predictions)

And it is so weird in TF 1.12 that I can't use the solution from tensorflow/models#2239: after using tf.gather, the shape of annotation_batch_tensor becomes strange and is not compatible with the labels input of slim.metrics.streaming_mean_iou.

Then I thought I must ignore all labels greater than or equal to number_of_classes, following the comments of @amlarraz in #107.

That is my comment in https://github.com/warmspringwinds/tf-image-segmentation/issues/33
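
For reference, a slightly more defensive sketch of the same masking idea (assuming annotation_batch_tensor is an integer label tensor; note that masked pixels are remapped to class 0 rather than excluded, unlike the tf.gather approach):

# Sketch only: cast the validity mask to the label dtype so tf.multiply accepts it.
mask = tf.cast(tf.less_equal(annotation_batch_tensor, number_of_classes - 1),
               annotation_batch_tensor.dtype)  # 1 where the label is a valid class, 0 elsewhere
annotation_batch_tensor = tf.multiply(annotation_batch_tensor, mask)  # ignore-label pixels become class 0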

@dgarnitz-zz

Has anyone figured out how to fix this error in regular DeepLab, inside the eval.py file, not just for the tensorflow-deeplab-resnet project? Thanks.


qhkm commented Jul 17, 2019

I'm also having the same problem in eval.py when trying to train on my own dataset with the pretrained Pascal VOC model.
