Perspective transformer 3D evaluation script broken #2239

Closed
mees opened this issue Aug 17, 2017 · 10 comments
Labels
stat:awaiting model gardener Waiting on input from TensorFlow model gardener

Comments

@mees
Contributor

mees commented Aug 17, 2017

System information

  • What is the top-level directory of the model you are using: Ptn directory

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes, I have fixed the first flag issue mees@48ac9b0

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 14.04

  • TensorFlow installed from (source or binary): Source

  • TensorFlow version (use command below): ('v1.3.0-rc1-27-g2784b1c', '1.3.0-rc2')

  • Bazel version (if compiling from source): 0.5.2

  • CUDA/cuDNN version: 8.0/5.1

  • GPU model and memory: Titan X

  • Exact command to reproduce: CUDA_VISIBLE_DEVICES=0 bazel-bin/eval_ptn '--checkpoint_dir=/home/meeso/models/ptn/my_models/no_finetune' '--inp_dir=/tmp/shapenet_tf/' '--model_name=ptn_finetune'

Describe the problem

After training a PTN network, I ran the evaluation script (after fixing the first problem with the wrong flags, #2236) to compute the IoU. I get an error saying that the predictions are out of bound. The offending line seems to be here https://github.com/tensorflow/models/blob/master/ptn/metrics.py#L103 but I am not sure where this mapping comes from (3 * predictions - 2) or whether it is correct.
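To make the failing condition concrete, here is a minimal sketch in plain Python (not TensorFlow code). It assumes that valid class ids lie in the half-open range [0, num_classes), which matches the `x < y` condition in the assertion message in the log; the helper name `out_of_bound_ids` is hypothetical and exists only for illustration.

```python
def out_of_bound_ids(class_ids, num_classes):
    """Return the sorted distinct ids that fall outside [0, num_classes).

    Assumption for this sketch: an id is valid only if 0 <= id < num_classes.
    """
    return sorted({i for i in class_ids if not 0 <= i < num_classes})


# Values taken from the assertion message: predictions equal to 3 while
# num_classes is also 3, so the strict comparison 3 < 3 does not hold.
predictions = [3, 3, 3]
print(out_of_bound_ids(predictions, num_classes=3))

# Ids 0, 1 and 2 are all strictly below num_classes, so nothing is reported.
print(out_of_bound_ids([0, 1, 2], num_classes=3))
```

Running a check like this on the remapped predictions before passing them to the metric would show which values violate the bound.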

Source code / logs

2017-08-17 18:11:38.347664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:02:00.0)
Traceback (most recent call last):
  File "/home/meeso/models/ptn/bazel-bin/eval_ptn.runfiles/__main__/eval_ptn.py", line 134, in <module>
    app.run()
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/meeso/models/ptn/bazel-bin/eval_ptn.runfiles/__main__/eval_ptn.py", line 130, in main
    eval_interval_secs=FLAGS.eval_interval_secs)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/evaluation.py", line 296, in evaluation_loop
    timeout=timeout)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/evaluation.py", line 452, in evaluate_repeatedly
    session.run(eval_ops, feed_dict)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 518, in run
    run_metadata=run_metadata)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 862, in run
    run_metadata=run_metadata)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 818, in run
    return self._sess.run(*args, **kwargs)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 972, in run
    run_metadata=run_metadata)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 818, in run
    return self._sess.run(*args, **kwargs)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [3 3 3...] [y (mean_iou/ToInt64_2:0) = ] [3]
	 [[Node: mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch/_259, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_1/_261, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_3, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_2/_263)]]
	 [[Node: mean_iou/confusion_matrix/SparseTensorDenseAdd/_281 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_497_mean_iou/confusion_matrix/SparseTensorDenseAdd", tensor_type=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op u'mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert', defined at:
  File "/home/meeso/models/ptn/bazel-bin/eval_ptn.runfiles/__main__/eval_ptn.py", line 134, in <module>
    app.run()
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/meeso/models/ptn/bazel-bin/eval_ptn.runfiles/__main__/eval_ptn.py", line 118, in main
    names_to_values, names_to_updates = model.get_metrics(inputs, outputs)
  File "/home/meeso/models/ptn/model_ptn.py", line 104, in get_metrics
    tmp_values, tmp_updates = metrics.add_volume_iou_metrics(inputs, outputs)
  File "/home/meeso/models/ptn/metrics.py", line 107, in add_volume_iou_metrics
    num_classes=3)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/ops/metrics_impl.py", line 915, in mean_iou
    num_classes, weights)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/ops/metrics_impl.py", line 285, in _streaming_confusion_matrix
    labels, predictions, num_classes, weights=weights, dtype=cm_dtype)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/ops/confusion_matrix.py", line 181, in confusion_matrix
    message='`predictions` out of bound')],
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/ops/check_ops.py", line 401, in assert_less
    return control_flow_ops.Assert(condition, data, summarize=summarize)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 175, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 131, in Assert
    condition, no_op, true_assert, name="AssertGuard")
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 296, in new_func
    return func(*args, **kwargs)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1828, in cond
    orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1694, in BuildCondBranch
    original_result = fn()
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 129, in true_assert
    condition, data, summarize, name="Assert")
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 35, in _assert
    summarize=summarize, name=name)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/meeso/.virtualenvs/2d_to_3d/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [3 3 3...] [y (mean_iou/ToInt64_2:0) = ] [3]
	 [[Node: mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch/_259, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_1/_261, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_3, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_2/_263)]]
	 [[Node: mean_iou/confusion_matrix/SparseTensorDenseAdd/_281 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_497_mean_iou/confusion_matrix/SparseTensorDenseAdd", tensor_type=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
@mees
Contributor Author

mees commented Aug 17, 2017

@xcyan, @arkanath any suggestions?

@xcyan
Contributor

xcyan commented Aug 18, 2017

Looks like the API of tf.metrics changed in v1.3.

Related issue: DrSleep/tensorflow-deeplab-resnet#107

@mees
Contributor Author

mees commented Aug 18, 2017

This is weird, because you guys said that ptn requires tf 1.3, right? Anyway, how can we fix the bug?

@mees
Contributor Author

mees commented Aug 24, 2017

@xcyan @arkanath can you reproduce this bug? I checked but can't find any API changes for tf.metrics.mean_iou:
tf 1.3: https://www.tensorflow.org/api_docs/python/tf/metrics/mean_iou
tf 1.0: https://www.tensorflow.org/versions/r1.0/api_docs/python/tf/metrics/mean_iou
and there is no change in the code either:
https://github.com/tensorflow/tensorflow/blame/master/tensorflow/python/ops/metrics_impl.py#L859
Any suggestions?

@mees
Contributor Author

mees commented Aug 24, 2017

I found the bug and created a PR to fix it.

@xxxzhi

xxxzhi commented Aug 30, 2017

@xcyan
Contributor

xcyan commented Aug 30, 2017

I didn't see any related changes in line 172. What do you mean?

@xcyan
Contributor

xcyan commented Sep 19, 2017

@mees @hellojas Ready to close this issue?

@mees
Contributor Author

mees commented Sep 19, 2017

Sure, thanks @xcyan!

mees closed this as completed Sep 19, 2017
@xxxzhi

xxxzhi commented Sep 21, 2017

Sorry, here: https://raw.githubusercontent.com/tensorflow/tensorflow/6ac3efd42902d48d45d59128926110e6d5121a08/tensorflow/python/ops/confusion_matrix.py

This commit adds the assert:

  labels = control_flow_ops.with_dependencies(
      [check_ops.assert_less(
          labels, num_classes_int64, message='`labels` out of bound')],
      labels)
  predictions = control_flow_ops.with_dependencies(
      [check_ops.assert_less(
          predictions, num_classes_int64,
          message='`predictions` out of bound')],
      predictions)

I don't think the assert is nice.
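For context on what the quoted check guards, the following is a rough plain-Python sketch of a confusion matrix and a mean IoU built on top of it. It is not TensorFlow's implementation: the function names mirror the TensorFlow ones only for readability, and averaging over classes with a non-zero union is an assumption of this sketch. The point is that class ids are used as row and column indices, so an id equal to `num_classes` has no cell to land in.

```python
def confusion_matrix(labels, predictions, num_classes):
    """Count (label, prediction) pairs; ids must lie in [0, num_classes)."""
    for name, values in (("labels", labels), ("predictions", predictions)):
        bad = [v for v in values if not 0 <= v < num_classes]
        if bad:
            raise ValueError("`%s` out of bound: %r" % (name, bad[:3]))
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for label, prediction in zip(labels, predictions):
        matrix[label][prediction] += 1  # rows are labels, columns predictions
    return matrix


def mean_iou(labels, predictions, num_classes):
    """Average per-class IoU over classes whose union is non-empty."""
    matrix = confusion_matrix(labels, predictions, num_classes)
    ious = []
    for c in range(num_classes):
        true_positive = matrix[c][c]
        label_count = sum(matrix[c])
        prediction_count = sum(matrix[r][c] for r in range(num_classes))
        union = label_count + prediction_count - true_positive
        if union > 0:
            ious.append(float(true_positive) / union)
    return sum(ious) / len(ious) if ious else 0.0


print(confusion_matrix([0, 1, 1, 0], [0, 1, 0, 0], num_classes=2))
print(mean_iou([0, 1, 1, 0], [0, 1, 0, 0], num_classes=2))
```

In this sketch, passing predictions of 3 with `num_classes=3` raises a `ValueError` up front instead of failing later while indexing the matrix.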
