assertion failed: [`predictions` out of bound] in Deeplab eval.py with ADE20K #4203

RomRoc · 2018-05-08T11:07:36Z

What is the top-level directory of the model you are using:
/content
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Ubuntu 17.10 in Google Colab (env: Python2 with GPU)
TensorFlow installed from (source or binary):
standard Tensorflow in Google Colab
TensorFlow version (use command below):
('unknown', '1.7.0')
Bazel version (if compiling from source):
N/A
CUDA/cuDNN version:
Cuda 8.0
GPU model and memory:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111 Driver Version: 384.111 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 29C P8 26W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Exact command to reproduce:
In Google Colab:

Cell1:

%cd
!git clone https://github.com/tensorflow/models.git /content/models

Cell2:

%cd models/research/deeplab/datasets
!sh ./download_and_convert_ade20k.sh

Cell3:

%cd /content/models/research
%env PYTHONPATH=/env/python/:/content/models/research/:/content/models/research/slim
%env WORK_DIR=/content/models/research/deeplab

# Set up the working directories.
%env INIT_FOLDER=/content/models/research/deeplab/datasets/ADE20K/init_models
%env TRAIN_LOGDIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/train
%env EVAL_LOGDIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/eval
%env EXPORT_DIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/export
!mkdir -p "${INIT_FOLDER}"
!mkdir -p "${TRAIN_LOGDIR}"
!mkdir -p "${EVAL_LOGDIR}"
!mkdir -p "${EXPORT_DIR}"

# Copy locally the trained checkpoint as the initial checkpoint.
%env TF_INIT_ROOT=http://download.tensorflow.org/models
%env TF_INIT_CKPT=deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz
%cd /content/models/research/deeplab/datasets/ADE20K/init_models
!wget -nd -c "${TF_INIT_ROOT}/${TF_INIT_CKPT}"
!tar -xf "${TF_INIT_CKPT}"
%cd "/content/models/research/"

%env ADE20K_DATASET=/content/models/research/deeplab/datasets/ADE20K/tfrecord

print('START train.py')
%env NUM_ITERATIONS=1000
!python "${WORK_DIR}"/train.py \
  --logtostderr \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --train_split="train" \
  --model_variant="mobilenet_v2" \
  --train_crop_size=513 \
  --train_crop_size=513 \
  --train_batch_size=4 \
  --min_resize_value=350 \
  --max_resize_value=500 \
  --resize_factor=16 \
  --fine_tune_batch_norm=False \
  --dataset="ade20k" \
  --initialize_last_layer=False \
  --last_layers_contain_logits_only=True \
  --tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${ADE20K_DATASET}"


print('START eval.py')
!python "${WORK_DIR}"/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="mobilenet_v2" \
    --eval_crop_size=2113 \
    --eval_crop_size=2113 \
    --dataset="ade20k" \
    --checkpoint_dir=${TRAIN_LOGDIR} \
    --eval_logdir=${EVAL_LOGDIR} \
    --dataset_dir=${ADE20K_DATASET}

Describe the problem

I am trying to train and evaluate a DeepLab model on the ADE20K dataset in Google Colab.
I use mobilenetv2_coco_voc_trainaug as the initial checkpoint, but I get the same error with xception_coco_voc_trainaug.
Others in #3730 appear to have the same problem.
Can you help, please?

Source code / logs

I get this error in the evaluation step:

InvalidArgumentError (see above for traceback): assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [0 3 3...] [y (mean_iou/ToInt64_2:0) = ] [150]
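The assertion reports that at least one value passed to the mean IoU metric is not smaller than the number of classes (150 in the log above). As a rough illustration only, the same bound check can be written in plain Python instead of TensorFlow. The helper name `find_out_of_bound` and the sample values are hypothetical and not taken from eval.py or the ADE20K data.

```python
def find_out_of_bound(values, num_classes):
    """Return the sorted distinct values that fall outside [0, num_classes)."""
    return sorted({v for v in values if v < 0 or v >= num_classes})


# Illustrative class ids only; 150 is out of range when num_classes is 150,
# because valid ids run from 0 to 149.
sample_predictions = [0, 3, 3, 150, 12]
print(find_out_of_bound(sample_predictions, num_classes=150))
```

Running a check like this on a decoded label or prediction map may help narrow down which class ids trigger the assertion.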


tensorflowbutler · 2018-05-10T01:30:08Z

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

RomRoc · 2018-05-10T12:14:42Z

Yes, sure, I just did it.
Bye

aquariusjay · 2018-05-10T17:48:35Z

We will update the tutorial and provide a checkpoint for ADE20K soon. Please stay tuned.

trobr · 2018-07-26T02:06:22Z

Hi, I ran into the same problem and found the solution here. I modified the code at line 145 in eval.py from:

    metric_map = {}

    metric_map[predictions_tag] = tf.metrics.mean_iou(
        predictions, labels, dataset.num_classes, weights=weights)

to

    metric_map = {}

    # insert by trobr
    indices = tf.squeeze(tf.where(tf.less_equal(
        labels, dataset.num_classes - 1)), 1)
    labels = tf.cast(tf.gather(labels, indices), tf.int32)
    predictions = tf.gather(predictions, indices)
    # end of insert

    metric_map[predictions_tag] = tf.metrics.mean_iou(
        predictions, labels, dataset.num_classes, weights=weights)

After that I got the expected results.
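The idea behind the change above is to drop every pixel whose label is outside [0, num_classes) before the metric is computed. A minimal sketch of that idea in plain Python (not the TensorFlow implementation) follows. The function name `mean_iou_filtered` is hypothetical, and the sketch assumes the predictions themselves are already in range, since the filter only looks at labels.

```python
def mean_iou_filtered(predictions, labels, num_classes):
    """Mean IoU after discarding pixels whose label is out of range.

    Assumes every prediction is already inside [0, num_classes).
    """
    # Keep only (prediction, label) pairs with a valid label.
    kept = [(p, l) for p, l in zip(predictions, labels) if 0 <= l < num_classes]

    # confusion[l][p] counts pixels of true class l predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, l in kept:
        confusion[l][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes that never occur
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Illustrative values only: the label 3 is out of range for 3 classes
# and is dropped together with its prediction.
print(mean_iou_filtered([0, 1, 0, 2, 1], [0, 1, 1, 2, 3], num_classes=3))
```

Note that filtering changes the number of elements, so any per-pixel weights would presumably need the same filtering to stay aligned with the labels and predictions.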

kong869 · 2019-05-14T14:18:56Z

@trobr I got the same problem and did as you said, but a new error has arisen:
InvalidArgumentError (see above for traceback): indices[1940480] = 4319229436281876843 is not in [0, 2100225)
[[{{node GatherV2}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT64, Tparams=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Select/_4713, Squeeze/_4715, GatherV2/axis)]]
[[{{node GatherV2/_4717}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_780_GatherV2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

How can I solve this problem? Thanks in advance.

RajatGarg45 · 2019-06-06T08:32:13Z

@trobr
I tried your fix but it gives the following error:

AttributeError: 'Dataset' object has no attribute 'num_classes'

wjspoel · 2019-06-14T07:52:01Z

@RajatGarg45, rename 'num_classes' to 'num_of_classes'
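Since the attribute name apparently differs between versions of the code (`num_classes` in the earlier snippet, `num_of_classes` in this reply), a small lookup that accepts either spelling is one way to keep the patch working across both. This is only a sketch: the helper `get_num_classes` and the stand-in class `FakeDataset` are hypothetical and are not part of DeepLab.

```python
def get_num_classes(dataset):
    """Return the class count from whichever attribute the object exposes."""
    for name in ("num_of_classes", "num_classes"):
        if hasattr(dataset, name):
            return getattr(dataset, name)
    raise AttributeError("dataset has neither num_of_classes nor num_classes")


class FakeDataset(object):
    """Hypothetical stand-in for the dataset object used in eval.py."""
    num_of_classes = 150


print(get_num_classes(FakeDataset()))
```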

tensorflowbutler · 2020-01-29T23:22:31Z

Hi There,
We are checking to see if you still need help on this, as this seems to be considerably old issue. Please update this issue with the latest information, code snippet to reproduce your issue and error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.

ouzzane · 2020-08-24T13:22:04Z

I had almost the same problem here:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [255 255 255...] [y (mean_iou/Cast_1:0) = ] [6]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
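In this log the offending label value is 255 while the metric expects fewer than 6 classes. Many segmentation datasets use 255 as an "ignore" label, so, assuming that is the case here, one common approach is to give those pixels a weight of 0 and remap their label to a valid class id so the bound check passes. The sketch below shows the idea in plain Python; the function name `mask_ignore_label` and the default of 255 are assumptions for illustration.

```python
def mask_ignore_label(labels, ignore_label=255):
    """Return (remapped_labels, weights) with ignored pixels zero-weighted.

    Ignored pixels are remapped to class 0 so they stay in range; their
    weight of 0.0 is what keeps them from counting toward the metric.
    """
    weights = [0.0 if l == ignore_label else 1.0 for l in labels]
    remapped = [0 if l == ignore_label else l for l in labels]
    return remapped, weights


# Illustrative values only.
labels, weights = mask_ignore_label([255, 255, 3, 0])
print(labels, weights)
```

If the labels contain 255 for some other reason (for example, a binary mask stored as 0/255), the ignore value configured for the dataset would need to be checked instead.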

trongan93 · 2020-12-01T06:04:58Z

I also had the same error.

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
[[mean_iou/confusion_matrix/stack_1/_1731]]
(1) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]

cxhttt233 · 2020-12-01T18:39:53Z

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
[[ConstantFoldingCtrl/mean_iou/confusion_matrix/assert_non_negative_1/assert_less_equal/Assert/AssertGuard/Switch_0/_4462]]
(1) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]

cxhttt233 · 2020-12-01T18:40:08Z

same

tensorflowbutler added the stat:awaiting response Waiting on input from the contributor label May 10, 2018

tensorflowbutler assigned k-w-w May 10, 2018

tensorflowbutler removed the stat:awaiting response Waiting on input from the contributor label May 11, 2018

apolo74 mentioned this issue Oct 25, 2018

[deeplab] Training deeplab model with ADE20K dataset #3730

Open

daiyaanarfeen pushed a commit to daiyaanarfeen/models that referenced this issue Jan 24, 2019

tensorflow#4203 solution, not sure if correct

3b7682e

ashnair1 pushed a commit to ashnair1/models that referenced this issue Oct 10, 2019

Fix for eval.py. Refer models issue tensorflow#4203

54a8eeb

tensorflowbutler closed this as completed Feb 7, 2020

AndreiBaraian mentioned this issue Jul 6, 2020

[Deeplab] eval.py/vis.py not working for custom dataset #8792

Closed
