
assertion failed: [predictions out of bound] in Deeplab eval.py with ADE20K #4203

Closed
RomRoc opened this issue May 8, 2018 · 12 comments

RomRoc commented May 8, 2018

  • What is the top-level directory of the model you are using:
    /content

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    No

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Ubuntu 17.10 in Google Colab (env: Python2 with GPU)

  • TensorFlow installed from (source or binary):
    standard Tensorflow in Google Colab

  • TensorFlow version (use command below):
    ('unknown', '1.7.0')

  • Bazel version (if compiling from source):
    N/A

  • CUDA/cuDNN version:
    Cuda 8.0

  • GPU model and memory:
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.111                Driver Version: 384.111                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
    | N/A   29C    P8    26W / 149W |      0MiB / 11439MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

  • Exact command to reproduce:
    In Google Colab:

Cell1:

%cd
!git clone https://github.com/tensorflow/models.git /content/models

Cell2:

%cd models/research/deeplab/datasets
!sh ./download_and_convert_ade20k.sh

Cell3:

%cd /content/models/research
%env PYTHONPATH=/env/python/:/content/models/research/:/content/models/research/slim
%env WORK_DIR=/content/models/research/deeplab

# Set up the working directories.
%env INIT_FOLDER=/content/models/research/deeplab/datasets/ADE20K/init_models
%env TRAIN_LOGDIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/train
%env EVAL_LOGDIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/eval
%env EXPORT_DIR=/content/models/research/deeplab/datasets/ADE20K/exp/train_on_trainval_set/export
!mkdir -p "${INIT_FOLDER}"
!mkdir -p "${TRAIN_LOGDIR}"
!mkdir -p "${EVAL_LOGDIR}"
!mkdir -p "${EXPORT_DIR}"

# Copy locally the trained checkpoint as the initial checkpoint.
%env TF_INIT_ROOT=http://download.tensorflow.org/models
%env TF_INIT_CKPT=deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz
%cd /content/models/research/deeplab/datasets/ADE20K/init_models
!wget -nd -c "${TF_INIT_ROOT}/${TF_INIT_CKPT}"
!tar -xf "${TF_INIT_CKPT}"
%cd "/content/models/research/"

%env ADE20K_DATASET=/content/models/research/deeplab/datasets/ADE20K/tfrecord

print('START train.py')
%env NUM_ITERATIONS=1000
!python "${WORK_DIR}"/train.py \
  --logtostderr \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --train_split="train" \
  --model_variant="mobilenet_v2" \
  --train_crop_size=513 \
  --train_crop_size=513 \
  --train_batch_size=4 \
  --min_resize_value=350 \
  --max_resize_value=500 \
  --resize_factor=16 \
  --fine_tune_batch_norm=False \
  --dataset="ade20k" \
  --initialize_last_layer=False \
  --last_layers_contain_logits_only=True \
  --tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${ADE20K_DATASET}"


print('START eval.py')
!python "${WORK_DIR}"/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="mobilenet_v2" \
    --eval_crop_size=2113 \
    --eval_crop_size=2113 \
    --dataset="ade20k" \
    --checkpoint_dir=${TRAIN_LOGDIR} \
    --eval_logdir=${EVAL_LOGDIR} \
    --dataset_dir=${ADE20K_DATASET}

Describe the problem

I am trying to train and evaluate the DeepLab model on the ADE20K dataset in Google Colab.
I use mobilenetv2_coco_voc_trainaug as the initial checkpoint, but I get the same error with xception_coco_voc_trainaug.
Others report the same problem in #3730.
Can you help, please?

Source code / logs

I get this error in the evaluation step:

InvalidArgumentError (see above for traceback): assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [0 3 3...] [y (mean_iou/ToInt64_2:0) = ] [150]
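
For readers unfamiliar with this assertion: the message says that at least one value in `x` (here, the predictions fed to the mean IoU metric) is not strictly less than `y` (the number of classes, 150). A rough plain-Python sketch of that condition follows. The function name and the sample values are made up for illustration, and this is not TensorFlow's actual implementation.

```python
def find_out_of_bound(values, num_classes):
    """Return (position, value) pairs that violate 0 <= value < num_classes."""
    return [(i, v) for i, v in enumerate(values) if not 0 <= v < num_classes]


# Made-up flattened predictions; 150 mimics the bound in the error message.
predictions = [0, 3, 3, 149, 150, 12]
offenders = find_out_of_bound(predictions, num_classes=150)
print(offenders)  # [(4, 150)]
```

An empty result would mean every value is a valid class id; any entry in the list points at a pixel that would trip the check.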
@tensorflowbutler tensorflowbutler added the stat:awaiting response Waiting on input from the contributor label May 10, 2018
@tensorflowbutler
Member

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

RomRoc commented May 10, 2018

Yes, sure, I just did it.
Bye

@aquariusjay
Contributor

We will update the tutorial and provide a checkpoint for ADE20K soon. Please stay tuned.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Waiting on input from the contributor label May 11, 2018

trobr commented Jul 26, 2018

Hi, I ran into the same problem and found a solution here. I modified the code at line 145 of eval.py from:

    metric_map = {}

    metric_map[predictions_tag] = tf.metrics.mean_iou(
        predictions, labels, dataset.num_classes, weights=weights)

to

    metric_map = {}

    # insert by trobr
    indices = tf.squeeze(tf.where(tf.less_equal(
        labels, dataset.num_classes - 1)), 1)
    labels = tf.cast(tf.gather(labels, indices), tf.int32)
    predictions = tf.gather(predictions, indices)
    # end of insert

    metric_map[predictions_tag] = tf.metrics.mean_iou(
        predictions, labels, dataset.num_classes, weights=weights)

After that, I got the expected results.
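
The idea behind this change appears to be dropping every pixel whose label falls outside the valid class range before the metric sees it. Below is a small plain-Python sketch of that filtering, followed by a mean IoU computed on the surviving pixels. The function names and sample data are illustrative and not taken from eval.py. One assumption here goes beyond the snippet above: the per-pixel `weights` are filtered with the same mask, so that all three sequences stay the same length.

```python
def filter_in_range(labels, predictions, weights, num_classes):
    """Keep only positions whose label lies in [0, num_classes)."""
    keep = [i for i, lab in enumerate(labels) if 0 <= lab < num_classes]
    return ([labels[i] for i in keep],
            [predictions[i] for i in keep],
            [weights[i] for i in keep])


def mean_iou(labels, predictions, weights, num_classes):
    """Weighted mean IoU over classes that occur in labels or predictions.

    Assumes every label and prediction is already in [0, num_classes).
    """
    # confusion[t][p] accumulates the weight of pixels labeled t, predicted p.
    confusion = [[0.0] * num_classes for _ in range(num_classes)]
    for lab, pred, w in zip(labels, predictions, weights):
        confusion[lab][pred] += w
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        row = sum(confusion[c])
        col = sum(confusion[r][c] for r in range(num_classes))
        union = row + col - true_pos
        if union > 0:
            ious.append(true_pos / union)
    return sum(ious) / len(ious) if ious else 0.0


labels = [0, 1, 1, 255, 0]        # 255 is out of range for 2 classes
predictions = [0, 1, 0, 1, 0]
weights = [1.0] * 5
lab, pred, w = filter_in_range(labels, predictions, weights, num_classes=2)
print(mean_iou(lab, pred, w, num_classes=2))
```

If `weights` were left at its original length while `labels` and `predictions` shrank, the three inputs would no longer line up pixel by pixel, which is worth checking when adapting the change.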

@DawnWalker

@trobr I had the same problem and did as you said, but a new error has arisen:
InvalidArgumentError (see above for traceback): indices[1940480] = 4319229436281876843 is not in [0, 2100225)
[[{{node GatherV2}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT64, Tparams=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Select/_4713, Squeeze/_4715, GatherV2/axis)]]
[[{{node GatherV2/_4717}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_780_GatherV2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

How can I solve this problem? Thanks in advance.
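
The log alone does not show why the gathered index is invalid. One alternative that avoids index gathering altogether is to leave the tensors at their original length, replace out-of-range labels with a valid placeholder class, and give those pixels a weight of zero so they contribute nothing to the metric. A plain-Python sketch of that masking idea, with hypothetical names, is shown here. It illustrates the approach only and is not a tested patch for eval.py.

```python
def mask_out_of_range(labels, num_classes, placeholder=0):
    """Return (safe_labels, weights), both the same length as labels.

    Out-of-range labels become `placeholder` and receive weight 0.0,
    so a weighted metric ignores them.
    """
    safe_labels, weights = [], []
    for lab in labels:
        valid = 0 <= lab < num_classes
        safe_labels.append(lab if valid else placeholder)
        weights.append(1.0 if valid else 0.0)
    return safe_labels, weights


safe, w = mask_out_of_range([0, 149, 150, 255, 7], num_classes=150)
print(safe)  # [0, 149, 0, 0, 7]
print(w)     # [1.0, 1.0, 0.0, 0.0, 1.0]
```

In TensorFlow terms this would roughly correspond to `tf.where` on the labels plus a float mask passed as `weights`; how exactly that is wired into eval.py is left as an assumption.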

@RajatGarg45

@trobr
I tried your fix, but it gives the following error:

AttributeError: 'Dataset' object has no attribute 'num_classes'

wjspoel commented Jun 14, 2019

@RajatGarg45, rename 'num_classes' to 'num_of_classes'
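
If the same script has to run against checkouts that use either attribute name, a defensive lookup is one option. A minimal sketch, assuming the dataset object exposes one of the two names mentioned in this thread; `FakeDataset` and `get_num_classes` are stand-ins invented for the example.

```python
def get_num_classes(dataset):
    """Return the class count from whichever attribute name is present."""
    for name in ("num_of_classes", "num_classes"):
        if hasattr(dataset, name):
            return getattr(dataset, name)
    raise AttributeError("dataset has neither num_of_classes nor num_classes")


class FakeDataset(object):
    """Hypothetical stand-in for the real dataset object."""
    num_of_classes = 150  # value borrowed from the error message in this thread


print(get_num_classes(FakeDataset()))  # 150
```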

ashnair1 pushed a commit to ashnair1/models that referenced this issue Oct 10, 2019
@tensorflowbutler
Member

Hi There,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.

ouzzane commented Aug 24, 2020

I had almost the same problem here:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [255 255 255...] [y (mean_iou/Cast_1:0) = ] [6]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]

@trongan93

I also had the same error.

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
[[mean_iou/confusion_matrix/stack_1/_1731]]
(1) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
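
The assertion message appears to print only the first few entries of `x`, so a prefix like `[0 0 0...]` does not reveal which values actually exceed the bound. Counting the distinct values in the label data is one way to find them. A small sketch using only the standard library, with made-up label values and a hypothetical helper name:

```python
from collections import Counter


def summarize_labels(labels, num_classes):
    """Count label values and split them into in-range and out-of-range."""
    counts = Counter(labels)
    in_range = {v: n for v, n in counts.items() if 0 <= v < num_classes}
    out_of_range = {v: n for v, n in counts.items() if not 0 <= v < num_classes}
    return in_range, out_of_range


# Made-up flattened labels: mostly background, a few pixels with value 255.
labels = [0, 0, 0, 1, 1, 255, 255, 0]
ok, bad = summarize_labels(labels, num_classes=2)
print(ok)   # {0: 4, 1: 2}
print(bad)  # {255: 2}
```

Any value that shows up in the second dictionary either needs to be treated as an ignore label or means the configured number of classes is too small for the data.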

@cxhttt233

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]
[[ConstantFoldingCtrl/mean_iou/confusion_matrix/assert_non_negative_1/assert_less_equal/Assert/AssertGuard/Switch_0/_4462]]
(1) Invalid argument: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/Cast_1:0) = ] [2]
[[{{node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert}}]]

@cxhttt233

same
