Unable to train Mask R-CNN - checkpoint version conflict #28

Open
JonFillip opened this issue Jun 10, 2023 · 0 comments

Describe the bug
While using the TensorFlow Object Detection API, I'm experiencing an issue with the pre-trained Mask R-CNN Inception ResNet V2 1024x1024 model. When I attempt to fine-tune this model for my custom task, training fails with "ValueError: Checkpoint version should be V2", even though the specified checkpoint seems to contain the appropriate parameters for this model.

To Reproduce
Steps to reproduce the behavior:

  1. Download the pre-trained Mask R-CNN Inception ResNet V2 1024x1024 model from the TensorFlow Model Zoo.
  2. Set up a custom training pipeline configuration, specifying the path to the downloaded checkpoint in the fine_tune_checkpoint field.
  3. Run the model training script (model_main_tf2.py).
  4. Training fails with the ValueError shown in the traceback below.

Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main_tf2.py", line 114, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main_tf2.py", line 105, in main
    model_lib_v2.train_loop(
  File "/usr/local/lib/python3.10/dist-packages/object_detection/model_lib_v2.py", line 605, in train_loop
    load_fine_tune_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/object_detection/model_lib_v2.py", line 398, in load_fine_tune_checkpoint
    raise ValueError('Checkpoint version should be V2')
ValueError: Checkpoint version should be V2
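
The ValueError is raised inside load_fine_tune_checkpoint, so the failing check may concern the checkpoint version declared in the pipeline config rather than the variables stored in the checkpoint. Below is a minimal sketch for seeing what a config declares. It assumes the config is protobuf text format and that the relevant setting is the fine_tune_checkpoint_version field of train_config (as far as I know it falls back to V1 when omitted, but verify this against your installed train.proto). The helper name find_checkpoint_version is hypothetical and not part of the Object Detection API.

```python
import re

# Matches e.g. "fine_tune_checkpoint_version: V2" and captures the value.
FIELD = re.compile(r'\bfine_tune_checkpoint_version\s*:\s*(\w+)')


def find_checkpoint_version(config_text):
    """Return the declared fine_tune_checkpoint_version, or None if the field is absent.

    Hypothetical helper: scans the text line by line and ignores anything
    after a '#', so commented-out settings are not reported. This is a
    sketch; a '#' inside a quoted path would also be treated as a comment.
    """
    for line in config_text.splitlines():
        code = line.split('#', 1)[0]  # drop trailing comments
        match = FIELD.search(code)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Assumed file name; replace with the path to your own pipeline config.
    with open("pipeline.config") as f:
        print(find_checkpoint_version(f.read()))
```

If this prints None or V1 for the config used in the run above, setting the field explicitly to V2 in train_config may be worth trying before investigating the checkpoint contents further.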

Expected behavior
I expect the model training to begin by loading weights from the specified pre-trained model. The error seems to suggest a mismatch between the model architecture defined in my pipeline and the architecture of the pre-trained model. Still, my pipeline configuration appears to be correctly set up for the Mask R-CNN Inception ResNet V2 1024x1024 model.

Desktop (please complete the following information):

  • OS: macOS 13.4 (22F66)
  • Browser: Safari
  • Version: 16.5 (18615.2.9.11.4)

N.B.: I am using Google Colab Pro.
TensorFlow version: 2.12.0

pipeline.txt

Additional context

Upon inspecting the checkpoint file with inspect_checkpoint.py, it does appear to contain all the expected variables for a Mask R-CNN Inception ResNet V2 1024x1024 model. I also confirmed that the downloaded files include ckpt-0.index, ckpt-0.data-00000-of-00001, and checkpoint. Yet, the issue persists. Any guidance or solutions to this problem would be greatly appreciated.
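
The file check described above can be sketched in a few lines. This assumes fine_tune_checkpoint is given as a checkpoint prefix (for example a path ending in ckpt-0) rather than a directory or a single file, and that a usable prefix has a matching .index file plus at least one .data-* shard. The function names are hypothetical. The check only confirms that the files are present; it says nothing about whether the checkpoint is treated as V1 or V2.

```python
from pathlib import Path


def checkpoint_files(prefix):
    """Return the sorted names of files that share the given checkpoint prefix."""
    p = Path(prefix)
    if not p.parent.is_dir():
        return []
    # For a prefix ".../ckpt-0" this matches "ckpt-0.index", "ckpt-0.data-...", etc.
    return sorted(f.name for f in p.parent.glob(p.name + ".*"))


def is_complete_checkpoint_prefix(prefix):
    """True if both an index file and at least one data shard exist for the prefix."""
    base = Path(prefix).name
    names = checkpoint_files(prefix)
    has_index = (base + ".index") in names
    has_data = any(n.startswith(base + ".data-") for n in names)
    return has_index and has_data


if __name__ == "__main__":
    # Assumed location; replace with the value of fine_tune_checkpoint.
    prefix = "checkpoint/ckpt-0"
    print(checkpoint_files(prefix), is_complete_checkpoint_prefix(prefix))
```

Pointing the function at the directory itself, or at the file named checkpoint, returns False, which is a quick way to notice a path that is not a prefix.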

I have attached my pipeline.config file below:

pipeline.txt
