Failing at loading checkpoints #12

Open
AliBuildsAI opened this issue Mar 22, 2023 · 5 comments

@AliBuildsAI

Hi,

I am trying to load the checkpoints. I followed #11 and ran this code:

saved_path = './trained_checkpoints/rt1main'
from tf_agents.policies import py_tf_eager_policy

py_tf_eager_policy.SavedModelPyTFEagerPolicy(
    model_path=saved_path,
    load_specs_from_pbtxt=True,
    use_tf_function=True,
)

But I am getting this error:

Traceback (most recent call last):
  File "/home/ali/workspace/repos/google-research/robotics_transformer/load_checkpoints.py", line 7, in <module>
    py_tf_eager_policy.SavedModelPyTFEagerPolicy(
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tf_agents/policies/py_tf_eager_policy.py", line 179, in __init__
    policy = tf.compat.v2.saved_model.load(model_path)
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 936, in load
    result = load_internal(export_dir, tags, options)["root"]
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 974, in load_internal
    loader = loader_cls(object_graph_proto, saved_model_proto, export_dir,
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 187, in __init__
    self._restore_checkpoint()
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 560, in _restore_checkpoint
    load_status = saver.restore(variables_path, self._checkpoint_options)
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 1351, in restore
    object_graph_string = reader.get_tensor(base.OBJECT_GRAPH_PROTO_KEY)
  File "/home/ali/anaconda3/envs/rt9/lib/python3.8/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 66, in get_tensor
    return CheckpointReader.CheckpointReader_GetTensor(
IndexError: Read less bytes than requested
  In call to configurable 'SavedModelPyTFEagerPolicy' (<class 'tf_agents.policies.py_tf_eager_policy.SavedModelPyTFEagerPolicy'>)

Process finished with exit code 1

I am using Python 3.8.0 and the following packages:

(rt9) λ › pip list
Package                       Version
----------------------------- ---------
absl-py                       1.4.0
astunparse                    1.6.3
cachetools                    5.3.0
certifi                       2022.12.7
charset-normalizer            3.1.0
cloudpickle                   2.2.1
decorator                     5.1.1
dill                          0.3.6
dm-tree                       0.1.8
etils                         1.1.1
flatbuffers                   23.3.3
gast                          0.5.3
gin-config                    0.5.0
google-auth                   2.16.2
google-auth-oauthlib          0.4.6
google-pasta                  0.2.0
googleapis-common-protos      1.59.0
grpcio                        1.51.3
gym                           0.26.2
gym-notices                   0.0.8
h5py                          3.8.0
idna                          3.4
importlib-metadata            6.1.0
importlib-resources           5.12.0
keras                         2.8.0
Keras-Preprocessing           1.1.2
libclang                      15.0.6.1
Markdown                      3.4.1
MarkupSafe                    2.1.2
numpy                         1.24.2
oauthlib                      3.2.2
opt-einsum                    3.3.0
packaging                     23.0
Pillow                        9.4.0
pip                           23.0.1
promise                       2.3
protobuf                      3.19.6
pyasn1                        0.4.8
pyasn1-modules                0.2.8
requests                      2.28.2
requests-oauthlib             1.3.1
rsa                           4.9
setuptools                    65.6.3
six                           1.16.0
tensorboard                   2.8.0
tensorboard-data-server       0.6.1
tensorboard-plugin-wit        1.8.1
tensorflow                    2.8.2
tensorflow-addons             0.17.1
tensorflow-datasets           4.6.0
tensorflow-estimator          2.8.0
tensorflow-hub                0.12.0
tensorflow-io-gcs-filesystem  0.26.0
tensorflow-metadata           1.9.0
tensorflow-model-optimization 0.7.2
tensorflow-probability        0.16.0
tensorflow-text               2.8.2
termcolor                     2.2.0
tf-agents                     0.12.0
toml                          0.10.2
tqdm                          4.65.0
typeguard                     3.0.1
typing_extensions             4.5.0
urllib3                       1.26.15
Werkzeug                      2.2.3
wheel                         0.38.4
wrapt                         1.15.0
zipp                          3.15.0
@oym1994

oym1994 commented Apr 11, 2023

Hi, have you solved this problem? I am getting the same error. I would appreciate any solution or advice.

@AliBuildsAI
Author

AliBuildsAI commented Apr 12, 2023

Hi, no, I could not solve it.

@oym1994

oym1994 commented May 4, 2023

The problem has been solved! You need to download the repo using "git lfs" instead of plain "git" or the zip file.
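
A likely explanation, consistent with this fix: a plain clone or zip download leaves small Git LFS pointer files (short text stubs) in place of the real checkpoint data, so TensorFlow reads fewer bytes than it expects. A minimal sketch for checking whether that is the case, assuming the checkpoint path from the original post; `is_lfs_pointer` and `find_lfs_pointers` are hypothetical helper names, not part of the repo:

```python
from pathlib import Path

# First line of every Git LFS pointer file (Git LFS pointer spec, v1).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(root):
    """Return the files under `root` that are still unfetched LFS pointers."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and is_lfs_pointer(p)
    )
```

Calling `find_lfs_pointers("./trained_checkpoints/rt1main")` should return an empty list once the real files have been fetched; any path it returns is still a pointer stub.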

@JoAnn0812

Hi, could you please provide the full code for loading checkpoints? Many thanks!

@jaiber

jaiber commented Nov 22, 2023

This is what I did:
$ sudo apt install git-lfs
$ git lfs pull
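
For the request for full loading code earlier in the thread, here is a minimal sketch of a complete script, assuming the checkpoint files have already been fetched with `git lfs pull`. It reuses the call from the original post unchanged; `load_rt1_policy` is a hypothetical wrapper name, and the `saved_model.pb` check is only a sanity check that the path points at a SavedModel directory:

```python
import os


def load_rt1_policy(saved_path="./trained_checkpoints/rt1main"):
    """Load the checkpoint as a TF-Agents policy (hypothetical wrapper)."""
    # A SavedModel directory normally contains saved_model.pb; fail early
    # with a readable message if the path is wrong.
    if not os.path.isfile(os.path.join(saved_path, "saved_model.pb")):
        raise FileNotFoundError(f"No saved_model.pb under {saved_path!r}")

    # Imported lazily so the path check above runs without TensorFlow.
    from tf_agents.policies import py_tf_eager_policy

    # Same arguments as in the original post.
    return py_tf_eager_policy.SavedModelPyTFEagerPolicy(
        model_path=saved_path,
        load_specs_from_pbtxt=True,
        use_tf_function=True,
    )
```

Calling `load_rt1_policy()` from the repo root should then return the policy object. If it still fails with "Read less bytes than requested", the files under the checkpoint directory are probably still LFS pointers.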
