resume training based on restored checkpoint #10

Open
researchyw20 opened this issue Sep 6, 2023 · 0 comments

I am trying to figure out how to resume training from a restored checkpoint with run_ray_train.py. Specifically:

import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.algorithm import Algorithm
from ray.tune import registry
from baselines.train import make_envs


ray.init(local_mode=True, ignore_reinit_error=True)
registry.register_env("meltingpot", make_envs.env_creator)

# Train mode: two failed attempts.
# checkpoint_dir is the path to a previously saved checkpoint (not shown here).
my_ppo_config = PPOConfig().environment("meltingpot")
my_ppo = my_ppo_config.build()

# Method 1: fails at the .build() stage
PPOConfig().environment("meltingpot").build().restore(checkpoint_dir)

# Method 2: fails at the .train() stage
Algorithm.from_checkpoint(checkpoint_dir).train()

I get a KeyError; the traceback is shown below:

ray::RolloutWorker.__init__() (pid=180001, ip=10.0.0.182, actor_id=17cb813ab79e0c981feebd6e01000000, repr=<ray.rllib.evaluation.rollout_worker._modify_class.<locals>.Class object at 0x7f6b2de58850>)
  File "anaconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 397, in __init__
    self.env = env_creator(copy.deepcopy(self.env_context))
  File "/home/researchyw20/meltingpot/code/Melting-Pot-Contest-2023/baselines/train/make_envs.py", line 10, in env_creator
    env = substrate.build(env_config['substrate'], roles=env_config['roles'])
  File "anaconda3/envs/mpc_main/lib/python3.10/site-packages/ml_collections/config_dict/config_dict.py", line 909, in __getitem__
    raise KeyError(self._generate_did_you_mean_message(key, str(e)))
KeyError: "'substrate'"
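
Reading the traceback, env_creator looks up env_config['substrate'] and env_config['roles'], and the first lookup fails. One plausible cause, at least for Method 1, is that PPOConfig().environment("meltingpot") is called without an env_config, so the environment creator receives a config with neither key. The sketch below is only an illustration under that assumption: the helper names missing_env_config_keys and build_and_restore are hypothetical, and the actual substrate name and roles would have to come from whatever the original training run used.

```python
# Keys that env_creator reads according to the traceback above
# (env_config['substrate'] and env_config['roles']).
REQUIRED_KEYS = ("substrate", "roles")


def missing_env_config_keys(env_config):
    """Return the required keys that are absent from env_config."""
    return [key for key in REQUIRED_KEYS if key not in env_config]


def build_and_restore(env_config, checkpoint_dir):
    """Build PPO with an explicit env_config, then restore the checkpoint.

    Hypothetical helper. Assumes ray[rllib] is installed and that
    "meltingpot" has already been registered with registry.register_env,
    as in the snippet above.
    """
    missing = missing_env_config_keys(env_config)
    if missing:
        raise KeyError(f"env_config is missing keys: {missing}")

    # Imported lazily so the key check above works without Ray installed.
    from ray.rllib.algorithms.ppo import PPOConfig

    config = PPOConfig().environment("meltingpot", env_config=env_config)
    algo = config.build()
    algo.restore(checkpoint_dir)
    return algo


# An empty env_config reproduces the situation in the traceback.
print(missing_env_config_keys({}))  # ['substrate', 'roles']
```

With a populated config, the call would look like build_and_restore({"substrate": ..., "roles": ...}, checkpoint_dir), with the values taken from the original training setup. Whether the same explanation covers the Algorithm.from_checkpoint attempt is not something the traceback settles.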

Any help on this is appreciated.
