Training Stopped #4
Comments
Hi, thanks for your question. Can you provide more details on which command you executed so that we can reproduce the issue? Thanks!
I found that every_n_epochs: 1 with save_top_k: 2 works, but every_n_epochs > 1 does not. It seems that when every_n_epochs > 1, no last.ckpt is saved. JB
So you are referring to the model checkpointing, right? What behavior are you trying to achieve?
This seems to be an issue in PyTorch Lightning: when save_last = True, every_n_epochs cannot be larger than 1. Lightning-trainable sets save_last = True, so every_n_epochs = 5 does not work. JB
Also, the training stopped after 2 epochs with the error:
FileNotFoundError: [Errno 2] No such file or directory: "lightning_logs\'mnist'\version_1\checkpoints\last.ckpt"
I am not sure why last.ckpt was not saved.
Thanks
JB
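
For anyone hitting the same FileNotFoundError, one possible workaround is to check that last.ckpt exists before trying to load it. The snippet below is only a minimal sketch using the Python standard library. The helper name resolve_resume_checkpoint is hypothetical and is not part of PyTorch Lightning or Lightning-trainable, and it does not address why the file was not written in the first place.

```python
from pathlib import Path
from typing import Optional


def resolve_resume_checkpoint(ckpt_dir: str, name: str = "last.ckpt") -> Optional[str]:
    """Return the path to `name` inside `ckpt_dir` if that file exists, else None.

    Hypothetical helper for illustration only; it is not a library function.
    """
    candidate = Path(ckpt_dir) / name
    # is_file() is False both when the file is missing and when the
    # directory itself does not exist, so no exception is raised here.
    return str(candidate) if candidate.is_file() else None
```

The returned value could then be passed on as the checkpoint to resume from (for example the ckpt_path argument of Trainer.fit in recent PyTorch Lightning versions, where None means starting from scratch). Whether Lightning-trainable exposes a hook for this is an assumption that would need to be checked.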