HELP! How can I retrieve the best_model? #2651

Closed
bigbox12138 opened this issue Mar 18, 2025 · 9 comments · Fixed by #2658
Labels
documentation Improvements or additions to documentation

Comments

@bigbox12138

Issue

How can I retrieve the best_model? Where are the models saved after each training session? Additionally, how do I obtain the best-performing model from this training run?

Fix

No response

@bigbox12138 bigbox12138 added the documentation Improvements or additions to documentation label Mar 18, 2025
@adamjstewart
Collaborator

This is more of a PyTorch Lightning question than a TorchGeo question.

If you are using the Python interface, you can use:

trainer.test(model=model, datamodule=datamodule, ckpt_path='best')

See the Trainer docs for more details.
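
For context, here is a minimal sketch of how that call might fit into a full run. It assumes `model` and `datamodule` are already constructed and that the model logs a validation metric named `val_loss` (an assumption; substitute whatever your task logs). `fit_then_test_best` is a hypothetical helper name, not part of Lightning or TorchGeo.

```python
def fit_then_test_best(model, datamodule, monitor="val_loss", mode="min", max_epochs=10):
    """Train, then evaluate the checkpoint with the best monitored score."""
    # Imported lazily so this module can be imported without Lightning installed.
    from lightning.pytorch import Trainer
    from lightning.pytorch.callbacks import ModelCheckpoint

    # Ask Lightning to keep the single best checkpoint according to `monitor`.
    # "val_loss" is an assumed metric name; it must match what the model logs.
    checkpoint = ModelCheckpoint(monitor=monitor, mode=mode, save_top_k=1)
    trainer = Trainer(max_epochs=max_epochs, callbacks=[checkpoint])

    trainer.fit(model=model, datamodule=datamodule)

    # ckpt_path="best" reloads the best checkpoint before running the test loop.
    results = trainer.test(model=model, datamodule=datamodule, ckpt_path="best")

    # A LightningModule can supply its own checkpoint callback through
    # configure_callbacks(), which takes priority over the one passed above,
    # so read the path from the trainer rather than from `checkpoint`.
    return trainer.checkpoint_callback.best_model_path, results
```

The returned path points at the `.ckpt` file on disk, so it can be copied elsewhere or loaded later.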

If you are using the command-line interface, I actually don't know of an automatic way to obtain the best-performing model.

The models are saved in default_root_dir, which defaults to the current directory. You should find a lightning_logs/version_*/checkpoints/*.ckpt file with the checkpoint itself.
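
To locate those files programmatically, something like the following sketch could work. It only assumes the directory layout described above; `list_checkpoints` is a hypothetical helper name, and `root` should be whatever `default_root_dir` you trained with.

```python
from pathlib import Path


def list_checkpoints(root="."):
    """Return checkpoint files under root/lightning_logs, newest first."""
    pattern = "lightning_logs/version_*/checkpoints/*.ckpt"
    paths = Path(root).glob(pattern)
    # Sort by modification time so the most recently written file comes first.
    return sorted(paths, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in list_checkpoints():
        print(path)
```

Note that the newest checkpoint is not necessarily the best one; this only shows where the files are.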

@bigbox12138
Author

Thank you so much for your answer—it was incredibly helpful! I'm still a beginner in this field, so to clarify: does using YAML config files for training mean I cannot retrieve the best_model? If that’s the case, I’ll find it very challenging, as there might be overfitting during training, and I wouldn’t even realize it.

@adamjstewart
Collaborator

adamjstewart commented Mar 18, 2025

You can, it's just not automatic. I would recommend using something like TensorBoard to view all training runs and select the best model that seems to avoid overfitting. The filename in TensorBoard can then be found on your filesystem.
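
As a complement to inspecting runs by eye, a checkpoint file may itself record which checkpoint scored best. The sketch below assumes the checkpoint was written with a `ModelCheckpoint` callback that monitored a metric, and that the loaded dictionary stores that callback's state under a `"callbacks"` entry whose key contains `ModelCheckpoint`; treat both as assumptions to verify against your Lightning version. `best_checkpoint_info` is a hypothetical helper name.

```python
def best_checkpoint_info(checkpoint):
    """Extract ModelCheckpoint state from an already-loaded checkpoint dict."""
    found = []
    for key, state in checkpoint.get("callbacks", {}).items():
        # Assumption: the callback state key mentions "ModelCheckpoint".
        if "ModelCheckpoint" in str(key) and isinstance(state, dict):
            found.append(
                {
                    "monitor": state.get("monitor"),
                    "best_model_path": state.get("best_model_path"),
                    "best_model_score": state.get("best_model_score"),
                }
            )
    return found
```

To use it, load a checkpoint you trust with `torch.load(path, map_location="cpu", weights_only=False)` and pass the resulting dictionary to `best_checkpoint_info`. Only do this for files you created yourself, since loading with `weights_only=False` can execute arbitrary code.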

To be honest, TorchGeo is not as easy to use as I would like if you aren't already familiar with PyTorch/torchvision/Lightning. If you are a beginner, I would highly recommend reading the documentation for those libraries, as TorchGeo basically builds on top of PyTorch/Lightning with an API similar to torchvision's. Of course, that's no excuse, and I hope to improve the docs in the future. If you find anything unclear and would like to submit a PR to add additional hints to the docs, I would be happy to review! At the very least, you could add ckpt_path='best' to the Python interface docs.

@bigbox12138
Author

Thank you so much for your prompt responses! Your guidance is incredibly valuable as I navigate learning deep learning. I’ve encountered numerous challenges while using TorchGeo, but there’s no one around me to provide timely and accurate answers. Once again, I deeply appreciate your support!

@adamjstewart
Collaborator

You can also join our Slack workspace. There are hundreds of other TorchGeo users who can answer your questions, and there is an entire #help channel where people ask questions like this.

@bigbox12138
Author

Image
Thank you, but it seems that registrations from users in my country are not accepted.

@adamjstewart
Collaborator

Oof, the Great (Fire)Wall of China strikes again. I would definitely not recommend using a VPN to make an account 😉. It seems like creating a new account is blocked, but using an existing account is not?

@adamjstewart adamjstewart added this to the 0.7.0 milestone Mar 18, 2025
@bigbox12138
Author

Thank you for your guidance! If possible, I’d like to ask one more question:

1. My dataset format is similar to LandcoverAI, so I chose the LandcoverAI dataset format for training.
2. However, my images have 4 channels (one more than LandcoverAI).
3. I set in_channels: 4 in the YAML file, but I still get an error during training (TypeError: Not a color or gray tensor. Got: <class 'torch.Tensor'>. Image should be an RGB or gray image).

This is the issue I’m facing. It might be a bit tedious, but I sincerely hope you can guide me on how to fix this and successfully train my model.

@adamjstewart
Collaborator

What does your YAML file look like? Can you share the full stack trace from your error message? That error doesn't appear to come from TorchGeo, so I don't know how to reproduce it.
