HELP! How can I retrieve the best_model? #2651

bigbox12138 · 2025-03-18T03:17:00Z

Issue

How can I retrieve the best_model? Where are the models saved after each training session? Additionally, how do I obtain the best-performing model from this training run?

Fix

No response

adamjstewart · 2025-03-18T09:17:25Z

This is more of a PyTorch Lightning question than a TorchGeo question.

If you are using the Python interface, you can use:

trainer.test(model=model, datamodule=datamodule, ckpt_path='best')

See the Trainer docs for more details.
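A minimal sketch of the Python route, assuming the task logs a metric named `val_loss` (an assumption, so adjust the name to whatever your trainer actually logs). The helper name `fit_and_test_best` is hypothetical. It attaches a `ModelCheckpoint` callback that monitors that metric, so that `ckpt_path='best'` refers to the lowest-loss checkpoint; without a monitored metric, Lightning's default checkpoint callback appears to keep only the most recent checkpoint.

```python
def fit_and_test_best(model, datamodule, max_epochs=10):
    """Train, then evaluate the checkpoint with the lowest validation loss."""
    # Imported lazily so the sketch can be read without Lightning installed.
    from lightning.pytorch import Trainer
    from lightning.pytorch.callbacks import ModelCheckpoint

    # Assumption: the task logs a metric called 'val_loss'.
    checkpoint = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=1)
    trainer = Trainer(max_epochs=max_epochs, callbacks=[checkpoint])
    trainer.fit(model=model, datamodule=datamodule)

    # 'best' resolves to the path tracked by the checkpoint callback.
    results = trainer.test(model=model, datamodule=datamodule, ckpt_path="best")
    return checkpoint.best_model_path, results
```

The returned path is the file on disk that holds the best weights, so it can be copied or reloaded later.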

If you are using the command-line interface, I actually don't know of an automatic way to obtain the best-performing model.

The models are saved in default_root_dir, which defaults to the current directory. You should find a lightning_logs/version_*/checkpoints/*.ckpt file with the checkpoint itself.
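To see which checkpoints a run has left behind, the directory layout described above can be searched with the standard library alone. This is only a sketch: the helper name `find_checkpoints` is hypothetical, and it assumes the default `lightning_logs/version_*/checkpoints/` layout has not been changed by a custom logger or callback.

```python
from pathlib import Path


def find_checkpoints(root_dir="."):
    """Return all checkpoints under root_dir, most recently modified first."""
    # Mirrors the layout lightning_logs/version_*/checkpoints/*.ckpt.
    pattern = "lightning_logs/version_*/checkpoints/*.ckpt"
    paths = Path(root_dir).glob(pattern)
    return sorted(paths, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling `find_checkpoints()` from the directory used as `default_root_dir` lists every saved `.ckpt` file, newest first. Note that newest is not necessarily best.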

bigbox12138 · 2025-03-18T09:22:41Z

Thank you so much for your answer—it was incredibly helpful! I'm still a beginner in this field, so to clarify: does using YAML config files for training mean I cannot retrieve the best_model? If that’s the case, I’ll find it very challenging, as there might be overfitting during training, and I wouldn’t even realize it.

adamjstewart · 2025-03-18T09:36:23Z

You can, it's just not automatic. I would recommend using something like TensorBoard to view all training runs and select the best model that seems to avoid overfitting. The filename in TensorBoard can then be found on your filesystem.
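If the run was logged with Lightning's `CSVLogger` rather than TensorBoard, the same manual selection can be done by reading the `metrics.csv` file in the version directory. The sketch below assumes that file has an `epoch` column and a `val_loss` column (both are assumptions about what your run logs), and the helper name `best_epoch` is hypothetical.

```python
import csv


def best_epoch(metrics_csv, metric="val_loss"):
    """Return (epoch, value) for the row with the lowest value of `metric`."""
    best = None
    with open(metrics_csv, newline="") as f:
        for row in csv.DictReader(f):
            value = row.get(metric)
            if not value:  # rows without this metric leave the column empty
                continue
            candidate = (int(row["epoch"]), float(value))
            if best is None or candidate[1] < best[1]:
                best = candidate
    return best
```

The epoch it returns can then be matched against the checkpoint filenames, which by default follow the pattern `epoch=N-step=M.ckpt`. Lowest validation loss is only one possible criterion; comparing the training and validation curves is still the more reliable way to spot overfitting.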

To be honest, TorchGeo is not as easy to use as I would like if you aren't already familiar with PyTorch/torchvision/Lightning. If you are a beginner, I would highly recommend reading the documentation for those libraries, as TorchGeo basically builds on top of PyTorch/Lightning with an API similar to torchvision's. Of course, that's no excuse, and I hope to improve the docs in the future. If you find anything unclear and would like to submit a PR to add additional hints to the docs, I would be happy to review! At the very least, you could add ckpt_path='best' to the Python interface docs.

bigbox12138 · 2025-03-18T09:42:58Z

Thank you so much for your prompt responses! Your guidance is incredibly valuable as I navigate learning deep learning. I’ve encountered numerous challenges while using TorchGeo, but there’s no one around me to provide timely and accurate answers. Once again, I deeply appreciate your support!

adamjstewart · 2025-03-18T09:46:34Z

You can also join our Slack workspace. There are hundreds of other TorchGeo users who can answer your questions and an entire #help channel where people ask questions like this.

bigbox12138 · 2025-03-18T10:01:23Z

Thank you, but it seems that registrations from users in my country are not accepted.

adamjstewart · 2025-03-18T10:22:19Z

Oof, the Great (Fire)Wall of China strikes again. I would definitely not recommend using a VPN to make an account 😉. It seems like creating a new account is blocked, but using an existing account is not?

bigbox12138 · 2025-03-19T02:50:01Z

Thank you for your guidance! If possible, I’d like to ask one more question:

1. My dataset format is similar to LandcoverAI, so I chose the LandcoverAI dataset format for training.
2. However, my images have 4 channels (one more than LandcoverAI).
3. I modified in_channels: 4 in the YAML file, but I still get an error during training (TypeError: Not a color or gray tensor. Got: <class 'torch.Tensor'>. Image should be an RGB or gray image).

This is the issue I’m facing. It might be a bit tedious, but I sincerely hope you can guide me on how to fix this and successfully train my model.

adamjstewart · 2025-03-19T10:01:57Z

What does your YAML file look like? Can you share the full stack trace from your error message? That error doesn't appear to come from TorchGeo, so I don't know how to reproduce it.

bigbox12138 added the documentation Improvements or additions to documentation label Mar 18, 2025

adamjstewart added this to the 0.7.0 milestone Mar 18, 2025

adamjstewart mentioned this issue Mar 20, 2025

Trainers tutorial: document where checkpoints are saved, how to get best model #2658

Merged

adamjstewart removed this from the 0.7.0 milestone Mar 20, 2025

adamjstewart closed this as completed in #2658 Mar 25, 2025
