
Add Copernicus-FM #2646

Merged
merged 32 commits into microsoft:main
Mar 26, 2025

Conversation

wangyi111
Contributor

@wangyi111 wangyi111 commented Mar 14, 2025

Add Copernicus-FM, an extension of the DOFA foundation model that can process any spectral or non-spectral sensor modality using extended dynamic hypernetworks and flexible metadata encoding.

Key features:

  • A unified model for both spectral and non-spectral modalities -- dynamic hypernetworks with Fourier / language encoding
  • Efficient processing of any spatial resolution -- adaptive patch embedding kernel size
  • Flexible metadata integration -- Fourier encoding with learnable meta tokens for geolocation, scale and time
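The "Fourier encoding" in the last bullet refers to mapping scalar metadata (geolocation, scale, time) to sinusoidal features. As a rough illustration only (a generic sinusoidal-feature sketch, not the actual Copernicus-FM or torchgeo implementation; `fourier_encode` and its parameters are hypothetical):

```python
import math

def fourier_encode(value: float, dim: int, max_freq: float = 10000.0) -> list[float]:
    """Encode a scalar metadata value (e.g. a longitude or timestamp) as a
    dim-dimensional vector of interleaved sin/cos features at log-spaced
    frequencies, in the spirit of Fourier metadata encoding."""
    assert dim % 2 == 0, 'dim must be even for sin/cos pairs'
    half = dim // 2
    feats: list[float] = []
    for i in range(half):
        freq = 1.0 / (max_freq ** (i / half))  # log-spaced frequencies
        feats.append(math.sin(value * freq))
        feats.append(math.cos(value * freq))
    return feats

# e.g. encode a latitude into an 8-dim feature vector
emb = fourier_encode(48.26, dim=8)
```

In the real model these features are combined with learnable meta tokens; this sketch only shows the sinusoidal part.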

References

@github-actions github-actions bot added documentation Improvements or additions to documentation models Models and pretrained weights labels Mar 14, 2025
@adamjstewart adamjstewart added this to the 0.7.0 milestone Mar 15, 2025
@github-actions github-actions bot added the testing Continuous integration testing label Mar 18, 2025
Args:
x: Input mini-batch.
meta_info: Longitudes, latitudes, times, and areas of each patch.
Use NaN for unknown metadata.
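For illustration, the NaN convention described in this docstring could be handled like the following sketch (hypothetical helper, not the actual torchgeo code; the real model substitutes a learnable token for unknown entries, here approximated by a constant placeholder):

```python
import math

UNKNOWN = 0.0  # placeholder; the real model uses a learnable meta token instead

def encode_metadata(meta: list[float]) -> list[float]:
    """Map each metadata entry (e.g. lon, lat, time, area) to itself, or to a
    placeholder when it is unknown (NaN), mirroring the docstring convention."""
    return [UNKNOWN if math.isnan(v) else v for v in meta]

# time unknown for this patch
meta = encode_metadata([8.55, 47.37, float('nan'), 264.0])
```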
Collaborator

This is an unintuitive UI. I would rather have separate values for each, which are either Tensor or None. It's also a shame that we can't mix this in a single mini-batch: if a single value is NaN, that metadata is ignored.

Contributor Author

We can make it possible to mix within the batch in principle, but it needs looping over the batch dim to assign known/unknown entries, which would probably change a lot of code.
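The per-batch mixing discussed here would look roughly like this sketch (hypothetical helper, not code from the PR): loop over the batch dimension and bucket samples by whether their metadata is fully known, so each group can be routed differently.

```python
import math

def split_known_unknown(batch_meta: list[list[float]]) -> tuple[list[int], list[int]]:
    """Loop over the batch dim and bucket sample indices by whether all of
    their metadata entries are known (no NaN), as mixing known and unknown
    metadata within one mini-batch would require."""
    known: list[int] = []
    unknown: list[int] = []
    for i, meta in enumerate(batch_meta):
        (unknown if any(math.isnan(v) for v in meta) else known).append(i)
    return known, unknown
```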

@adamjstewart adamjstewart marked this pull request as ready for review March 19, 2025 12:50
adamjstewart
adamjstewart previously approved these changes Mar 19, 2025
Collaborator

@adamjstewart adamjstewart left a comment


This is ready from my side, but I'll give others a couple days to review. I'm particularly concerned about whether the documentation is sufficient for people to figure out how to use the model. There are ways we could make this more user-friendly, but I don't want to diverge too much from the original source code.

@calebrob6
Member

Just read through this and found myself wanting an example of how to use it (the same thing I hit previously when trying to use Scale-MAE).

E.g., even though the args in forward are documented, what do they mean:

[screenshot of the documented forward() arguments]

Maybe it'd be nice to put an example in the docstring? (This also applies to other pre-trained models actually)

@adamjstewart
Collaborator

Agreed, we actually got a similar request for DOFA: zhu-xlab/DOFA#14

These newer models (Copernicus-FM, Panopticon) add a lot more (optional) metadata, so are even more confusing to use. Not sure if this should be API documentation or tutorials or what. I probably don't have a ton of time to work on this personally but @wangyi111 might.

@wangyi111
Contributor Author

> Agreed, we actually got a similar request for DOFA: zhu-xlab/DOFA#14
>
> These newer models (Copernicus-FM, Panopticon) add a lot more (optional) metadata, so are even more confusing to use. Not sure if this should be API documentation or tutorials or what. I probably don't have a ton of time to work on this personally but @wangyi111 might.

Should be easy for me to add the docstring. Regarding a tutorial, is there a place for demonstrating a pretrained model? I only see https://torchgeo.readthedocs.io/en/stable/tutorials/pretrained_weights.html

@adamjstewart
Collaborator

Yep, that's the right location. We could either expand that tutorial to cover additional models, or add a second tutorial specifically for using FMs.

adamjstewart
adamjstewart previously approved these changes Mar 20, 2025
Collaborator

@adamjstewart adamjstewart left a comment


I vote we merge this as is and @wangyi111 can open a separate PR to expand our tutorials on Scale-MAE, DOFA, Copernicus-FM, etc. Any objections?

@wangyi111
Contributor Author

> I vote we merge this as is and @wangyi111 can open a separate PR to expand our tutorials on Scale-MAE, DOFA, Copernicus-FM, etc. Any objections?

Oh, I just added a docstring to the CopernicusFM class.

@adamjstewart
Collaborator

How would you feel about renaming a few things for consistency:

  • img_feat -> image or x
  • meta_info -> metadata
  • wave_list, wvs -> wavelengths
  • wv_planes -> wavelength_dim
  • bandwidth -> bandwidths
  • hypernet -> input_mode

Could also split meta_info into 4 separate variables for ease of use. Don't want to diverge too much from the original implementation, but also want to make it user friendly and intuitive.

@wangyi111
Contributor Author

> How would you feel about renaming a few things for consistency:
>
> • img_feat -> image or x
> • meta_info -> metadata
> • wave_list, wvs -> wavelengths
> • wv_planes -> wavelength_dim
> • bandwidth -> bandwidths
> • hypernet -> input_mode
>
> Could also split meta_info into 4 separate variables for ease of use. Don't want to diverge too much from the original implementation, but also want to make it user friendly and intuitive.

Good for me. The only exception is wv_planes, which is not only the dim of the wavelength encoding but also of the bandwidth and language embeddings. Maybe something like hyper_dim? I kind of wanted to call it meta_dim, but metadata already means something else in this model.

@adamjstewart
Collaborator

Maybe in_dim or in_features?

@wangyi111
Contributor Author

These names could still be mistaken for input image features; maybe hyper_planes?

@adamjstewart
Collaborator

Finished renaming. Remaining ideas to improve usability:

  • Could remove input_mode and key it based on whether wavelengths/bandwidths or language_embed is provided
  • Could use type hints to make it clear that wavelengths/bandwidths or language_embed is required, not both nor neither
  • Could split metadata into lat/lon/time/area, makes it easier to skip certain variables

I don't want to spend too much time on this because we still need to finish Copernicus-Bench and Copernicus-Pretrain, but once this is merged it becomes harder to change without breaking backwards compatibility.
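The first bullet above, keying the mode off which optional inputs are provided instead of a separate input_mode argument, could be sketched like this (hypothetical helper and names, not the actual torchgeo API; spectral modalities pass wavelengths/bandwidths, non-spectral ones pass a language embedding, and exactly one of the two must be given):

```python
def infer_input_mode(wavelengths=None, bandwidths=None, language_embed=None) -> str:
    """Derive the hypernetwork mode from which optional inputs are provided,
    enforcing that wavelengths/bandwidths and language_embed are mutually
    exclusive and that at least one is supplied."""
    spectral = wavelengths is not None and bandwidths is not None
    language = language_embed is not None
    if spectral == language:
        raise ValueError(
            'Provide either wavelengths/bandwidths or language_embed, '
            'not both and not neither'
        )
    return 'spectral' if spectral else 'language'
```

A union type hint (second bullet) could express the same constraint statically, but a runtime check like this catches misuse regardless of type checking.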

adamjstewart
adamjstewart previously approved these changes Mar 24, 2025
@adamjstewart adamjstewart merged commit 81f8c0f into microsoft:main Mar 26, 2025
19 checks passed