Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

major revision of SSL4EO downloading script #2664

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

cmalbrec
Copy link

according to #2345

@github-actions github-actions bot added the scripts Training and evaluation scripts label Mar 22, 2025
@cmalbrec
Copy link
Author

@microsoft-github-policy-service agree

Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more than just a refactor, this is a complete rewrite. Is this new script compatible with the command-line args used in experiments/ssl4eo/landsat/download_*.sh? I noticed you also removed the rate throttle, does this not suffer from exceeding the GEE API rate limit?

To properly review this I would probably need:

  • A list of changes made from the current version
  • Why these changes were made
  • Performance metrics before and after this change

It may be quicker and easier to simply fix the geolocation bug, I can't imagine it requires more than a few lines changed.

@cmalbrec
Copy link
Author

Hi @adamjstewart :

Yes, with @AABNassim we took the liberty to rewrite the download script to better organize for future development compared to the previous one. https://github.com/cmalbrec/torchgeo/blob/issue-2345/experiments/ssl4eo/download_ssl4eo.sh serves as an example on how to adapt experiments/ssl4eo/landsat/download_*.sh.

We successfully utilized the new downloading script to swiftly curate https://huggingface.co/datasets/embed2scale/SSL4EO-S12-v1.1 . I vote to move forward with the new version and add mods like rate throttling (@wangyi111 may help here), etc.

@adamjstewart
Copy link
Collaborator

Is your version based on the SSL4EO-S12 or SSL4EO-L download script? The latter underwent months of work to make it more generic to support both Sentinel and Landsat, with better overlap checking, nodata checking, and optimized downloading. I wouldn't want to see those months of work undone.

If you want to move forward with this version, I'll still need to see:

  • A list of changes made from the current version
  • Why these changes were made
  • Performance metrics before and after this change

P.S. I've had about a dozen people ask me when SSL4EO-S12 v1.1 will be added to TorchGeo so it is easier to download.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scripts Training and evaluation scripts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants