build collection/item metadata system #2

Merged
merged 9 commits into from
Nov 20, 2024

Conversation

hrodmn
Collaborator

@hrodmn hrodmn commented Nov 13, 2024

Objective: Build a stactools package for the GLAD Land Cover Land Use 2020 dataset.

Notes:

  • I split the data into two collections because the annual and change items would overlap in time and have different asset properties:
    • annual data products: glad-glclu2020
    • 2000-2020 change product: glad-glclu2020-change
  • We could do it in a single collection but users would need to apply a specific filter to avoid retrieving two items with every query!
  • The default setting is to generate metadata for the original assets but those are not COGs so be careful out there, folks.
  • Users can provide their own href_format to generate metadata for new copies of the assets (e.g. COGs in S3).
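The href_format idea in the last bullet can be sketched roughly as follows. This is only an illustration: the template fields (`year`, `tile`), both URLs, and the `build_href` helper are hypothetical and are not taken from the package.

```python
# Hypothetical templates: the placeholder names and URLs are assumptions,
# not the package's real defaults.
ORIGINAL_HREF_FORMAT = "https://example.com/glclu/{year}/{tile}.tif"
COG_HREF_FORMAT = "s3://example-bucket/glad-glclu2020/{year}/{tile}.tif"


def build_href(href_format: str, year: int, tile: str) -> str:
    """Fill an href template so the item metadata points at a chosen copy of the asset."""
    return href_format.format(year=year, tile=tile)


# Same tile, two storage locations: the original file or a COG copy.
original_href = build_href(ORIGINAL_HREF_FORMAT, 2020, "50N_110W")
cog_href = build_href(COG_HREF_FORMAT, 2020, "50N_110W")
```

Keeping the location in a template means the same item-building logic can describe either the original files or a converted copy.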

Checklist

  • Tests pass (run scripts/test)
  • Documentation has been updated to reflect changes, if applicable
  • Examples have been updated to reflect changes, if applicable
  • Changes are added to the CHANGELOG, if applicable

@hrodmn hrodmn self-assigned this Nov 13, 2024
@wildintellect
Collaborator

Missing:

  • providers, that MAAP is the host (Processor/Host), also our copy is going to be COG (this is our processing), and that the original data is from the cited source (Producer/Licensor)
  • projection extension is in the items but not declared in the collection
  • Assets at the collection level don't have the data object specified only thumbnail.
  • Should we use the raster extension? not sure where else resolution is specified

@hrodmn
Collaborator Author

hrodmn commented Nov 13, 2024

Thanks for the quick feedback @wildintellect.

  • providers, that MAAP is the host (Processor/Host), also our copy is going to be COG (this is our processing), and that the original data is from the cited source (Producer/Licensor)

I will add a provider entry for the Producer/Licensor. I wrote the package to build metadata for the original data which are not COGs and imagined that when we build metadata for the MAAP-hosted data we can add the Processor/Host entry for MAAP.

  • Assets at the collection level don't have the data object specified only thumbnail.

I filled out the item_assets section with the actual data asset information. I think the thumbnail is the only proper collection-level asset?

  • Should we use the raster extension? not sure where else resolution is specified

Yes, I'll add this! I didn't try it yet but maybe this is a chance to try STAC v1.1.0? pystac doesn't support 1.1 yet so I'll hold off stac-utils/pystac#1375

@wildintellect
Collaborator

wildintellect commented Nov 13, 2024

@hrodmn

Thanks for the quick feedback @wildintellect.

  • providers, that MAAP is the host (Processor/Host), also our copy is going to be COG (this is our processing), and that the original data is from the cited source (Producer/Licensor)

I will add a provider entry for the Producer/Licensor. I wrote the package to build metadata for the original data which are not COGs and imagined that when we build metadata for the MAAP-hosted data we can add the Processor/Host entry for MAAP.

There should be an easy flag or argument to pass this info into the generation. I've done this before, let me try to recall the example: https://github.com/stactools-packages/cop-dem/blob/852ed13abc95513e4113369004cfb6110bde0081/src/stactools/cop_dem/constants.py#L46-L57
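A minimal sketch of what such an argument might look like, using plain dictionaries in the shape of STAC provider objects. The `build_providers` function, the provider names, and the URLs are placeholders invented for illustration; only the role names (`producer`, `licensor`, `processor`, `host`) come from the STAC specification.

```python
from typing import Any, Dict, List, Optional

# Placeholder entry for the original data source (name and URL are assumptions).
PRODUCER: Dict[str, Any] = {
    "name": "Example Producer",
    "roles": ["producer", "licensor"],
    "url": "https://example.com/producer",
}


def build_providers(host: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Return the provider list, appending a processor/host entry only when one is given."""
    providers = [dict(PRODUCER)]  # copy so callers cannot mutate the constant
    if host is not None:
        providers.append(host)
    return providers


# Metadata for the original files: producer/licensor only.
default_providers = build_providers()

# Metadata for a re-hosted COG copy: add the processor/host entry.
hosted_providers = build_providers(
    {
        "name": "Example Host",
        "roles": ["processor", "host"],
        "url": "https://example.com/host",
    }
)
```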

  • Assets at the collection level don't have the data object specified only thumbnail.

I filled out the item_assets section with the actual data asset information. I think the thumbnail is the only proper collection-level asset?

But the collection is also supposed to show what assets are available on items (this might be the item-asset extension)

  • Should we use the raster extension? not sure where else resolution is specified

Yes, I'll add this! I didn't try it yet but maybe this is a chance to try STAC v1.1.0? pystac doesn't support 1.1 yet so I'll hold off stac-utils/pystac#1375

I also wasn't sure if pgstac supports 1.1 yet either...

@hrodmn
Collaborator Author

hrodmn commented Nov 13, 2024

@wildintellect I added the providers and the projection and raster extension entries to the collection. The item-assets extension is handling the "common assets" description that you mentioned.
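The kind of check behind this change can be sketched with plain dictionaries: find extensions that items use but the collection does not declare. The `undeclared_extensions` helper, the `data` asset key, and the resolution value are illustrative assumptions; the schema URLs are the published projection, raster, and item-assets extension schemas, though the versions the package pins may differ.

```python
from typing import Any, Dict, Iterable, Set

PROJECTION = "https://stac-extensions.github.io/projection/v1.1.0/schema.json"
RASTER = "https://stac-extensions.github.io/raster/v1.1.0/schema.json"
ITEM_ASSETS = "https://stac-extensions.github.io/item-assets/v1.0.0/schema.json"


def undeclared_extensions(
    collection: Dict[str, Any], items: Iterable[Dict[str, Any]]
) -> Set[str]:
    """Return extension schema URLs used by items but missing from the collection."""
    declared = set(collection.get("stac_extensions", []))
    used: Set[str] = set()
    for item in items:
        used.update(item.get("stac_extensions", []))
    return used - declared


collection = {
    "id": "glad-glclu2020",
    "stac_extensions": [ITEM_ASSETS],
    "item_assets": {
        "data": {  # hypothetical asset key
            "type": "image/tiff; application=geotiff",
            "roles": ["data"],
            # placeholder resolution, not taken from the package
            "raster:bands": [{"spatial_resolution": 30}],
        }
    },
}
items = [{"id": "example-item", "stac_extensions": [PROJECTION, RASTER]}]

missing = undeclared_extensions(collection, items)
```

Here `missing` holds the projection and raster schema URLs, which is the gap the review comment pointed out.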

Resolved review threads (outdated) on src/stactools/glad_glclu2020/metadata.py
@hrodmn hrodmn merged commit 52b0922 into main Nov 20, 2024
4 checks passed