QEP 331: STAC layers and data providers #331

Open
wants to merge 2 commits into base: master


02JanDal

This QEP documents a proposed addition to the STAC support - adding STAC endpoints as layers.

There are some points here where I'd appreciate input, especially regarding the contents of the STAC layers subheader. From the PoV of this specific feature it would be beneficial to be able to restrict what the user can change about a layer (instead letting the data source control it). This could technically be solved relatively easily (for example by passing through a "read-only" flag from the data source which the layer properties dialog uses to hide pages), but I can imagine that this might violate some unspoken architectural rule. Or maybe not?

Provided that this QEP is accepted, we'll be able to implement at least the first part (adding a layer showing the extents of the STAC items) with the funding we currently have, and hopefully also previews of raster and/or point cloud data.

@02JanDal changed the title from "QEP: STAC layers and data providers" to "QEP 331: STAC layers and data providers" Feb 24, 2025
@nyalldawson
Contributor

I have some concerns here.

Can you include links to the stac vector features specification, and describe how this relates to WFS/OAPIF? Creating a new data provider should be a last resort, so if the stac specifications are closely aligned with WFS/OAPIF then I think making a separate provider is a mis-step.

Similarly, creating QgsStacRasterDataProvider is a red-flag to me. The GDAL library and data provider are already mature and written to handle optimised reading of remote data sources, and it would be a massive undertaking and large technical debt to reimplement all that just for stac resources. Besides, GDAL itself already handles stac resources well, so I fail to see why we'd need a dedicated provider in the first place.

> "thus be a good idea to have either separate layer classes for STAC"

No, this is not a good idea - creating a new layer class just for a single data provider breaks a lot of the assumptions and designs of QGIS API. The better approach is to add the missing bits to the API for the existing layer classes.

Was this proposal created in partnership/consultation with Lutra, who did the original stac implementation?

@nyalldawson
Contributor

nyalldawson commented Feb 25, 2025

@02JanDal following https://github.com/qgis/QGIS-Enhancement-Proposals?tab=readme-ov-file#process-and-policies, can you please announce this also to QGIS-psc lists?

@nyalldawson added the labels "In Discussion" (QEPs currently in discussion stage) and "Project" (a proposal which concerns a project, e.g. new functionality) Feb 25, 2025
@sweco-sedalh

sweco-sedalh commented Feb 25, 2025

(EDIT: posted from the wrong account; I'm the same person as @02JanDal)

> can you please announce this also to QGIS-psc lists?

Done. That is, however, currently not documented on the linked page (which only mentions QGIS-developers and potentially QGIS-users).

> Can you include links to the stac vector features specification, and describe how this relates to WFS/OAPIF? Creating a new data provider should be a last resort, so if the stac specifications are closely aligned with WFS/OAPIF then I think making a separate provider is a mis-step.

There is a bit of historical cruft involved here, as STAC was developed alongside OAPIF rather than on top of it. Basically, while STAC borrows a lot of ideas from OAPIF (and vice versa) and many STAC endpoints implement OAPIF, there is no guarantee that this is the case.

Specification-wise, STAC has split OAPIF compatibility off into a separate part: https://github.com/radiantearth/stac-api-spec/tree/v1.0.0/ogcapi-features

The /search endpoint, specified in STAC API - Item Search, which would be quite central to the implementation of this feature, has a lot of overlap with OAPIF endpoints (such as both returning GeoJSON FeatureCollections and using the same names for many parameters), but is still, I'd say, distinct from it.

So basically we could solve parts of this using the existing OAPIF data provider, but likely not in a way that supports Item Search and it would not be guaranteed to work with all STAC endpoints.
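To make the overlap concrete, here is a rough Python sketch (not QGIS code) comparing a STAC API Item Search request body with the query parameters of a single OAPIF collection's items endpoint. The field names (`collections`, `bbox`, `datetime`, `limit`) follow the STAC API and OGC API Features specifications as we understand them; the helper names and collection IDs are made up for illustration. The part OAPIF cannot express is a single search spanning several collections.

```python
import json


def item_search_body(collections, bbox=None, datetime=None, limit=None) -> str:
    """JSON body for POST /search (STAC API - Item Search); hypothetical helper.

    `collections` may name several collections at once, which a plain OAPIF
    /collections/{id}/items request cannot do.
    """
    body = {"collections": list(collections)}
    if bbox is not None:
        body["bbox"] = list(bbox)  # minx, miny, maxx, maxy, same order as OAPIF
    if datetime is not None:
        body["datetime"] = datetime  # instant or "start/end", ".." for an open end
    if limit is not None:
        body["limit"] = limit
    return json.dumps(body)


def oapif_items_query(bbox=None, datetime=None, limit=None) -> dict:
    """Equivalent GET query parameters for one OAPIF collection's /items."""
    params = {}
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if datetime is not None:
        params["datetime"] = datetime
    if limit is not None:
        params["limit"] = str(limit)
    return params
```

The shared core parameters are what makes reusing parts of the OAPIF provider tempting; the multi-collection search is what keeps Item Search distinct.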

> Similarly, creating QgsStacRasterDataProvider is a red-flag to me. The GDAL library and data provider are already mature and written to handle optimised reading of remote data sources, and it would be a massive undertaking and large technical debt to reimplement all that just for stac resources. Besides, GDAL itself already handles stac resources well, so I fail to see why we'd need a dedicated provider in the first place.

That's a good point. Based on what I can tell the STACIT driver could be sufficient here. That would leave the questions of how to handle the temporal controller (probably would make sense to add support to it to the GDAL data provider/the STACIT driver) and authentication (AFAICT the GDAL driver does not support OAuth2 right now, right?).

Regarding authentication: based on https://lists.osgeo.org/pipermail/qgis-developer/2025-February/067392.html, I got an interesting tip that PDAL is considering moving to GDAL's VSI. A possible solution for both the PDAL algorithms and authentication here could then be to register a custom GDAL VSI handler (something like /vsiqgis/) which would send requests through QGIS' network plumbing for authentication, logging, proxying, etc.

> > "thus be a good idea to have either separate layer classes for STAC"
>
> No, this is not a good idea - creating a new layer class just for a single data provider breaks a lot of the assumptions and designs of QGIS API. The better approach is to add the missing bits to the API for the existing layer classes.

I suspected as much. The other option I can think of would then be to let the data provider report flags for what the user may change, with the default being "everything" for backwards compatibility. For this use case it would probably mean leaving Information and Symbology/Labels/Masks/3D View/Diagrams editable. UI-wise, I think graying out the tabs, making the fields read-only and displaying some sort of "managed by data provider" message would be the best option?

Actions and initial styling can be set when adding the layer from the STAC tree, so as long as actions aren't overridable it should be fine (symbology I think should be possible to override, it should just have some sensible default).
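A minimal Python sketch of the "editability flags" idea, purely to pin down the semantics: the provider reports a bit set, existing providers report everything, and the properties dialog derives which pages to gray out. `EditableProperties`, `PAGES` and `read_only_pages` are hypothetical names, not existing QGIS API, and the page list is an assumption based on the tabs mentioned above.

```python
from enum import Flag


class EditableProperties(Flag):
    """Hypothetical: which layer-properties pages a data provider lets the user edit."""
    NONE = 0
    SOURCE = 1
    FIELDS = 2
    ACTIONS = 4
    SYMBOLOGY = 8
    LABELS = 16
    MASKS = 32
    VIEW_3D = 64
    DIAGRAMS = 128
    ALL = 255  # default for every existing provider, so nothing changes for them


# Hypothetical mapping from dialog page title to the flag that unlocks it
PAGES = {
    "Source": EditableProperties.SOURCE,
    "Fields": EditableProperties.FIELDS,
    "Actions": EditableProperties.ACTIONS,
    "Symbology": EditableProperties.SYMBOLOGY,
    "Labels": EditableProperties.LABELS,
    "Masks": EditableProperties.MASKS,
    "3D View": EditableProperties.VIEW_3D,
    "Diagrams": EditableProperties.DIAGRAMS,
}

# What a STAC items provider might report: styling stays editable,
# while source, fields and actions are managed by the provider.
STAC_ITEMS_EDITABLE = (
    EditableProperties.SYMBOLOGY
    | EditableProperties.LABELS
    | EditableProperties.MASKS
    | EditableProperties.VIEW_3D
    | EditableProperties.DIAGRAMS
)


def read_only_pages(editable: EditableProperties) -> list:
    """Pages the dialog would gray out with a 'managed by data provider' message."""
    return [page for page, flag in PAGES.items() if not (editable & flag)]
```

With the default `ALL`, no page is affected, which is what keeps the change backwards compatible.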

> Was this proposal created in partnership/consultation with Lutra, who did the original stac implementation?

I messaged @uclaros specifically when I sent the announcement to QGIS-Developer. I haven't had any further contact, but I hope they'll give their input here.

@wonder-sk
Member

I have similar concerns as @nyalldawson (btw. we have not discussed the proposal with Jan before)

I think it would be good to step back from the design and clarify the requirements first - the proposal mostly focuses on HOW things should be done, but less on what the end goals are... My understanding is that you have a STAC server with some collections and you'd like to browse such collections on the map (with the help of temporal controller).

I fully agree with Nyall that creation of a new data provider should be the last resort because that requires a lot of new code (and thus maintenance cost). You could probably easily use just a vector layer with memory provider and populate it on the fly.

For point clouds, virtual point cloud data provider should already be able to handle nearly everything you want - show bounding boxes, show tile names, show preview point cloud etc, and the .vpc format is already using STAC items - what else is missing?

For raster data, GDAL already has VRT support and STAC support... maybe an interesting addition could be that raster layers from VRT/STAC driver could be rendered using just bounding boxes rather than real raster data, similarly to how virtual point clouds can be rendered. Maybe adding such raster layer renderer would be enough for your use case?

By the way, how would you like to handle pagination when dealing with STAC search results? Some collections can be HUGE - e.g. millions of items - and it is not feasible to fetch everything.

@sweco-sedalh

> I think it would be good to step back from the design and clarify the requirements first - the proposal mostly focuses on HOW things should be done, but less on what the end goals are... My understanding is that you have a STAC server with some collections and you'd like to browse such collections on the map (with the help of temporal controller).

Probably a bit more context than necessary, but to try not to leave anything out:

The national land survey of Sweden, Lantmäteriet, provides a platform for accessing various datasets from various providers (mainly municipalities), which uses STAC/OAPIF (with COG and COPC for raster and point cloud datasets respectively). Additionally, they now provide their EU High Value Datasets via STAC (as either COG, COPC or GeoPackage). We now have a project in which we, among a few other things, are to help them improve how consumers can access these resources using QGIS. The project is budget-bound rather than task-bound, so we'll try to get as much done as possible within the given budget, but these are the things we've identified so far that would be useful:

  • Support for OAuth2 Client Credentials flow (all of Lantmäteriet's APIs are behind OAuth2, and given how they have it set up, it would be significantly easier for users if they could use Client Credentials (and yes, I agree, this is technically the wrong flow for a client application)) - Support for OAuth 2 Client Credentials flow QGIS#60534
  • Easier way to add services (wizard like flow) so that users don't have to juggle URLs etc. - will be implemented in a plugin
  • Support for mixed geometry types - Support for mixed-geometry-type layers #298 (was part of an earlier proposal, however it has since been pushed down on the list of priorities but we hope to revisit it in the future)
  • Support for OAuth2-authenticated COPCs in the PDAL based processing algorithms - https://lists.osgeo.org/pipermail/qgis-developer/2025-February/067392.html
  • More visual/map-based access to resources in a STAC endpoint, while still keeping "where your existing data is", i.e. in QGIS
  • Using STAC data as background maps
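Since the Client Credentials flow comes up in the list above, a minimal sketch of the token request it involves (RFC 6749, section 4.4) may help reviewers: the client POSTs `grant_type=client_credentials` to the token endpoint and authenticates with its client ID and secret via HTTP Basic. This only builds the request with the Python standard library and does not send it; the function name, URL, client ID, secret and scope are placeholders, and in QGIS the flow would live in the authentication framework (QGIS#60534) rather than in ad-hoc code.

```python
import base64
import urllib.parse
import urllib.request


def client_credentials_request(token_url, client_id, client_secret, scope=None):
    """Build (but do not send) an OAuth2 Client Credentials token request."""
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    data = urllib.parse.urlencode(form).encode("ascii")
    # RFC 6749 section 2.3.1: form-urlencode ID and secret, then use them as Basic credentials
    user = urllib.parse.quote_plus(client_id)
    password = urllib.parse.quote_plus(client_secret)
    basic = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(token_url, data=data, method="POST")
    request.add_header("Authorization", f"Basic {basic}")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request
```

A successful response is a JSON object containing `access_token` and `token_type` (typically `Bearer`), usually with `expires_in`; subsequent STAC requests then carry an `Authorization: Bearer <token>` header. No user interaction is involved, which is what makes the flow attractive here.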

The last two are the ones discussed here. The first is mostly about "discovery": users wanting to get a glimpse of what's available so that they can later work with it (sure, you can point them at documentation etc., but at the end of the day that's not where most users will look) and picking items from the map. Additionally, while the search in the data browser dialog works well enough for datasets with useful names (which is the case for most vector data from Lantmäteriet, made available as one GeoPackage per municipality), the usability is not the same for other types of data (such as orthophotos, which are often named after a coordinate or some index).

As an example, say I need orthophotos over Stockholm. I add the STAC endpoint, open the data dialog, and set the filter to the current map extent and the expected temporal extent (which requires me to know when the latest images were taken over the area in question). Then I need to scroll down and add each orthophoto to the map as an individual layer.

Now, there are other options here. We could build a processing algorithm which, given an extent, adds either a group layer with individual COGs or a single layer using the STACIT driver. But that would require us to either pull in an external STAC library, implement our own STAC client, or make the QgsStac* classes available to Python (which might make sense anyway, but they aren't available right now). And it wouldn't solve the "discovery" aspect of "what data is available when", as the user would still need to guess which temporal filter to use.

The only real solution to that which I've been able to think of (but I'd be happy to be proven wrong) is by adding the STAC endpoint as a layer. Then the user could identify-click to find time periods for orthophotos at a given point, or use the slider in the temporal controller or similar.

For background maps, there are fewer options. Again, we could add a processing algorithm that adds a STACIT-backed layer over the whole STAC endpoint, but we'd need some code in either the layer or the data provider which updates the URL based on the filtered temporal extent. The STACIT driver also can't show item extents when zoomed further out.

More generally, I think browsing STAC endpoints on the map, integrated with existing QGIS functionality such as the temporal controller, is the most natural way for a user to work with them. It's spatial data after all, so shouldn't it be on the map?

> I fully agree with Nyall that creation of a new data provider should be the last resort because that requires a lot of new code (and thus maintenance cost). You could probably easily use just a vector layer with memory provider and populate it on the fly.

I don't think I quite agree with "a lot of new code". Sure, some, however the data provider needed here doesn't need to be particularly complex. Most of the actual fetching would be delegated to the existing STAC-related classes, and there is no editing or similar more complex functionality involved.

Memory provider: sure, it would be a possible solution, but the "populate on the fly" part would get ugly IMO. We'd have to listen to changes in the map extent etc. and pretty much circumvent the entire code path for fetching data on the fly. Data providers/layers, on the other hand, already have an existing path for populating on the fly, used by pretty much every data provider. With a memory provider, the amount of code would most likely be higher and more complex.

> For point clouds, virtual point cloud data provider should already be able to handle nearly everything you want - show bounding boxes, show tile names, show preview point cloud etc, and the .vpc format is already using STAC items - what else is missing?

Same issue as with memory providers, we'd basically need to change the VPC on the fly using logic circumventing the "usual" way this is handled. Or we could just create the full VPC file upfront, but as you yourself point out further down we'd potentially have to fetch quite a lot of data, most of which would never be used because the user isn't looking at it.

But as described in the original proposal, the VPC provider is definitely relevant, though more by reusing its functionality (potentially refactoring a bit so that the two data providers can share code) than by using it as-is. An option would be to adapt the VPC data provider so that it fetches new items on the fly, which would mean less code overall but make that specific data provider more complex. I'm open to that option, though.

> For raster data, GDAL already has VRT support and STAC support... maybe an interesting addition could be that raster layers from VRT/STAC driver could be rendered using just bounding boxes rather than real raster data, similarly to how virtual point clouds can be rendered. Maybe adding such raster layer renderer would be enough for your use case?

Same issue with VRT as with VPC. As mentioned above the STACIT-driver can likely solve a lot here, but we'd have to find a way to handle authenticated endpoints (such as a custom VSI) and update the STACIT parameters/URL on the fly based on temporal controller and other filters. Happy for any suggestions regarding that.
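A rough sketch of what updating the STACIT parameters on the fly could amount to: rebuilding the GDAL connection string whenever the temporal filter changes. It assumes the `STACIT:"<search url>":asset=<asset key>` form described in GDAL's STACIT driver documentation (worth double-checking against the GDAL version shipped with QGIS); the helper names, endpoint, collection and asset key are placeholders. Intervals use the STAC/OAPIF `start/end` notation with `..` for an open end.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def stac_datetime_interval(start=None, end=None) -> str:
    """'start/end' interval in UTC; None becomes '..' (open end).

    Expects timezone-aware datetimes; naive ones would be read as local time.
    """
    def fmt(dt):
        if dt is None:
            return ".."
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{fmt(start)}/{fmt(end)}"


def stacit_connection_string(search_url, collections, bbox, start, end, asset) -> str:
    """Hypothetical helper: STACIT connection string for one temporal filter."""
    params = {
        "collections": ",".join(collections),
        "bbox": ",".join(str(v) for v in bbox),
        "datetime": stac_datetime_interval(start, end),
    }
    # urlencode percent-encodes ',', ':' and '/' inside the parameter values
    return f'STACIT:"{search_url}?{urlencode(params)}":asset={asset}'
```

A layer or proxy provider wrapping GDAL would regenerate this string (and reopen the dataset) whenever the temporal range changes; authentication for the underlying requests remains a separate question.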

> By the way, how would you like to handle pagination when dealing with STAC search results? Some collections can be HUGE - e.g. millions of items - and it is not feasible to fetch everything.

Same as OAPIF today - continue fetching pages until everything in view is rendered. It's not ideal of course, but as with OAPIF there isn't really a better option (maybe fetching a fixed amount and showing a "there's more data if you zoom in" message to the user I guess).
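The "keep fetching pages" approach, sketched in Python: STAC API and OAPIF responses both advertise the following page through a link with `rel="next"`, so the loop only has to follow those links and stop once enough features have been collected. The injected `fetch` callable and the `max_features` cap are assumptions for illustration. STAC Item Search `next` links may also specify a POST method and body, which this GET-only sketch ignores.

```python
from typing import Callable, Iterator, Optional


def next_link(page: dict) -> Optional[str]:
    """Return the href of the rel="next" link, if the page has one."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def iter_features(first_url: str,
                  fetch: Callable[[str], dict],
                  max_features: int = 1000) -> Iterator[dict]:
    """Yield features page by page until there is no next page or the cap is hit.

    `fetch` takes a URL and returns the decoded GeoJSON FeatureCollection,
    which keeps the sketch independent of any particular HTTP stack.
    """
    url: Optional[str] = first_url
    count = 0
    while url is not None and count < max_features:
        page = fetch(url)
        for feature in page.get("features", []):
            if count >= max_features:
                return
            yield feature
            count += 1
        url = next_link(page)
```

Because the generator is lazy, a caller can also stop early (for example once the current view is covered) and no further pages are requested.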

@wonder-sk
Member

Thanks a lot Jan for taking time to provide the extra context - it is now much easier to understand your intentions!

I also appreciate that you would like to improve the UX of the STAC client in QGIS - there's a lot of untapped potential still, we just need to find the most suitable way to do things :-)

In the next step, it would be good to find an agreement about what new features to offer to the user (ignoring any implementation details), and only after that let's think of how to implement such features.

My thoughts about STAC client UX improvements:

  • it would be useful to automatically fetch more and more items from the search, so that the user does not need to manually scroll the list to get more items - especially because some servers return only a few items at once. We would need some kind of limit though (e.g. max. 1000 items fetched automatically), to prevent QGIS from doing many hundreds/thousands of requests. (Technically, the search API is the bottleneck here for providing good UX... I quite like the idea of stac-geoparquet - with it, we could download metadata of the whole collection once, and then do the item search much better/faster locally.)
  • it would indeed be nice if search results could be updated as you move the map canvas - so that the user does not need to fiddle with search filters. Probably it could be a mode you could turn on/off
  • temporal controller integration - I am not sure if it is a good idea to use temporal controller to set temporal filter for STAC search. This means the STAC client would have to "take over" the temporal controller and its configuration, which would be unwanted especially if the user is already using it for something else within the project. I guess we want "something like temporal controller" - a widget that would show timeline, but it would be specific to STAC... for example, I would love to see a tiny histogram to get a better indication of the distribution of data in time, potentially an option to zoom in/out within the time dimension, to focus just on some interval.
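To illustrate the stac-geoparquet idea and the timeline histogram together: once a collection's item metadata is available locally, both the spatial filter and per-year counts become simple in-memory operations. This sketch works on plain STAC item dicts rather than on GeoParquet, and assumes 2D `bbox` arrays and a non-null `properties.datetime`; items that only carry `start_datetime`/`end_datetime` would need extra handling. All function names are illustrative.

```python
from collections import Counter
from datetime import datetime


def bbox_intersects(a, b) -> bool:
    """Overlap test for 2D (minx, miny, maxx, maxy) boxes; touching edges count."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def parse_stac_datetime(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def local_search(items, bbox):
    """Spatial filter over item metadata already held in memory."""
    return [item for item in items if bbox_intersects(item["bbox"], bbox)]


def items_per_year(items) -> Counter:
    """Item counts per year, e.g. to draw a small histogram on a timeline."""
    return Counter(parse_stac_datetime(item["properties"]["datetime"]).year for item in items)
```

Combining the two, `items_per_year(local_search(items, view_bbox))` would give the distribution of acquisitions under the current map view without any server round trips.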

My understanding is that on top of the general UX improvements to STAC mentioned above, you would like to improve STAC support for COGs and COPCs in particular - to be able to stream the actual data to map canvas rather than looking just at the bounding boxes - ideally by having a map layer that would simply point to a STAC collection, and it would handle search for items in that collection and their display in map canvas - right? And I guess it is expected such a layer could be also saved in a QGIS project file?

For this kind of functionality - in terms of technical approach I would suggest:

  • for point clouds - virtual point cloud data provider could be extended to accept a link to a STAC collection, and use QGIS STAC code to fetch items and dynamically add them to the list of files to display
  • for rasters - probably virtual raster data provider could be extended (or a new "proxy" raster data provider could be created) to handle search of items, and it would proxy raster I/O calls to the underlying GDAL provider with VRT or STACIT driver
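One way to picture a virtual point cloud (or proxy raster) provider that grows as searches return items: keep an ordered map from item ID to data asset URL and hand only newly seen URLs to the renderer. The class below is hypothetical bookkeeping, not existing QGIS code, and the asset key (`data` in the test) is an assumption, since catalogs name their COPC/COG assets differently.

```python
class DynamicItemSet:
    """Hypothetical: the file list of a provider that grows as STAC searches return items."""

    def __init__(self, asset_key: str):
        self.asset_key = asset_key  # which asset holds the data file (e.g. a COPC or COG)
        self._hrefs = {}  # item id -> asset href, in insertion order

    def add_items(self, items) -> list:
        """Register search results and return the hrefs that were not known before."""
        new_hrefs = []
        for item in items:
            asset = item.get("assets", {}).get(self.asset_key)
            if asset is None or item["id"] in self._hrefs:
                continue  # no data asset, or already added by an earlier search
            self._hrefs[item["id"]] = asset["href"]
            new_hrefs.append(asset["href"])
        return new_hrefs

    @property
    def hrefs(self) -> list:
        return list(self._hrefs.values())
```

Keying on item IDs means overlapping searches (for example after panning the map) never add the same file twice.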

@nyalldawson
Contributor

@02JanDal

> handle authenticated endpoints (such as a custom VSI)

I think this should be left as a completely separate project. It belongs in GDAL itself, where there's already partial support for replacing the networking stack with a custom one. QGIS does this already, but GDAL doesn't use the custom functions everywhere -- e.g. it's not used in the vsi code currently. If that were done then GDAL requests would just go through QGIS' standard network access manager, and all the authentication/proxy/... would "just work".

@sweco-sedalh
Copy link

> it would be useful to automatically fetch more and more items from the search, so that the user does not need to manually scroll the list to get more items - especially because some servers return only a few items at once. We would need some kind of limit though (e.g. max. 1000 items fetched automatically), to prevent QGIS from doing many hundreds/thousands of requests. (Technically, the search API is the bottleneck here for providing good UX... I quite like the idea of stac-geoparquet - with it, we could download metadata of the whole collection once, and then do the item search much better/faster locally.)

That would be useful in general, though only partially (for one of three catalogs) in our case. Also interesting would be the ability to search in the browser (either inline in the tree to filter a specific collection or for the entire browser).

To be a bit more specific, one catalog contains collections (one per data type, such as buildings) containing one item per municipality (290 items). The most common use case here is for a user to want to get to the item from a specific catalog for a specific municipality. Scrolling in an alphabetically sorted list works; filtering would be even nicer.

The other two catalogs are more "traditional" STAC catalogs, they contain collections of orthophotos/DEMs. In them the names of the items are close to useless (even the names of the collections are of questionable use), so neither scrolling nor filtering by name is of much use:

> it would indeed be nice if search results could be updated as you move the map canvas - so that the user does not need to fiddle with search filters. Probably it could be a mode you could turn on/off

I guess turning on/off -> hiding/showing a layer?

> temporal controller integration - I am not sure if it is a good idea to use temporal controller to set temporal filter for STAC search. This means the STAC client would have to "take over" the temporal controller and its configuration, which would be unwanted especially if the user is already using it for something else within the project. I guess we want "something like temporal controller" - a widget that would show timeline, but it would be specific to STAC... for example, I would love to see a tiny histogram to get a better indication of the distribution of data in time, potentially an option to zoom in/out within the time dimension, to focus just on some interval.

I'm not quite sure I follow here, why would the STAC client have to "take over" the temporal controller? If the map layer of STAC items is a normal layer with a data provider (new or existing) then it'd work just as it usually does?

> My understanding is that on top of the general UX improvements to STAC mentioned above, you would like to improve STAC support for COGs and COPCs in particular - to be able to stream the actual data to map canvas rather than looking just at the bounding boxes - ideally by having a map layer that would simply point to a STAC collection, and it would handle search for items in that collection and their display in map canvas - right? And I guess it is expected such a layer could be also saved in a QGIS project file?

Correct!

Yes, the most natural thing would be for the layer to be saved with the project (otherwise it'd be confusing for the user why that particular layer disappears after a restart/reopening).

> for point clouds - virtual point cloud data provider could be extended to accept a link to a STAC collection, and use QGIS STAC code to fetch items and dynamically add them to the list of files to display

I'm a bit concerned that this would significantly complicate the VPC provider (which is completely static today). On the other hand, in the end the VPC provider would likely just have a subset of the functionality of the STAC-pointcloud-provider, so it might make sense to adapt the VPC-provider directly instead.

> I think this should be left as a completely separate project. It belongs in GDAL itself, where there's already partial support for replacing the networking stack with a custom one. QGIS does this already, but GDAL doesn't use the custom functions everywhere -- e.g. it's not used in the vsi code currently. If that were done then GDAL requests would just go through QGIS' standard network access manager, and all the authentication/proxy/... would "just work".

Could you point me to where QGIS already does this? My code searching seems to fail me there...

And yes, I agree, this would be a separate project/proposal.


Thought some more about this over the weekend. Originally my intention was to support any STAC catalog, even those that don't support STAC API/search. However, I'm starting to doubt that: most catalogs large enough for all this to be worthwhile do support STAC API, and the technical tradeoffs aren't quite worth it. Additionally, I realized that the data source manager also requires STAC API, so having the same limitation here wouldn't be too bad.

This would mean that we're a lot closer to the OAPIF data provider already, as STAC API Features (to get items from a single collection) and STAC API Item Search both build on top of OGC API Features.
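If STAC API support becomes a requirement, the provider (or the data source manager) can check for it up front: a STAC API landing page lists its conformance classes in `conformsTo`. This sketch assumes the `https://api.stacspec.org/v1.x/<class>` URI pattern used by STAC API 1.0 (for example `https://api.stacspec.org/v1.0.0/item-search`) and deliberately ignores extension URIs carrying a `#` fragment; the function names are illustrative.

```python
import re

# STAC API 1.x conformance URIs look like https://api.stacspec.org/v1.0.0/item-search
_STAC_CONFORMANCE = re.compile(r"^https://api\.stacspec\.org/v1\.[^/#]+/(?P<name>[a-z-]+)$")


def stac_api_classes(landing_page: dict) -> set:
    """Names of the STAC API conformance classes a landing page declares."""
    names = set()
    for uri in landing_page.get("conformsTo", []):
        match = _STAC_CONFORMANCE.match(uri)
        if match:
            names.add(match.group("name"))
    return names


def supports_item_search(landing_page: dict) -> bool:
    # STAC API - Item Search: the /search endpoint
    return "item-search" in stac_api_classes(landing_page)


def supports_collection_items(landing_page: dict) -> bool:
    # STAC API - Features: items of one collection via the OAPIF endpoints
    return "ogcapi-features" in stac_api_classes(landing_page)
```

The two checks map onto the two entry points discussed in this thread: the cross-collection /search endpoint and the per-collection OAPIF items endpoint.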

For item extents, that would mean allowing the data provider to use a custom endpoint (i.e. /search instead of /collections/{id}/items). The parameters (at least for core) should be identical (such as bbox, limit, etc.). We might have to check for conflicts when using extensions, although I'm hopeful that this should be fine. Based on just a quick look, it might be possible to contain this change completely within QgsOapifProvider::init(). An interesting question is how to encode the fact that we use the /search endpoint instead of a collection-specific items endpoint; the quickest solution might be a hardcoded value for typeName (such as $stac-search$). Any suggestions?

Does this sound like a good way forward?
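The sentinel `typeName` idea, sketched as plain Python rather than as a patch to `QgsOapifProvider::init()`: the sentinel value (`$stac-search$`, as floated above) selects `/search`, any other type name maps to the usual OAPIF items endpoint, and the core query parameters are shared. The function name, base URL and collection names are placeholders.

```python
from urllib.parse import quote, urlencode

# Sentinel type name meaning "use STAC API Item Search" (value taken from the proposal)
STAC_SEARCH_TYPENAME = "$stac-search$"


def items_url(base_url: str, type_name: str, bbox=None, limit=None) -> str:
    """Build the GET URL for a page of features from either endpoint."""
    base = base_url.rstrip("/")
    if type_name == STAC_SEARCH_TYPENAME:
        path = f"{base}/search"
    else:
        # Regular OAPIF: one collection per layer
        path = f"{base}/collections/{quote(type_name, safe='')}/items"
    params = {}
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if limit is not None:
        params["limit"] = limit
    return f"{path}?{urlencode(params)}" if params else path
```

Since both endpoints return GeoJSON FeatureCollections, the rest of the provider's fetching and paging logic could in principle stay shared; only the URL construction branches.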

Next would then be the question of how to expose the additional information from STAC, namely the assets. There is already separate parsing of OAPIF-specific properties happening so that would be a reasonable place to also parse assets. But how should that then be "attached" to the feature? Adding it as a separate field/attribute with a custom type? Adding it as a sibling to attributes on the feature (similar to how it's organized on the STAC item)?

Once features carry assets in general, support for them could be added to the Identify tool etc., which could also be useful for other data providers (right now I can only think of ArcGIS Server and its attachments, though).

An interesting option here would then be to add a renderer that uses the assets, essentially giving a live preview of the raster/point cloud. Unlike a STACIT/VPC-based solution, however, such a layer couldn't be used in point cloud/raster consuming algorithms.
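A sketch of the "sibling to attributes" option: a STAC item is a GeoJSON Feature with an extra top-level `assets` object, so a parser can lift it out next to `properties` and treat plain OAPIF features (which lack it) as having no assets. `StacFeature`, `feature_from_geojson` and `asset_hrefs` are hypothetical and only illustrate the data shape, not a proposed `QgsFeature` API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class StacFeature:
    """Hypothetical feature shape with assets kept beside the attributes."""
    id: str
    geometry: Optional[Dict[str, Any]]
    attributes: Dict[str, Any]
    assets: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def feature_from_geojson(feature: Dict[str, Any]) -> StacFeature:
    """Parse a GeoJSON Feature; the 'assets' member is only present on STAC items."""
    return StacFeature(
        id=str(feature.get("id", "")),
        geometry=feature.get("geometry"),
        attributes=dict(feature.get("properties") or {}),  # GeoJSON allows null properties
        assets={key: dict(asset) for key, asset in (feature.get("assets") or {}).items()},
    )


def asset_hrefs(feat: StacFeature, media_type_prefix: str = "") -> Dict[str, str]:
    """Asset key -> href, optionally filtered by media type (e.g. for an identify panel)."""
    return {
        key: asset["href"]
        for key, asset in feat.assets.items()
        if asset.get("type", "").startswith(media_type_prefix)
    }
```

Keeping the full asset objects (not just the hrefs) preserves the media types a preview renderer or the Identify tool would need to decide what it can show.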

Labels: In Discussion, Project
4 participants