QEP 331: STAC layers and data providers #331
I have some concerns here. Can you include links to the STAC vector features specification, and describe how this relates to WFS/OAPIF? Creating a new data provider should be a last resort, so if the STAC specifications are closely aligned with WFS/OAPIF then I think making a separate provider is a misstep. Similarly, creating QgsStacRasterDataProvider is a red flag to me. The GDAL library and data provider are already mature and written to handle optimised reading of remote data sources, and it would be a massive undertaking and large technical debt to reimplement all that just for STAC resources. Besides, GDAL itself already handles STAC resources well, so I fail to see why we'd need a dedicated provider in the first place.
No, this is not a good idea - creating a new layer class just for a single data provider breaks a lot of the assumptions and designs of the QGIS API. The better approach is to add the missing bits to the API for the existing layer classes. Was this proposal created in partnership/consultation with Lutra, who did the original STAC implementation? |
@02JanDal following https://github.com/qgis/QGIS-Enhancement-Proposals?tab=readme-ov-file#process-and-policies, can you please announce this also to QGIS-psc lists? |
(EDIT: posted from the wrong account, I'm the same as @02JanDal)
Done, that is however currently not documented on the linked page (only QGIS-developers and potentially QGIS-users)?
There is a bit of historical cruft involved here, as STAC was developed alongside OAPIF rather than based on it. Basically, while STAC borrows a lot of ideas from OAPIF (and vice versa) and a lot of STAC endpoints implement OAPIF, there is no guarantee that this is the case. Specification-wise, STAC has split off OAPIF compatibility into a separate part: https://github.com/radiantearth/stac-api-spec/tree/v1.0.0/ogcapi-features The So basically we could solve parts of this using the existing OAPIF data provider, but likely not in a way that supports Item Search, and it would not be guaranteed to work with all STAC endpoints.
That's a good point. Based on what I can tell the STACIT driver could be sufficient here. That would leave the questions of how to handle the temporal controller (it would probably make sense to add support for it to the GDAL data provider/the STACIT driver) and authentication (AFAICT the GDAL driver does not support OAuth2 right now, right?). Regarding authentication: I got an interesting tip based on https://lists.osgeo.org/pipermail/qgis-developer/2025-February/067392.html that PDAL is considering moving to GDAL's VSI. A possible solution to both PDAL algorithms and authentication here could then be to register a custom GDAL VSI (something like
I suspected as much. The other option (I can think of) would then be to let the data provider provide flags on stuff that the user may change, with the default being "everything" for backwards compatibility. For this use-case it'd probably be about leaving Information and Symbology/Labels/Masks/3D View/Diagrams editable. UI wise I think graying out the tabs, making the fields readonly and displaying some sort of "managed by data provider" message would be the best option? Actions and initial styling can be set when adding the layer from the STAC tree, so as long as actions aren't overridable it should be fine (symbology I think should be possible to override, it should just have some sensible default).
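To make the flags idea a bit more concrete, here is a rough Python sketch. Every name in it (`LayerEditableAspect`, `editable_aspects`, the two provider classes, `is_page_read_only`) is hypothetical and not existing QGIS API; it only illustrates a backwards-compatible default of "everything" with one provider opting out of some pages.

```python
from enum import Flag, auto


class LayerEditableAspect(Flag):
    """Hypothetical flags for the parts of a layer a user may change."""
    INFORMATION = auto()
    SOURCE = auto()
    SYMBOLOGY = auto()
    LABELS = auto()
    MASKS = auto()
    VIEW_3D = auto()
    DIAGRAMS = auto()
    FIELDS = auto()
    ACTIONS = auto()
    TEMPORAL = auto()


# Backwards-compatible default: every aspect is editable.
ALL_ASPECTS = LayerEditableAspect(0)
for _aspect in LayerEditableAspect:
    ALL_ASPECTS |= _aspect


class BaseProvider:
    """Stand-in for an existing provider that restricts nothing."""

    def editable_aspects(self) -> LayerEditableAspect:
        return ALL_ASPECTS


class StacItemsProvider(BaseProvider):
    """Stand-in for a provider that manages most layer settings itself."""

    def editable_aspects(self) -> LayerEditableAspect:
        a = LayerEditableAspect
        return (a.INFORMATION | a.SYMBOLOGY | a.LABELS
                | a.MASKS | a.VIEW_3D | a.DIAGRAMS)


def is_page_read_only(provider: BaseProvider, aspect: LayerEditableAspect) -> bool:
    """True if the properties page for `aspect` should be grayed out."""
    return aspect not in provider.editable_aspects()
```

With something like this, the layer properties dialog would only need to ask `is_page_read_only(provider, aspect)` per tab and show the "managed by data provider" message where it returns `True`.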
I messaged @uclaros specifically when I sent the announcement to QGIS-Developer. I haven't had any further contact, but I hope they'll give their input here. |
I have similar concerns as @nyalldawson (btw. we have not discussed the proposal with Jan before) I think it would be good to step back from the design and clarify the requirements first - the proposal mostly focuses on HOW things should be done, but less so about what are the end goals... My understanding is that you have a STAC server with some collections and you'd like to browse such collections on the map (with the help of temporal controller). I fully agree with Nyall that creation of a new data provider should be the last resort because that requires a lot of new code (and thus maintenance cost). You could probably easily use just a vector layer with memory provider and populate it on the fly. For point clouds, virtual point cloud data provider should already be able to handle nearly everything you want - show bounding boxes, show tile names, show preview point cloud etc, and the For raster data, GDAL already has VRT support and STAC support... maybe an interesting addition could be that raster layers from VRT/STAC driver could be rendered using just bounding boxes rather than real raster data, similarly to how virtual point clouds can be rendered. Maybe adding such raster layer renderer would be enough for your use case? By the way, how would you like to handle pagination when dealing with STAC search results? Some collections can be HUGE - e.g. millions of items - and it is not feasible to fetch everything. |
Probably a bit more context than necessary, but to try not to leave anything out: The national land survey of Sweden, Lantmäteriet, provides a platform, built on STAC/OAPIF (with COG and COPC for raster and point cloud datasets respectively), through which you can access various datasets from various providers (mainly municipalities). Additionally, they now provide their EU High Value Data datasets via STAC (with either COG, COPC or GeoPackage). We now have a project where we, among a few other things, are to help them improve how consumers can access these resources using QGIS. The project is budget bound rather than task bound, so we'll try to get as much done as possible within the given budget, but the things we've identified so far as useful are:
The last two are the ones discussed here. The first is mostly about "discovery" (users wanting to just get a glimpse of what's available, in order to later be able to work with the data; sure, you can point them at documentation etc., but at the end of the day that's not where most users will look...) and picking stuff from the map. Additionally, while the data browser dialog based search works alright, especially for datasets with useful names (which is the case for most vector data from Lantmäteriet, which is made available as one GeoPackage per municipality), it's not quite as usable for other types of data (such as orthophotos, which often have a name based on the coordinate or some index). As an example, say I need orthophotos over Stockholm. I'll add the STAC endpoint, open the data dialog, set the filter to the current map extent and the expected temporal extent (which requires me to know when the latest images were taken over the area in question). Then I need to scroll down and add each orthophoto as an individual layer to the map. Now, there are other options here. We could build a processing algorithm which, given an extent, adds either a group layer with individual COGs or a single layer with the STACIT driver. But that would require us to either pull in an external STAC library, implement our own STAC client, or make the The only real solution to that which I've been able to think of (but I'd be happy to be proven wrong) is adding the STAC endpoint as a layer. Then the user could identify-click to find time periods for orthophotos at a given point, or use the slider in the temporal controller or similar. For background maps, there are fewer options. Again, we could add a processing algorithm that adds a STACIT-backed layer over the whole STAC endpoint, but we'd need some code in either the layer or the data provider which updates the URL based on the filtered temporal extent. 
The STACIT driver also doesn't have the ability to show extents when zoomed further out. More generally, I think browsing STAC endpoints on the map, integrated with QGIS's existing functionality such as the temporal controller, is the most natural way for a user to work with it. It's spatial data after all, so shouldn't it be on the map?
I don't think I quite agree with "a lot of new code". Sure, some, but the data provider needed here doesn't need to be particularly complex. Most of the actual fetching would be delegated to the existing STAC-related classes, and there is no editing or similar more complex functionality involved. Memory provider - sure, it would be a possible solution, however the "populate on the fly" part would get ugly IMO. We'd have to listen to changes in the map extent etc. and pretty much circumvent the entire code path for fetching data on the fly. Meanwhile, data providers/layers have an existing path for populating on the fly, as used by pretty much every data provider. With the memory provider approach, the code would most likely end up both larger and more complex.
Same issue as with memory providers, we'd basically need to change the VPC on the fly using logic circumventing the "usual" way this is handled. Or we could just create the full VPC file upfront, but as you yourself point out further down we'd potentially have to fetch quite a lot of data, most of which would never be used because the user isn't looking at it. But as described in the original proposal the VPC provider is definitely relevant. But rather by re-using functionality (potentially re-factoring a bit to let the two data providers share functionality) than using it straight out (an option would be to adapt the VPC data provider to let itself fetch new items on-the-fly, which would be less code overall but make that specific data provider more complex. But I'm open to that option).
Same issue with VRT as with VPC. As mentioned above the STACIT-driver can likely solve a lot here, but we'd have to find a way to handle authenticated endpoints (such as a custom VSI) and update the STACIT parameters/URL on the fly based on temporal controller and other filters. Happy for any suggestions regarding that.
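To illustrate the "update the STACIT parameters/URL on the fly" part, here is a minimal Python sketch that rebuilds a STACIT connection string from an Item Search URL, a bounding box and a temporal range. The function name and the example endpoint are made up for illustration, and the `STACIT:"<url>":asset=<name>` form is my reading of the GDAL STACIT driver documentation rather than something verified here; authentication is not addressed at all.

```python
from urllib.parse import urlencode


def stacit_connection_string(search_url, collection, bbox,
                             start=None, end=None, asset=None):
    """Build a STACIT connection string for a STAC Item Search request.

    bbox is (min_x, min_y, max_x, max_y); start and end are RFC 3339
    timestamps, either of which may be None for an open-ended interval.
    """
    params = {
        "collections": collection,
        "bbox": ",".join(str(value) for value in bbox),
    }
    if start or end:
        # STAC API uses ".." to mark the open end of a datetime interval.
        params["datetime"] = f"{start or '..'}/{end or '..'}"
    connection = f'STACIT:"{search_url}?{urlencode(params)}"'
    if asset:
        connection += f":asset={asset}"
    return connection


# Hypothetical endpoint and collection, e.g. called again whenever the
# temporal controller range or the map extent changes.
example = stacit_connection_string(
    "https://example.com/stac/search", "ortho",
    (17.8, 59.2, 18.3, 59.5),
    start="2023-01-01T00:00:00Z", asset="image",
)
```

The layer or data provider would then have to reopen the GDAL dataset with the new string, which is exactly the piece that has no natural home today.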
Same as OAPIF today - continue fetching pages until everything in view is rendered. It's not ideal of course, but as with OAPIF there isn't really a better option (maybe fetching a fixed amount and showing a "there's more data if you zoom in" message to the user I guess). |
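A rough Python sketch of that paging loop, including the "fixed amount" variant: it follows the `next` links of a STAC ItemCollection until there are no more pages or a cap is reached. `fetch_page` is a hypothetical caller-supplied function that takes a URL and returns the decoded JSON, and the sketch assumes GET-style `next` links (some STAC APIs page via POST with a request body, which is not handled here).

```python
def fetch_items_in_view(first_url, fetch_page, max_items=1000):
    """Collect STAC items page by page, following rel="next" links.

    Returns (items, truncated), where truncated is True when more items
    exist than were returned, so the UI can hint at zooming in.
    """
    items = []
    url = first_url
    while url is not None and len(items) < max_items:
        page = fetch_page(url)
        items.extend(page.get("features", []))
        # The next page, if any, is advertised as a link with rel="next".
        url = next(
            (link["href"] for link in page.get("links", [])
             if link.get("rel") == "next"),
            None,
        )
    truncated = len(items) > max_items or url is not None
    return items[:max_items], truncated
```

The `truncated` flag is what would drive a "there's more data if you zoom in" message.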
Thanks a lot Jan for taking the time to provide the extra context - it is now much easier to understand your intentions! I also appreciate that you would like to improve the UX of the STAC client in QGIS - there's a lot of untapped potential still, we just need to find the most suitable way to do things :-) In the next step, it would be good to find an agreement about what new features to offer to the user (ignoring any implementation details), and only after that let's think of how to implement such features. My thoughts about STAC client UX improvements:
My understanding is that on top of the general UX improvements to STAC mentioned above, you would like to improve STAC support for COGs and COPCs in particular - to be able to stream the actual data to the map canvas rather than looking just at the bounding boxes - ideally by having a map layer that would simply point to a STAC collection, and it would handle search for items in that collection and their display in the map canvas - right? And I guess it is expected that such a layer could also be saved in a QGIS project file? For this kind of functionality - in terms of technical approach I would suggest:
I think this should be left as a completely separate project. It belongs in GDAL itself, where there's already partial support for replacing the networking stack with a custom one. QGIS does this already, but GDAL doesn't use the custom functions everywhere -- e.g. it's not used in the VSI code currently. If that were done then GDAL requests would just go through QGIS' standard network access manager, and all the authentication/proxy/... would "just work". |
That would be useful in general, though only partially (for one of three catalogs) in our case. Also interesting would be the ability to search in the browser (either inline in the tree to filter a specific collection or for the entire browser). To be a bit more specific, one catalog contains collections (one per data type, such as buildings) containing one item per municipality (290 items). The most common use case here is for a user to want to get to the item from a specific catalog for a specific municipality. Scrolling in an alphabetically sorted list works, filtering would be even nicer. The other two catalogs are more "traditional" STAC catalogs; they contain collections of orthophotos/DEMs. In them the names of the items are close to useless (even the names of the collections are of questionable use), so neither scrolling nor filtering by name is of much use:
I guess turning on/off -> hiding/showing a layer?
I'm not quite sure I follow here, why would the STAC client have to "take over" the temporal controller? If the map layer of STAC items is a normal layer with a data provider (new or existing) then it'd work just as it usually does?
Correct! Yes, the most natural thing would be for the layer to be saved with the project (otherwise it'd be confusing for the user why that particular layer disappears after a restart/reopening).
I'm a bit concerned that this would significantly complicate the VPC provider (which is completely static today). On the other hand, in the end the VPC provider would likely just have a subset of the functionality of the STAC-pointcloud-provider, so it might make sense to adapt the VPC-provider directly instead.
Could you point me in the direction of where QGIS already does this? My code searching fails me for that, it seems... And yes, I agree, this would be a separate project/proposal. Thought some more about this over the weekend - originally my intention was to support any STAC catalog even if they don't support STAC API/search. However, I'm starting to doubt that; most catalogs large enough for this all to be worth it do support STAC API, and the technical tradeoffs aren't quite worth it. Additionally, I realized that the data source manager also requires STAC API, so having the same limitation here wouldn't be too bad. This would mean that we're a lot closer to the OAPIF data provider already, as STAC API Features (to get items from a single collection) and STAC API Item Search both build on top of OGC API Features. For item extents that would mean allowing the data provider to use a custom endpoint (rather Does this sound like a good way forward? Next would then be the question of how to expose the additional information from STAC, namely the assets. There is already separate parsing of OAPIF-specific properties happening, so that would be a reasonable place to also parse assets. But how should that then be "attached" to the feature? Adding it as a separate field/attribute with a custom type? Adding it as a sibling to attributes on the feature (similar to how it's organized on the STAC item)? Given assets on features in general, support for them could then be added to the identify tool etc., which could also be useful for other data providers (right now I can only think of ArcGIS Server and their attachments though). An interesting option here would then be to add a renderer that uses the assets, essentially giving a live preview of the raster/point cloud. As opposed to a STACIT/VPC-based solution, that layer couldn't be used in point cloud/raster consuming algorithms. |
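As a small illustration of the "sibling to attributes" option, here is a Python sketch that parses a STAC item into a feature-like object where assets live next to the attributes rather than inside them. `Asset`, `StacFeature` and `parse_stac_item` are hypothetical names for illustration only, not existing QGIS classes; the item layout (`id`, `properties`, `assets` keyed by name with `href`, `type` and `roles`) follows the STAC item specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Asset:
    """One entry from a STAC item's "assets" object."""
    key: str
    href: str
    media_type: Optional[str] = None
    roles: List[str] = field(default_factory=list)


@dataclass
class StacFeature:
    """Feature-like container: assets are a sibling of attributes."""
    id: str
    attributes: Dict[str, object]
    assets: Dict[str, Asset]


def parse_stac_item(item: dict) -> StacFeature:
    """Split a STAC item (GeoJSON feature) into attributes and assets."""
    assets = {
        key: Asset(
            key=key,
            href=raw["href"],
            media_type=raw.get("type"),
            roles=list(raw.get("roles", [])),
        )
        for key, raw in item.get("assets", {}).items()
    }
    return StacFeature(
        id=item["id"],
        attributes=dict(item.get("properties", {})),
        assets=assets,
    )
```

The alternative, a single attribute with a custom type, would keep the feature class untouched but push the asset structure into the field system instead.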
This QEP documents a proposed addition to the STAC support - adding STAC endpoints as layers.
There are some points here where I'd appreciate input, especially regarding the contents of the STAC layers subheader. From the PoV of this specific feature it would be beneficial to be able to restrict what the user can change about a layer (instead letting the data source control it). This could technically be solved relatively easily (for example by passing through a "read-only" flag from the data source which the layer properties dialog uses to hide pages), but I can imagine that this might violate some unspoken architectural rule. Or maybe not?
Provided that this QEP is accepted, we'll be able to implement at least the first part (adding a layer showing the extents of the STAC items) with the funding we've currently got, and hopefully also a preview of raster and/or point clouds.