-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bindings to the GDAL/OGR Vector API #241
Comments
This already covers a lot what vapour does with vector layers. mvfile <- system.file(file.path("extdata/tab", file), package="vapour")
x <- new(gdalraster:::GDALVector, mvfile)
x$getDriverLongName() ## vapour_driver
## vapour_layer_info, vapour_report_fields
x$bbox()
x$getFeatureCount()
x$getSpatialRef()
x$getLayerDefn()
x$getName()
## we would want some helpers to avoid materializing geometry and attributes, so we can skip/limit, and get just bbox etc
## vapour_read_geometry(limit_n = 2, skip_n = 4)
## vapour_geom_summary()/vapour_read_extent()
x$getNextFeature(); x$getNextFeature();
## now iterate for the next 4 features to match the skip_n/limit_n of vapour
x$getGeometryColumn() ##vapour_geom_name()
I can't spend more time this week but I feel like this could easily replace the innards of vapour. I'll have a closer look and see if there's anything missing across the raster stuff as well (I'm pretty well established in what and how I use vapour now so it won't take me much to investigate the core pieces I really need). I would look to replacing my dependence on vapour with this, and probably just let vapour sit as-is rather than work more on it (though it might be fun to modify it to dep on this, and I will need to think about the one non-me revdep vapour has {ursa}). I think ideally we would want to be able to spit out vectors of geos or ogr pointers, or wkb/wkt with support by {wk} (or even the arrow stream from GDAL itself) . There's certainly significant interest in arrow and geoarrow, so it might be easy to get contributions to help streamline this with the wk/geoarrow approach with this as the primary GDAL-API package for R (but I'll have to think about that some more at a later time.) |
@mdsumner, thank you for the helpful comments. Look forward to working through more of the details. I added “vectors of geos or ogr pointers, or wkb/wkt with support by {wk}” in the section "Further consideration / TBD". Added Added arguments I had the OGR Arrow interface under "TBD" for |
This is really cool to experiment with so far! I am currently trying to clip a source layer by another one but hitting library(sf)
library(gdalraster)
nc_org <- system.file("shape/nc.shp", package="sf")
nc_out <- "/vsimem/nc.gpkg"
ogr2ogr(nc_org, nc_out, cl_arg = c("-t_srs", "EPSG:4326", "-nlt", "PROMOTE_TO_MULTI"))
url <- "/vsicurl/https://data.source.coop/vida/google-microsoft-open-buildings/flageobuf/by_country/country_iso=USA/USA.fgb"
buildings_usa <- new(GDALVector, url)
nc <- new(GDALVector, nc_out)
srs <- buildings_usa$getSpatialRef()
dsn_out <- "/vsimem/nc_buildings.gpkg"
gdalraster:::.create_ogr("GPKG", dsn_out, 0, 0, 0, "Unknown", "nc_buildings", srs)
nc_buildings <- new(GDALVector, dsn_out, "nc_buildings", read_only = FALSE)
buildings_usa$layerClip(nc, nc_buildings)
buildings_usa$close()
nc$close()
nc_buildings$close()
|
the layerClip call is missing two required arguments:
atm you'll have to specify all args explicitly, usually we would do that in the R wrapper with default values for most args (untested, just I'm accustomed to seeing that error message) 🙏 |
Ah, right! That seems to make it work. Thanks. |
The layer geoprocessing functions were done as class methods in
in which case we need to supply the Added a note to that effect in the draft document for TBD. Also added in the draft document, a link to the header file |
@goergen95 , I experimented with your example above. I noticed that the USA buildings .fgb file is pretty large at 36 GB with 145 million features. There are close to 9 million features that intersect the NC bbox. When I try to run the clip for all NC counties it processes for a long time without getting very far. I'm curious if you would expect that to work in reasonable time, i.e., is that a representative data size/processing task? The FlatGeobuf file spec also says:
So latency could be an issue in this case, or maybe Source Cooperative does the caching by default? The code below filters the NC layer to one county. It also sets a spatial filter on the buildings layer first. The clip for Ashe county then runs pretty fast: Codelibrary(gdalraster)
#> GDAL 3.8.4, released 2024/02/08, PROJ 9.3.1
nc_org <- system.file("shape/nc.shp", package="sf")
nc_out <- "/vsimem/nc.gpkg"
ogr2ogr(nc_org, nc_out, cl_arg = c("-t_srs", "EPSG:4326", "-nlt", "PROMOTE_TO_MULTI"))
url <- "/vsicurl/https://data.source.coop/vida/google-microsoft-open-buildings/flageobuf/by_country/country_iso=USA/USA.fgb"
# size in GB
vsi_stat(url, "size") / 1e9
#> [1] 36.72616
# assuming trusted data for VERIFY_BUFFERS=NO
buildings_usa <- new(GDALVector,
dsn = url,
layer = "bfp_USA",
read_only = TRUE,
open_options = "VERIFY_BUFFERS=NO")
buildings_usa$getFeatureCount()
#> [1] 145459485
nc <- new(GDALVector, nc_out)
nc$setAttributeFilter("NAME == 'Ashe'")
nc$getFeatureCount()
#> [1] 1
feat <- nc$getNextFeature()
bb <- bbox_from_wkt(feat$geom)
print(bb)
#> [1] -81.74091 36.23444 -81.23971 36.58973
buildings_usa$setSpatialFilterRect(bb)
buildings_usa$getFeatureCount()
#> [1] 34641
srs <- buildings_usa$getSpatialRef()
dsn_out <- "/vsimem/nc_buildings.gpkg"
gdalraster:::.create_ogr("GPKG", dsn_out, 0, 0, 0, "Unknown", "nc_buildings", srs)
#> [1] TRUE
nc_buildings <- new(GDALVector, dsn_out, "nc_buildings", read_only = FALSE)
buildings_usa$layerClip(nc, nc_buildings, FALSE, NULL)
#> 0...10...20...30...40...50...60...70...80...90...100 - done.
nc_buildings$getFeatureCount()
#> [1] 24746
defn <- nc_buildings$getLayerDefn()
names(defn)
#> [1] "boundary_id" "bf_source" "confidence" "area_in_meters"
#> [5] "geom"
buildings_usa$close()
nc$close()
nc_buildings$close() Created on 2024-03-11 with reprex v2.1.0 |
Hi @ctoney, processing this amount of data is not representative of what we are dealing with (though there are some requesters interested in the building footprints data set). The most complex vector data source we are currently working with most probably are the IUCN species ranges. These have to be downloaded locally anyway, so latency is not the limiting factor here. I would not expect With that in mind, I wanted to say that your vignette on GDAL's block cache is really helpful to optimize read operations on rasters. I wonder if it was worthwhile to work on something along the same lines for vector data? We could consider e.g. number of features, complexity of geometries, GDAL options, and file formats with regard to I/O performance? |
@goergen95 , thanks for the additional information. I'm still experimenting with the building footprints. It's likely just throughput that's limiting for the ~9 million in NC. It sounds like Source Cooperative provides what the FlatGeobuf spec recommends: https://github.com/flatgeobuf/flatgeobuf#optimizing-remotely-hosted-flatgeobufs. It's good to have a couple of examples like this for testing. I'll take a look at the IUCN species ranges too. Right, there are several format-specific options and other configuration with regard to I/O performance for vector processing. That's a good idea to start collecting those into one place for reference. I could try to do that and eventually put a draft somewhere that we can add to. |
Hi, I am currently experimenting a bit more with the vector API. I hit a wall when I was trying to write a vector data set from scratch. Below code assumes that I obtained a number of bounding boxes somehow (a real use-case might be the footprints from items in a STAC catalog). I seem not to be able to find any methods to write features to a newly created library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is TRUE
library(gdalraster)
#> GDAL 3.8.2, released 2023/16/12, GEOS 3.12.1, PROJ 9.3.1
# create example data via sf
nc <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source
#> `/home/darius/R/x86_64-pc-linux-gnu-library/4.3/sf/shape/nc.shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
grid <- st_make_grid(nc, n = 10)
bboxs <- lapply(grid, st_bbox)
srs_wkt <- st_crs(nc)$wkt
# to wkt with gdalraster
bboxs_wkt <- sapply(bboxs, bbox_to_wkt)
id <- 1:length(bboxs_wkt)
dsn <- "/vsimem/bbox.gpkg"
# creation is currently un-exported and has the signature of the raster API?
gdalraster:::.create_ogr("GPKG", dsn, 0, 0, 0, "Unknown", "result_layer", srs_wkt)
#> [1] TRUE
lyr <- new(GDALVector, dsn, "result_layer", read_only = FALSE)
lyr$getFeatureCount()
#> [1] 0
# create methods seem to be un-exported, e.g. CreateField
# what is the recommended way to write a layer from scratch? Created on 2024-04-03 with reprex v2.1.0 |
Hi @goergen95, Note that Thanks for the nudge though. I expect to refocus on the vector API sometime soon and prioritize that for development over the coming weeks. It's a good idea to get some basic write capability into the prototype sooner than later, so that it can be included in early testing/feedback as you're attempting to do now. I'll post updates here once anything meaningful is added. In the meantime, please do send feedback or suggestions if anything else comes up. |
@goergen95,
It may be more convenient to view that information in R. Calling Code## usage for GDALVector class
library(gdalraster)
#> GDAL 3.8.4, released 2024/02/08, GEOS 3.12.1, PROJ 9.3.1
# examine class GDALVector
GDALVector
#> C++ class 'GDALVector' <0x58b5a612e9f0>
#> Constructors:
#> GDALVector()
#> docstring : Default constructor, only for allocations in std::vector.
#> GDALVector(Rcpp::CharacterVector)
#> docstring : Usage: new(GDALVector, dsn)
#> GDALVector(Rcpp::CharacterVector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
#> docstring : Usage: new(GDALVector, dsn, layer)
#> GDALVector(Rcpp::CharacterVector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)
#> docstring : Usage: new(GDALVector, dsn, layer, read_only=[TRUE|FALSE])
#> GDALVector(Rcpp::CharacterVector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, Rcpp::CharacterVector)
#> docstring : Usage: new(GDALVector, dsn, layer, read_only, open_options)
#>
#> Fields: No public fields exposed by this class
#>
#> Methods:
#> Rcpp::NumericVector bbox()
#> docstring : Return the bounding box (xmin, ymin, xmax, ymax).
#> void clearSpatialFilter()
#> docstring : Clear the current spatial filter.
#> void close()
#> docstring : Release the dataset for proper cleanup.
#> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getDriverLongName() const
#> docstring : Return the long name of the format driver.
#> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getDriverShortName() const
#> docstring : Return the short name of the format driver.
#> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getDsn() const
#> docstring : Return the DSN.
#> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getFIDColumn() const
#> docstring : Return name of the underlying db column being used as FID column.
#> double getFeatureCount()
#> docstring : Fetch the feature count in this layer.
#> Rcpp::CharacterVector getFileList() const
#> docstring : Fetch files forming dataset.
#> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getGeomType() const
#> docstring : Return the layer geometry type.
#> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getGeometryColumn() const
#> docstring : Return name of the underlying db column being used as geom column.
#> Rcpp::List getLayerDefn() const
#> docstring : Fetch the schema information for this layer.
#> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getName() const
#> docstring : Return the layer name.
#> SEXP getNextFeature()
#> docstring : Fetch the next available feature from this layer.
#> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getSpatialRef() const
#> docstring : Fetch the spatial reference system for this layer as WKT string.
#> bool isOpen() const
#> docstring : Is the dataset open?
#> void layerClip(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)
#> docstring : Clip off areas that are not covered by the method layer.
#> void layerErase(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)
#> docstring : Remove areas that are covered by the method layer.
#> void layerIdentity(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)
#> docstring : Identify features of this layer with the ones from the method layer.
#> void layerIntersection(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)
#> docstring : Intersection of this layer with a method layer.
#> void layerSymDifference(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)
#> docstring : Symmetrical difference of this layer and a method layer.
#> void layerUnion(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)
#> docstring : Union of this layer with a method layer.
#> void layerUpdate(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)
#> docstring : Update this layer with features from the method layer.
#> void open(bool)
#> docstring : (Re-)open the dataset on the existing DSN and layer.
#> void resetReading()
#> docstring : Reset feature reading to start on the first feature.
#> void setAttributeFilter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
#> docstring : Set a new attribute query.
#> void setSpatialFilterRect(Rcpp::NumericVector)
#> docstring : Set a new rectangular spatial filter.
#> bool testCapability(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const
#> docstring : Test if this layer supports the named capability.
# MTBS fires in Yellowstone National Park 1984-2022
f <- system.file("extdata/ynp_fires_1984_2022.gpkg", package="gdalraster")
# copy to a temporary in-memory file that is writeable
dsn <- paste0("/vsimem/", basename(f))
vsi_copy_file(f, dsn)
lyr <- new(GDALVector, dsn, "mtbs_perims")
# examine object of class GDALVector
lyr
#> C++ object <0x58b5a60f01b0> of class 'GDALVector' <0x58b5a612e9f0>
str(lyr)
#> Reference class 'Rcpp_GDALVector' [package "gdalraster"] with 0 fields
#> list()
#> and 44 methods, of which 30 are possibly relevant:
#> bbox, clearSpatialFilter, close, finalize, getDriverLongName,
#> getDriverShortName, getDsn, getFeatureCount, getFIDColumn, getFileList,
#> getGeometryColumn, getGeomType, getLayerDefn, getName, getNextFeature,
#> getSpatialRef, initialize, isOpen, layerClip, layerErase, layerIdentity,
#> layerIntersection, layerSymDifference, layerUnion, layerUpdate, open,
#> resetReading, setAttributeFilter, setSpatialFilterRect, testCapability |
Another minor correction to the above. Using We will probably also have an The clarification here is mainly that https://gdal.org/user/ogr_sql_dialect.html (apologies to the extent that any of the above may be well-known already, but hopefully worth clarifying) |
Utility functions for managing vector data sources have been added in main (#343), to be included in a 1.11 release in the next week or two hopefully. Documented at: These changes have also been merged into the gdalvector branch, and the examples in the draft document for the vector API have been updated: In case you install from the updated gdalvector branch, another notable change affecting dependencies was:
|
I just noticed that the gdalvector branch source and documentation in the draft are not in-sync, i.e. https://usdaforestservice.github.io/gdalraster/articles/gdalvector-draft.html#class-gdalvector Not unexpected for in-dev work but perhaps there's commits at your end not pushed for the branch? (Also fine of course just wondering, I've been unable to look at this in recent weeks) |
btw I love how the class is self-documenting with |
just a note to think about, sometimes geometry doesn't have a name - in OGRSQL you can use "_ogr_geometry_" but I'm not sure that's really desirable. tf <- tempfile(fileext = ".fgb")
ogr2ogr(src, tf, cl_arg = c("-f", "FlatGeobuf"))
ds <- new(GDALVector, tf)
ds$resetReading()
names(ds$getNextFeature())
## [1] "FID" "event_id" "incid_name" "incid_type" "map_id" "burn_bnd_ac"
## [7] "burn_bnd_lat" "burn_bnd_lon" "ig_date" "ig_year" "" ## << geometry name is empty
ds$close()
It just means that if we |
I just updated branch gdalvector from main. I generally try to keep it up to date, and I usually install from there for my local working instance of gdalraster. But it was a little behind after recent activity. Also, note that the doc states:
So not everything in the draft spec has been implemented yet. Status update: I'm working toward a CRAN release of gdalraster 1.11 in the next day or two. After that, I plan to refocus on gdalvector and work toward a release version for it in coming weeks if possible. |
Excellent, thank you. I'll keep collecting feedback on GDALVector for when you get back to it. Loving this package it's really excellent!! |
The document "Draft Bindings to OGR" has been updated for recent changes and ongoing development in branch gdalvector. Mainly, sub-headings were added under the heading Description of the interface. New specifications were added in the sub-heading "Feature retrieval" (adds The draft class definition has been updated under heading class GDALVector. Code has been updated under Example: usage for class GDALVector to include the new additions, mainly |
Branch Documentation for class Note that the class methods for layer geoprocessing are not documented ( |
awesome! |
Feedback is welcome if we need improvements to the interface or other changes. |
I'm wondering about a helper for working with OGRFeature, because something like
But, can If we have this or similar, it could also help with similar handling code in print and plot. (Happy to craft a PR just I'm in the middle of something else) |
potentially augment the return value like this: structure(as.data.frame(x), geom_column = wgeom) |
(I realized there's probably a bit of overlap with fetch(), which I'll explore in relation to this). |
Would the return of Currently, the return value of I see the |
Ah, yes, that makes sense 🙏 |
This issue is for potential discussion of bindings to the GDAL/OGR Vector API. A draft vector interface is described at:
https://usdaforestservice.github.io/gdalraster/articles/gdalvector-draft.html
A prototype with partial implementation is in the
gdalvector
branch at:https://github.com/USDAForestService/gdalraster/tree/gdalvector
Comments and suggestions are appreciated.
The text was updated successfully, but these errors were encountered: