Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bindings to the GDAL/OGR Vector API #241

Open
ctoney opened this issue Mar 1, 2024 · 28 comments
Open

Bindings to the GDAL/OGR Vector API #241

ctoney opened this issue Mar 1, 2024 · 28 comments
Labels
enhancement New feature or request

Comments

@ctoney
Copy link
Collaborator

ctoney commented Mar 1, 2024

This issue is for potential discussion of bindings to the GDAL/OGR Vector API. A draft vector interface is described at:
https://usdaforestservice.github.io/gdalraster/articles/gdalvector-draft.html

A prototype with partial implementation is in the gdalvector branch at:
https://github.com/USDAForestService/gdalraster/tree/gdalvector

Comments and suggestions are appreciated.

@ctoney ctoney added the enhancement New feature or request label Mar 1, 2024
@ctoney ctoney changed the title Enhancement: Bindings to the GDAL/OGR Vector API Bindings to the GDAL/OGR Vector API Mar 1, 2024
@mdsumner
Copy link
Collaborator

mdsumner commented Mar 5, 2024

This already covers a lot what vapour does with vector layers.

mvfile <- system.file(file.path("extdata/tab", file), package="vapour")
x <- new(gdalraster:::GDALVector, mvfile)
x$getDriverLongName() ## vapour_driver
## vapour_layer_info, vapour_report_fields
x$bbox()
x$getFeatureCount()
x$getSpatialRef()
x$getLayerDefn()
x$getName()

## we would want some helpers to avoid materializing geometry and attributes, so we can skip/limit, and get just bbox etc
## vapour_read_geometry(limit_n = 2, skip_n = 4)
## vapour_geom_summary()/vapour_read_extent()
x$getNextFeature(); x$getNextFeature(); 
## now iterate for the next 4 features to match the skip_n/limit_n of vapour

x$getGeometryColumn()  ##vapour_geom_name()

I can't spend more time this week but I feel like this could easily replace the innards of vapour. I'll have a closer look and see if there's anything missing across the raster stuff as well (I'm pretty well established in what and how I use vapour now so it won't take me much to investigate the core pieces I really need). I would look to replacing my dependence on vapour with this, and probably just let vapour sit as-is rather than work more on it (though it might be fun to modify it to dep on this, and I will need to think about the one non-me revdep vapour has {ursa}).

I think ideally we would want to be able to spit out vectors of geos or ogr pointers, or wkb/wkt with support by {wk} (or even the arrow stream from GDAL itself) . There's certainly significant interest in arrow and geoarrow, so it might be easy to get contributions to help streamline this with the wk/geoarrow approach with this as the primary GDAL-API package for R (but I'll have to think about that some more at a later time.)

@ctoney
Copy link
Collaborator Author

ctoney commented Mar 5, 2024

@mdsumner, thank you for the helpful comments. Look forward to working through more of the details.

I added “vectors of geos or ogr pointers, or wkb/wkt with support by {wk}” in the section "Further consideration / TBD".

Added GDALVector::setIgnoredFields() in the class definition (not in prototype yet).

Added arguments fields and geom_column in GDALVector::getFeatureSet() (not in prototype yet).

I had the OGR Arrow interface under "TBD" for GDALVector::getArrowStream() (GDAL >= 3.6) and GDALVector::writeArrowBatch() (GDAL >= 3.8), but I don't have that expertise currently. A contributor would be awesome if that were possible. The basic implementation of getFeatureSet() would populate a data frame by iterating feature rows and appending across the column vectors. The ability to access the data by column in a memory efficient way seems like a major improvement for large datasets.

@goergen95
Copy link

goergen95 commented Mar 10, 2024

This is really cool to experiment with so far!

I am currently trying to clip a source layer by another one but hitting Error: could not find valid method. (this would be a very common use case inside the mapme package).
Below is my code, does anyone have an idea what is going wrong?

library(sf)
library(gdalraster)

nc_org <- system.file("shape/nc.shp", package="sf")
nc_out <- "/vsimem/nc.gpkg"
ogr2ogr(nc_org, nc_out, cl_arg = c("-t_srs", "EPSG:4326", "-nlt", "PROMOTE_TO_MULTI"))

url <- "/vsicurl/https://data.source.coop/vida/google-microsoft-open-buildings/flageobuf/by_country/country_iso=USA/USA.fgb"

buildings_usa <- new(GDALVector, url)
nc <- new(GDALVector, nc_out)

srs <- buildings_usa$getSpatialRef()
dsn_out <- "/vsimem/nc_buildings.gpkg"
gdalraster:::.create_ogr("GPKG", dsn_out, 0, 0, 0, "Unknown", "nc_buildings", srs)
nc_buildings <- new(GDALVector, dsn_out, "nc_buildings", read_only = FALSE)

buildings_usa$layerClip(nc, nc_buildings)

buildings_usa$close()
nc$close()
nc_buildings$close()

@mdsumner
Copy link
Collaborator

the layerClip call is missing two required arguments:

atm you'll have to specify all args explicitly, usually we would do that in the R wrapper with default values for most args

(untested, just I'm accustomed to seeing that error message) 🙏

@goergen95
Copy link

Ah, right! That seems to make it work. Thanks.

@ctoney
Copy link
Collaborator Author

ctoney commented Mar 10, 2024

The layer geoprocessing functions were done as class methods in GDALVector mainly because it mimics the underlying API (class methods of OGRLayer in the C++ API). It was quick and easy to add them in GDALVector with RCPP_EXPOSED_CLASS, for prototype purposes. We might instead make them stand-alone functions in the R interface. That would look something like

layerIntersection(input_layer, method_layer, result_layer, quiet = FALSE, options = NULL)

in which case we need to supply the input_layer too. That works fine since we can pass around objects of class GDALVector which may have an attribute and/or spatial filter already defined. Also, we could optionally create the output layer. So result_layer could be specified as currently a GDALVector object, or alternatively with additional arguments for DSN/layer potentially to be created.

Added a note to that effect in the draft document for TBD.

Also added in the draft document, a link to the header file src/gdalvector.h. That file gives a quick reference to the exposed class methods that have been implemented so far in the prototype (a subset of the draft class definition).

@ctoney
Copy link
Collaborator Author

ctoney commented Mar 11, 2024

@goergen95 , I experimented with your example above. I noticed that the USA buildings .fgb file is pretty large at 36 GB with 145 million features. There are close to 9 million features that intersect the NC bbox. When I try to run the clip for all NC counties it processes for a long time without getting very far. I'm curious if you would expect that to work in reasonable time, i.e., is that a representative data size/processing task?

The FlatGeobuf file spec also says:

If you're accessing a FlatGeobuf file over HTTP, consider using a CDN to minimize latency. In particular, when using the spatial filter to get a subset of features, multiple requests will be made. Often round-trip latency, rather than throughput, is the limiting factor. A caching CDN can be especially helpful here.

So latency could be an issue in this case, or maybe Source Cooperative does the caching by default?

The code below filters the NC layer to one county. It also sets a spatial filter on the buildings layer first. The clip for Ashe county then runs pretty fast:

Code
library(gdalraster)
#> GDAL 3.8.4, released 2024/02/08, PROJ 9.3.1

nc_org <- system.file("shape/nc.shp", package="sf")
nc_out <- "/vsimem/nc.gpkg"
ogr2ogr(nc_org, nc_out, cl_arg = c("-t_srs", "EPSG:4326", "-nlt", "PROMOTE_TO_MULTI"))

url <- "/vsicurl/https://data.source.coop/vida/google-microsoft-open-buildings/flageobuf/by_country/country_iso=USA/USA.fgb"
# size in GB
vsi_stat(url, "size") / 1e9
#> [1] 36.72616

# assuming trusted data for VERIFY_BUFFERS=NO
buildings_usa <- new(GDALVector,
                     dsn = url,
                     layer = "bfp_USA",
                     read_only = TRUE,
                     open_options = "VERIFY_BUFFERS=NO")

buildings_usa$getFeatureCount()
#> [1] 145459485

nc <- new(GDALVector, nc_out)
nc$setAttributeFilter("NAME == 'Ashe'")
nc$getFeatureCount()
#> [1] 1
feat <- nc$getNextFeature()
bb <- bbox_from_wkt(feat$geom)
print(bb)
#> [1] -81.74091  36.23444 -81.23971  36.58973

buildings_usa$setSpatialFilterRect(bb)
buildings_usa$getFeatureCount()
#> [1] 34641

srs <- buildings_usa$getSpatialRef()
dsn_out <- "/vsimem/nc_buildings.gpkg"
gdalraster:::.create_ogr("GPKG", dsn_out, 0, 0, 0, "Unknown", "nc_buildings", srs)
#> [1] TRUE
nc_buildings <- new(GDALVector, dsn_out, "nc_buildings", read_only = FALSE)

buildings_usa$layerClip(nc, nc_buildings, FALSE, NULL)
#> 0...10...20...30...40...50...60...70...80...90...100 - done.

nc_buildings$getFeatureCount()
#> [1] 24746
defn <- nc_buildings$getLayerDefn()
names(defn)
#> [1] "boundary_id"    "bf_source"      "confidence"     "area_in_meters"
#> [5] "geom"

buildings_usa$close()
nc$close()
nc_buildings$close()

Created on 2024-03-11 with reprex v2.1.0

@goergen95
Copy link

Hi @ctoney,

processing this amount of data is not representative of what we are dealing with (though there are some requesters interested in the building footprints data set). The most complex vector data source we are currently working with most probably are the IUCN species ranges. These have to be downloaded locally anyway, so latency is not the limiting factor here. I would not expect gdalraster to handle the building footprint data set while being located on a remote server with ease. If one was really going to analyze it globally, copying the data to a server with lower latency would be the way to go, I guess.

With that in mind, I wanted to say that your vignette on GDAL's block cache is really helpful to optimize read operations on rasters. I wonder if it was worthwhile to work on something along the same lines for vector data? We could consider e.g. number of features, complexity of geometries, GDAL options, and file formats with regard to I/O performance?

@ctoney
Copy link
Collaborator Author

ctoney commented Mar 13, 2024

@goergen95 , thanks for the additional information. I'm still experimenting with the building footprints. It's likely just throughput that's limiting for the ~9 million in NC. It sounds like Source Cooperative provides what the FlatGeobuf spec recommends: https://github.com/flatgeobuf/flatgeobuf#optimizing-remotely-hosted-flatgeobufs. It's good to have a couple of examples like this for testing. I'll take a look at the IUCN species ranges too.

Right, there are several format-specific options and other configuration with regard to I/O performance for vector processing. That's a good idea to start collecting those into one place for reference. I could try to do that and eventually put a draft somewhere that we can add to.

@goergen95
Copy link

Hi,

I am currently experimenting a bit more with the vector API. I hit a wall when I was trying to write a vector data set from scratch. Below code assumes that I obtained a number of bounding boxes somehow (a real use-case might be the footprints from items in a STAC catalog). I seem not to be able to find any methods to write features to a newly created GDALVector dataset. The examples in the vignette seem to operate on copies of datasets. Is it the case that we are currently not able to write features or am I missing something obvious?

library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is TRUE
library(gdalraster)
#> GDAL 3.8.2, released 2023/16/12, GEOS 3.12.1, PROJ 9.3.1

# create example data via sf
nc <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source 
#>   `/home/darius/R/x86_64-pc-linux-gnu-library/4.3/sf/shape/nc.shp' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS:  NAD27
grid <- st_make_grid(nc, n = 10)
bboxs <- lapply(grid, st_bbox)
srs_wkt <- st_crs(nc)$wkt

# to wkt with gdalraster
bboxs_wkt <- sapply(bboxs, bbox_to_wkt)
id <- 1:length(bboxs_wkt)
dsn <- "/vsimem/bbox.gpkg"

# creation is currently un-exported and has the signature of the raster API?
gdalraster:::.create_ogr("GPKG", dsn, 0, 0, 0, "Unknown", "result_layer", srs_wkt)
#> [1] TRUE
lyr <- new(GDALVector, dsn, "result_layer", read_only = FALSE)
lyr$getFeatureCount()
#> [1] 0
# create methods seem to be un-exported, e.g. CreateField
# what is the recommended way to write a layer from scratch?

Created on 2024-04-03 with reprex v2.1.0

@ctoney
Copy link
Collaborator Author

ctoney commented Apr 3, 2024

Hi @goergen95,
That is correct, we are currently not able to write features, at least with the prototype of GDALVector. The class definition in the specification document is intended to reflect proposed/planned functionality. Currently, the only way to know what has actually been implemented in the prototype is to scan through the header file in the gdalvector branch: https://github.com/USDAForestService/gdalraster/blob/gdalvector/src/gdalvector.h. Under class GDALVector, then under the public: specifier at line 33, the signatures for the public constructors are given, and then signatures for the public methods that have been implemented so far. The prototype is basically read-only at this point, except that the OGR geoprocessing methods are available to write new layers (layerIntersection() etc.)

Note that ogrinfo could be used to edit data with SQL statements. The documentation for gdalraster::ogrinfo() has a couple of examples. But in general this would be a little clunky for writing data compared with what we're envisioning for the class-based interface, having methods for (at least) createFeature(), setFeature(), upsertFeature(), and deleteFeature(), potentially used inside startTransaction() / commitTransaction().

Thanks for the nudge though. I expect to refocus on the vector API sometime soon and prioritize that for development over the coming weeks. It's a good idea to get some basic write capability into the prototype sooner than later, so that it can be included in early testing/feedback as you're attempting to do now. I'll post updates here once anything meaningful is added. In the meantime, please do send feedback or suggestions if anything else comes up.

@ctoney
Copy link
Collaborator Author

ctoney commented Apr 4, 2024

@goergen95,
Correction to the above

Currently, the only way to know what has actually been implemented in the prototype is to scan through the header file in the gdalvector branch

It may be more convenient to view that information in R. Calling GDALVector will return the built-in documentation provided by Rcpp. Once an object has been created from a RCPP_EXPOSED_CLASS, str() will return the list of methods that can be called on the object via the $ operator. Example output below:

Code
## usage for GDALVector class

library(gdalraster)
#> GDAL 3.8.4, released 2024/02/08, GEOS 3.12.1, PROJ 9.3.1

# examine class GDALVector
GDALVector
#> C++ class 'GDALVector' <0x58b5a612e9f0>
#> Constructors:
#>     GDALVector()
#>         docstring : Default constructor, only for allocations in std::vector.
#>     GDALVector(Rcpp::CharacterVector)
#>         docstring : Usage: new(GDALVector, dsn)
#>     GDALVector(Rcpp::CharacterVector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
#>         docstring : Usage: new(GDALVector, dsn, layer)
#>     GDALVector(Rcpp::CharacterVector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)
#>         docstring : Usage: new(GDALVector, dsn, layer, read_only=[TRUE|FALSE])
#>     GDALVector(Rcpp::CharacterVector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, Rcpp::CharacterVector)
#>         docstring : Usage: new(GDALVector, dsn, layer, read_only, open_options)
#> 
#> Fields: No public fields exposed by this class
#> 
#> Methods: 
#>      Rcpp::NumericVector bbox()  
#>            docstring : Return the bounding box (xmin, ymin, xmax, ymax).
#>      void clearSpatialFilter()  
#>            docstring : Clear the current spatial filter.
#>      void close()  
#>            docstring : Release the dataset for proper cleanup.
#>      std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getDriverLongName()  const 
#>            docstring : Return the long name of the format driver.
#>      std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getDriverShortName()  const 
#>            docstring : Return the short name of the format driver.
#>      std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getDsn()  const 
#>            docstring : Return the DSN.
#>      std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getFIDColumn()  const 
#>            docstring : Return name of the underlying db column being used as FID column.
#>      double getFeatureCount()  
#>            docstring : Fetch the feature count in this layer.
#>      Rcpp::CharacterVector getFileList()  const 
#>            docstring : Fetch files forming dataset.
#>      std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getGeomType()  const 
#>            docstring : Return the layer geometry type.
#>      std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getGeometryColumn()  const 
#>            docstring : Return name of the underlying db column being used as geom column.
#>      Rcpp::List getLayerDefn()  const 
#>            docstring : Fetch the schema information for this layer.
#>      std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getName()  const 
#>            docstring : Return the layer name.
#>      SEXP getNextFeature()  
#>            docstring : Fetch the next available feature from this layer.
#>      std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > getSpatialRef()  const 
#>            docstring : Fetch the spatial reference system for this layer as WKT string.
#>      bool isOpen()  const 
#>            docstring : Is the dataset open?
#>      void layerClip(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)  
#>            docstring : Clip off areas that are not covered by the method layer.
#>      void layerErase(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)  
#>            docstring : Remove areas that are covered by the method layer.
#>      void layerIdentity(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)  
#>            docstring : Identify features of this layer with the ones from the method layer.
#>      void layerIntersection(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)  
#>            docstring : Intersection of this layer with a method layer.
#>      void layerSymDifference(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)  
#>            docstring : Symmetrical difference of this layer and a method layer.
#>      void layerUnion(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)  
#>            docstring : Union of this layer with a method layer.
#>      void layerUpdate(GDALVector, GDALVector, bool, Rcpp::Nullable<Rcpp::Vector<16, Rcpp::PreserveStorage> >)  
#>            docstring : Update this layer with features from the method layer.
#>      void open(bool)  
#>            docstring : (Re-)open the dataset on the existing DSN and layer.
#>      void resetReading()  
#>            docstring : Reset feature reading to start on the first feature.
#>      void setAttributeFilter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)  
#>            docstring : Set a new attribute query.
#>      void setSpatialFilterRect(Rcpp::NumericVector)  
#>            docstring : Set a new rectangular spatial filter.
#>      bool testCapability(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)  const 
#>            docstring : Test if this layer supports the named capability.

# MTBS fires in Yellowstone National Park 1984-2022
f <- system.file("extdata/ynp_fires_1984_2022.gpkg", package="gdalraster")
# copy to a temporary in-memory file that is writeable
dsn <- paste0("/vsimem/", basename(f))
vsi_copy_file(f, dsn)

lyr <- new(GDALVector, dsn, "mtbs_perims")

# examine object of class GDALVector
lyr
#> C++ object <0x58b5a60f01b0> of class 'GDALVector' <0x58b5a612e9f0>
str(lyr)
#> Reference class 'Rcpp_GDALVector' [package "gdalraster"] with 0 fields
#>  list()
#>  and 44 methods, of which 30 are  possibly relevant:
#>    bbox, clearSpatialFilter, close, finalize, getDriverLongName,
#>    getDriverShortName, getDsn, getFeatureCount, getFIDColumn, getFileList,
#>    getGeometryColumn, getGeomType, getLayerDefn, getName, getNextFeature,
#>    getSpatialRef, initialize, isOpen, layerClip, layerErase, layerIdentity,
#>    layerIntersection, layerSymDifference, layerUnion, layerUpdate, open,
#>    resetReading, setAttributeFilter, setSpatialFilterRect, testCapability

@ctoney
Copy link
Collaborator Author

ctoney commented Apr 5, 2024

Another minor correction to the above.

Using ogrinfo() to edit data might be a little clunky only in the cases when we want to edit by operating on individual feature records (e.g., accessing the dataset via cursor, consistent with the use case that @goergen95 described above). In those cases, it is likely better to construct a list of field name=value for a feature, including a geom field containing WKT, and pass to an insert/update/upsert method on the layer, with FID of the affected feature as return value. (Of course, that functionality is one of the motivations to develop this interface in the first place, and I hope to have some write capability available for testing in the near future).

We will probably also have an execute_sql() that operates on OGR data sources (wrapper of GDALDatasetExecuteSQL()). We can already create a layer object from a SELECT statement in the constructor for GDALVector, but execute_sql() would be used for ALTER TABLE, CREATE INDEX, INSERT, UPDATE etc. that do not return a result set. This would be redundant with functionality that is now provided by gdalraster::ogrinfo(), but could be more specialized. Also, note that ogrinfo the command-line utility, was exposed via the GDAL utils API somewhat recently at GDAL 3.7, so has that version requirement for using via the wrapper in gdalraster.

The clarification here is mainly that ogrinfo() quite helpfully provides functionality to edit data with SQL statements, and it is not especially clunky for that usage in general. It would even be preferred for some cases (the example in https://usdaforestservice.github.io/gdalraster/reference/ogrinfo.html). It can be used for editing any writeable OGR vector data source, not just RDBMS-based. Also worth noting that we can use both the OGR SQL and SQLite dialects, the latter providing INSERT/UPDATE/DELETE:

https://gdal.org/user/ogr_sql_dialect.html
https://gdal.org/user/sql_sqlite_dialect.html

(apologies to the extent that any of the above may be well-known already, but hopefully worth clarifying)

@ctoney
Copy link
Collaborator Author

ctoney commented May 14, 2024

Utility functions for managing vector data sources have been added in main (#343), to be included in a 1.11 release in the next week or two hopefully. Documented at:
ogr_manage - https://usdaforestservice.github.io/gdalraster/reference/ogr_manage.html
and
ogr_define - https://usdaforestservice.github.io/gdalraster/reference/ogr_define.html

These changes have also been merged into the gdalvector branch, and the examples in the draft document for the vector API have been updated:
https://usdaforestservice.github.io/gdalraster/articles/gdalvector-draft.html

In case you install from the updated gdalvector branch, another notable change affecting dependencies was:

initial int64 support; now linking to RcppInt64, and importing bit64; FID and OFTInteger64 fields are now returned in R as integer64; updated the examples (2024-04-06)

@mdsumner
Copy link
Collaborator

mdsumner commented Jun 1, 2024

I just noticed that the gdalvector branch source and documentation in the draft are not in-sync, i.e. getFeatureSet is listed in the doc but not in the source:

https://usdaforestservice.github.io/gdalraster/articles/gdalvector-draft.html#class-gdalvector

Not unexpected for in-dev work but perhaps there's commits at your end not pushed for the branch? (Also fine of course just wondering, I've been unable to look at this in recent weeks)

@mdsumner
Copy link
Collaborator

mdsumner commented Jun 1, 2024

btw I love how the class is self-documenting with print(GDALVector) - I had no idea!!

@mdsumner
Copy link
Collaborator

mdsumner commented Jun 1, 2024

just a note to think about, sometimes geometry doesn't have a name - in OGRSQL you can use "_ogr_geometry_" but I'm not sure that's really desirable.

tf <- tempfile(fileext = ".fgb")
ogr2ogr(src, tf, cl_arg = c("-f", "FlatGeobuf"))
ds <- new(GDALVector, tf)
ds$resetReading()
names(ds$getNextFeature())
## [1] "FID"          "event_id"     "incid_name"   "incid_type"   "map_id"       "burn_bnd_ac" 
## [7] "burn_bnd_lat" "burn_bnd_lon" "ig_date"      "ig_year"      ""    ##  << geometry name is empty
ds$close()

It just means that if we tibble::as_tibble(ds$getNextFeature()) we get an error "elements must be named" - but as.data.frame is ok it crafts a long name from the WKT (yukky). I don't know exactly what a good answer here would be but there might be some convention in GDAL itself. Also it's a development tool so not having a name also might be fine as a stance.

@ctoney
Copy link
Collaborator Author

ctoney commented Jun 1, 2024

I just updated branch gdalvector from main. I generally try to keep it up to date, and I usually install from there for my local working instance of gdalraster. But it was a little behind after recent activity.

Also, note that the doc states:

The draft class definition below has been partially implemented in:
https://github.com/USDAForestService/gdalraster/blob/gdalvector/src/gdalvector.cpp

The header file can be referenced for the public class methods that have been implemented so far in the prototype (a subset of the draft class definition below):
https://github.com/USDAForestService/gdalraster/blob/gdalvector/src/gdalvector.h

So not everything in the draft spec has been implemented yet.

Status update: I'm working toward a CRAN release of gdalraster 1.11 in the next day or two. After that, I plan to refocus on gdalvector and work toward a release version for it in coming weeks if possible.

@mdsumner
Copy link
Collaborator

mdsumner commented Jun 1, 2024

Excellent, thank you. I'll keep collecting feedback on GDALVector for when you get back to it.

Loving this package it's really excellent!!

@ctoney
Copy link
Collaborator Author

ctoney commented Jul 13, 2024

The document "Draft Bindings to OGR" has been updated for recent changes and ongoing development in branch gdalvector.

Mainly, sub-headings were added under the heading Description of the interface. New specifications were added in the sub-heading "Feature retrieval" (adds GDALVector::fetch(), analog of DBI::dbFetch(), and support for all OGR field types mapped to native R types), and sub-heading "Geometry" (adds support for WKB, and a GDALVector per-object setting returnGeomAs).

The draft class definition has been updated under heading class GDALVector.

Code has been updated under Example: usage for class GDALVector to include the new additions, mainly fetch() and the new setting for geometry return format. Note that the new field returnGeomAs currently defaults to NONE, and can be set back and forth as needed to one of WKT, WKT_ISO, WKB, WKB_ISO or NONE. There are also two other new read/write fields, defaultGeomFldName (applies when the geometry column name in the source layer is empty, like with shapefiles etc.) and wkbByteOrder (LSB the default, or MSB).

@ctoney ctoney linked a pull request Jul 23, 2024 that will close this issue
@ctoney ctoney removed a link to a pull request Jul 23, 2024
@ctoney
Copy link
Collaborator Author

ctoney commented Jul 23, 2024

Branch gdalvector has been merged into main and further development will be done there.

Documentation for class GDALVector is now available in the development version of the package, and online at:
https://usdaforestservice.github.io/gdalraster/articles/gdalvector-draft.html

Note that the class methods for layer geoprocessing are not documented (layerIntersection() etc.). Those are being re-implemented as a stand-alone ogr_proc() interface that will accept objects of class GDALVector as input, and create the output layer if needed.

@mdsumner
Copy link
Collaborator

awesome!

@ctoney
Copy link
Collaborator Author

ctoney commented Jul 31, 2024

ogr_proc() was added in #451, with online documentation at:
https://usdaforestservice.github.io/gdalraster/reference/ogr_proc.html

Feedback is welcome if we need improvements to the interface or other changes.

@mdsumner
Copy link
Collaborator

I'm wondering about a helper for working with OGRFeature, because as.data.frame(OGRFeature) is no good, and as_tibble() will replicate the attributes for every raw element.

something like

ds <- new(GDALVector, system.file("extdata/ynp_fires_1984_2022.gpkg", package = "gdalraster"))
as_df <- function(x) {
  wgeom <- attr(x, "gis")$geom_column
  x <- unclass(x)
  if (nzchar(wgeom)) {
    geomwrapper <- switch(attr(x, "gis")$geom_format, 
                        BBOX = function(.x) wk::rct(.x[1L], .x[2L], .x[3L], .x[4L]),
                        WKB = function(.x) wk::wkb(list(.x)), 
                        WKT = function(.x) wk::wkt(list(.x)))
    x[[wgeom]] <- geomwrapper(x[[wgeom]])
  }
  as.data.frame(x)
}

for (typ in c("NONE", "BBOX", "WKB", "WKT")) {
  ds$returnGeomAs <- typ
  cat(sprintf("%s: \n", typ))
  print(as_df(ds$getNextFeature())$geom)
  cat("---\n")
}

But, can attr(,"gis")$geom_column every have more than one entry?

If we have this or similar, it could also help with similar handling code in print and plot. (Happy to craft a PR just I'm in the middle of something else)

@mdsumner
Copy link
Collaborator

potentially augment the return value like this:

structure(as.data.frame(x), geom_column = wgeom)

@mdsumner
Copy link
Collaborator

(I realized there's probably a bit of overlap with fetch(), which I'll explore in relation to this).

@ctoney
Copy link
Collaborator Author

ctoney commented Oct 17, 2024

Would the return of ds$fetch(1) in place of ds$getNextFeature() change anything here?

Currently, the return value of ds$getNextFeature() is basically ds$fetch(1) from the current cursor position, but with list column(s) "unlisted", only so they can be accessed without nested indexing, i.e., access a WKB geom as the raw vector at feat$geom rather than feat$geom[[1]] (there will only be one element always). But that might have downsides and is open to change.

I see the fetch() interface as the main access mechanism for features. It is a direct analog of DBI::dbFetch() in terms of behavior w.r.t the n argument and spec of return value. The OGRFeature-as-list returned by getNextFeature() could be augmented/redefined a bit if needed. I've been wondering about that too. The fetch() implementation came after getNextFeature() which was originally done just to be a direct API wrapper.

@mdsumner
Copy link
Collaborator

Ah, yes, that makes sense 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants