Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

soilDB::mukey.wcs() fails when there is no mukey data in gSSURGO #304

Closed
kevinwolz opened this issue Sep 30, 2023 · 10 comments · Fixed by #306
Closed

soilDB::mukey.wcs() fails when there is no mukey data in gSSURGO #304

kevinwolz opened this issue Sep 30, 2023 · 10 comments · Fixed by #306

Comments

@kevinwolz
Copy link

kevinwolz commented Sep 30, 2023

Attached here is an example AOI, that does not contain any SSURGO data. Querying for mukey data from gSSURGO, results in an error, because there are no mukeys returned:

aoi <- sf::st_read("aoi_FL_keys.kml")
soilDB::mukey.wcs(aoi = aoi, db = "gssurgo", res = 30, quiet = FALSE)
Error in `[.data.frame`(value, , 1) : undefined columns selected

The error occurs because on the line: levels(r)[[1]] <- rat because the rat is a data frame with 0 columns and 0 rows.

It would be much better if, in this case, soilDB::mukey.wcs() still output the resulting raster, but with all cell values set to whatever value is typical when there is no mukey in a given cell.

aoi_FL_keys.kml.zip
FYI, the AOI includes the Marquesas Keys in Monroe County, FL (the western-most tip of the Florida Keys). This region comes up in my analysis because it is technically within a US county (based on the tigris county outlines).

@brownag
Copy link
Member

brownag commented Sep 30, 2023

Thanks @kevinwolz for bringing this up!

I think I have fixed the issue with the empty RAT in a PR #306. Now a raster with all NaN values ("NoData") is returned. NoData in this case indicates no soil survey area, whereas NOTCOM is an unmapped survey area, which has a NOTCOM mukey and mapunit associated with it.

Specifically for data for the Marquesas Keys--it appears this area is not associated with any soil survey area, and the AOI you are using partially falls outside CONUS gSSURGO boundary in y dimension. FL687 includes up to Key West. I note also there is a (currently NOTCOM) Dry Tortugas SSA FL687 even further to the west.
image


This raises another issue with AOIs at the "edge of the map." We have encountered this before, especially with the raster soil surveys. Currently AOIs that are too large for the source data extent generate warnings because the result from the web service does not match the expected extent of the AOI input. This affects the resolution which could prove to be an issue for user expecting a consistent grid from a set of queries.

Here is an example for a slightly larger AOI that contains some data:

library(soilDB)
library(sf)
setwd("~/workspace/gh/soilDB/304")
aoi <- sf::st_read("/vsizip/aoi_FL_keys.kml.zip")
#> Reading layer `aoi_FL_keys' from data source `/vsizip/aoi_FL_keys.kml.zip' using driver `KML'
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -82.32659 ymin: 24.47644 xmax: -81.97248 ymax: 24.75134
#> Geodetic CRS:  WGS 84

res0 <- soilDB::mukey.wcs(aoi = aoi, db = "gssurgo", res = 30, quiet = FALSE)
#> Warning: expected 1167 rows, received 893; Y resolution may be affected. Try
#> requesting a smaller extent.
#> Warning: [] no labels

res <- soilDB::mukey.wcs(aoi = st_buffer(aoi, 45000), db = "gssurgo", res = 30, quiet = FALSE)
#> Warning: expected 4568 rows, received 2599; Y resolution may be affected. Try
#> requesting a smaller extent.
# some data, not full extent
res
#> class       : SpatRaster 
#> dimensions  : 2599, 4796, 1  (nrow, ncol, nlyr)
#> resolution  : 30, 30.00172  (x, y)
#> extent      : 1344822, 1488702, 270015, 347989.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : NAD83 / Conus Albers (EPSG:5070) 
#> source(s)   : memory
#> categories  : filed0cd28bfe084 
#> name        :   mukey 
#> min value   :  797593 
#> max value   : 3116266

terra::plot(res)
terra::plot(terra::project(terra::vect(aoi), res), add = TRUE)

Right now, other than creating the categories, there is little processing of the grid result after it is downloaded... it could be NaN-padded/terra::extend()-ed to match input dimensions. At a minimum the set resolution should be fixed for the portion of the raster that returns, and probably the warnings about not getting data for full extent should be retained. I will revisit the logic on grid resolution

brownag added a commit that referenced this issue Sep 30, 2023
@kevinwolz
Copy link
Author

Thanks for the quick fix. Yes, I have definitely run into issues as a result of the non-integer resolution that does not match the input resolution. To clarify, did you address that as well?

It looks like you added a fix for this but are still warning the user about the different number of rows received?

@brownag
Copy link
Member

brownag commented Oct 1, 2023

Thanks for the quick fix. Yes, I have definitely run into issues as a result of the non-integer resolution that does not match the input resolution. To clarify, did you address that as well?

Yes, I think resolution is addressed now in 3cc97a9 which we can probably merge soon after a little testing

The spatial extent of the WCS result in edge cases is not what it should be given the number of rows and columns returned, so there appears to be a slight offset. It seems that (at least with certain input geometry) you will get a fraction of a pixel cut off the extent. I am not sure if that is something we can fix on the server side (@dylanbeaudette), but I believe it has now been corrected for the final result in R by resetting the extent of the SpatRaster.

It looks like you added a fix for this but are still warning the user about the different number of rows received?

Yeah, in the PR it is still a warning. Perhaps it could be downgraded to a message?

Requests that are entirely outside the source grid return "bad WCS request". It might be nice if we could provide a more informative error in these cases.

Requests that are partially overlapping the source grid now return only the portion that overlaps (with corrected resolution). I feel the need for the message is b/c the user requested a larger area, so may expect to receive a raster matching input extent as well as resolution. It would be possible to terra::extend() to the input geometry when a mismatch between expected/actual is detected.


EDIT: in #306 I have added the automatic extension to the input AOI extent. I think this is safe to do because the queries are already limited to a grid with dimension 5000 by 5000. This allows the user to make any request that partially overlaps and get a complete result with no modification of server side.

Consider an area to the northern boundary of CONUS, including a portion of Lake of the Woods County, Minnesota. Below I show results are identical to make a request that goes out of bounds, and compare with a fully in-bounds request that has been extended to the first request extent.

library(soilDB)
library(terra)

aoi <- vect(ext(c(-96.1054, -94.7109, 49.3305, 52.5623)), crs = "OGC:CRS84")
res <- mukey.wcs(aoi, res = 100)
#> Request partially outside boundary of coverage source data: expected 3522 rows, received 2481
plot(res)

aoi <- vect(ext(c(-96.1054, -94.7109, 49.3305, 50.5623)), crs = "OGC:CRS84")
res2 <- mukey.wcs(aoi, res = 100)
plot(res2)

#> > unique(res == extend(res2, res), na.rm = FALSE)
#>  mukey
#> 1     1
#> 2   NaN

@kevinwolz
Copy link
Author

@brownag I think the use of terra::ext and terra::expand is a great way to deal with these issues. I agree that returning something that is the same shape and resolution as what the user supplied is the best way to do it. The user can always trim thing later if they want.

My only question/caution is around speed implications. Are the calls to terra::ext and terra::expand going to unnecessarily slow down mukey.wcs when we are not operating in these edge cases? Would it make sense to put these calls behind an if statement to only call them if needed? Or perhaps the terra functions already have this built in to where they only "act" if necessary?

@brownag
Copy link
Member

brownag commented Oct 2, 2023

My only question/caution is around speed implications. Are the calls to terra::ext and terra::expand going to unnecessarily slow down mukey.wcs when we are not operating in these edge cases? Would it make sense to put these calls behind an if statement to only call them if needed? Or perhaps the terra functions already have this built in to where they only "act" if necessary?

@kevinwolz I only adjust extent with ext() and call extend() conditionally--so only when the message is issued that the WCS result does not match what is calculated from the input geometry.

Setting a new extent/resolution with ext() is nearly instantaneous, and extend() runs quickly even in the most extreme cases because the input geometry is limited to a grid that is 5000x5000, so you will never be creating a result that is larger than that (in whatever grid resolution you request).

@kevinwolz
Copy link
Author

Great! Once you merge, I'll be ready to install and test the hell out of it with every edge case I can find.

@dylanbeaudette
Copy link
Member

I'll start a new issue to test the changes against local calls to crop() on a gSSURGO mukey grid. It would be nice to know that the WCS and tinkering with the result don't deviate too much from the source data.
#307

@kevinwolz
Copy link
Author

Hi all, is this ready for me to test/implement?

@dylanbeaudette
Copy link
Member

dylanbeaudette commented Oct 15, 2023

Sure. However, gNATSGO / gSSURGO grids are still last year's data--FY23.

@brownag
Copy link
Member

brownag commented Oct 27, 2023

Hi all, is this ready for me to test/implement?

@kevinwolz PR #306 has been merged and should resolve these problems with the marginal areas of the WCS grids. Sorry for the delay. Let us know when you find a way to break it :)

In the future if you would like to weigh in on in-progress stuff before it is merged into the main branch, you can install directly from a branch or commit e.g. remotes::install_github("ncss-tech/soilDB@fix304") where "@<refname>" refers to the branch name, commit hash, tag, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants