Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fetchSOLUS() #362

Merged
merged 20 commits into from
Oct 23, 2024
Merged

Add fetchSOLUS() #362

merged 20 commits into from
Oct 23, 2024

Conversation

brownag
Copy link
Member

@brownag brownag commented Oct 8, 2024

This tool creates a virtual raster from Cloud Optimized GeoTIFFs (COGs) from the Soil Landscapes of the United States 100-meter (SOLUS100) soil property maps project repository.

Remaining tasks:

  • Consider implementing grid=FALSE argument for point queries/SoilProfileCollection result output
    • In order to create SPC output need to consider point vs area depth support, and implement various options for change of support via piecewise/stepwise, linear interpolation, mass preserving splines / aqp::spc2mpspline() etc.
  • Check edge cases and error handling
  • Think about any other standardization or conventions for querying; pass any suggestions on to Travis
  • Add tests and possibly skip them based on R_SOILDB_SKIP_LONG_EXAMPLES envvar
  • Apply scaling factors

Example

library(soilDB)
b <- c(-119.747629, -119.67935, 36.912019, 36.944987)

bbox.sp <- sf::st_as_sf(wk::rct(
    xmin = b[1], xmax = b[2], ymin = b[3], ymax = b[4],
    crs = sf::st_crs(4326)
))

ssurgo.geom <- soilDB::SDA_spatialQuery(
    bbox.sp,
    what = 'mupolygon',
    db = 'SSURGO',
    geomIntersection = TRUE
)

res <- fetchSOLUS(
    ssurgo.geom,
    depth_slices = "0",
    variables = c("sandtotal", "silttotal", "claytotal")
)

terra::plot(res)

image

@dylanbeaudette
Copy link
Member

Cool thanks. I'll have to check, but I'm pretty sure that SOLUS predictions represent discrete depth slices such as 15cm. There was much debate about this over the years.

@brownag
Copy link
Member Author

brownag commented Oct 8, 2024

I'm pretty sure that SOLUS predictions represent discrete depth slices such as 15cm.

Yes, I think you are right. I went a little far here in applying the Global Soil Map depths to the depth interval parameter, but I am forshadowing the alternate interpolation options I plan to offer for SPC output. At any rate, I think it is reasonable to assume a stepwise function in the simplest case, this is what I recall the discussion about splines vs area vs point predictions being. But we can certainly provide alternate options for different types of depth support, starting from the official distributed grids.

 - filetype is ambiguous; may be used in future for terra/GDAL output driver name
@brownag
Copy link
Member Author

brownag commented Oct 8, 2024

Implemented scaling factors in 305e7dd using the values referenced in the table https://storage.googleapis.com/solus100pub/index.html.

However, I think it would probably be more user-friendly to have the source files use GeoTIFF scale parameter so that the user does not need to do these transformations themselves. Currently applying scaling factor on the R side will trigger download of the data to memory/temp. file. Ideally scaling could be handled by GDAL and we could return reference to the full CONUS GeoTIFFs that also would have the scaling properly applied internally.

@brownag
Copy link
Member Author

brownag commented Oct 10, 2024

c42c4d2 implements the interpolation methods via approxfun() and splinefun() for SoilProfileCollection output (grid=FALSE). Default interpolation is method="linear" at 1cm intervals following @smroecker suggestions for wrangling SOLUS for GSP-GSN (i..e. https://github.com/ncss-tech/gsp-gsn/blob/main/solus_transform.R), which should be adequate for doing subsequent interval depth-weighted averages. I still want to provide the option to do mass-preserving splines via {mpspline2} but have not built that in yet.

The default behavior with grid=FALSE will return a profile for every grid cell, unless x is a SpatVector containing point geometries. So, 1cm interpolation (all methods other than method="step") will return a large SPC with many layers per profile even for rather modest extents. Use the samples= argument to specify a number of regular samples to extract from an extent in lieu of specific points or all cells. This is mostly for testing for now, I may provide some different options in this sampling interface such as stratified sampling by input polygons in x.

Also, proper support for "all_cm" depth variables (i.e. site level variables such as "anylithicdept" and "resdept") has been incorporated in c147893. With grid=TRUE you get the gridded values of restriction depth, as before, and with grid=FALSE the restriction depths are included in the site() slot of the result SoilProfileCollection. The depth_slice="all" is now automatically included in user input filter on SOLUS catalog, so one need not specify it explicitly when requesting these site variables.


Here is an example returning horizon and site level properties in an SPC. Result contains 10 profiles regularly sampled from the rectangular extent around some SSURGO polygons. The plot shows each profile truncated to the restriction depth.

An apparent issue with the cutoff of predictions at restriction depth on the SOLUS grid side means that with the literal interpretation of the depth slices as points on a continuous curve we often cant interpolate values all the way to the restriction. We need at least one value beyond the restriction depth to be able to interpolate to that depth, but the corresponding property values are commonly (but not always?) NA. I feel like either I have made a mistake here, or the filtering/NA-masking for SOLUS of predictions near bottom depth was a bit heavy-handed.

For now to get "complete" data all the way to the bottom (note the NA values for claytotal_p in bottom-most layer) of all profiles you need to specify method="step" or method="constant" which assume each value continues as a constant until next value. I note that this may not be the intended interpretation of SOLUS predictions, but in my opinion is the "simplest" interpretation of values that is most consistent with Global Soil Map standards out of the box.

library(aqp)
#> This is aqp 2.0.4
library(soilDB)

b <- c(-119.747629, -119.67935, 36.912019, 36.944987)

bbox.sp <- sf::st_as_sf(wk::rct(
  xmin = b[1], xmax = b[2], ymin = b[3], ymax = b[4],
  crs = sf::st_crs(4326)
))

ssurgo.geom <- soilDB::SDA_spatialQuery(
  bbox.sp,
  what = 'mupolygon',
  db = 'SSURGO',
  geomIntersection = TRUE
)

# SoilProfileCollection output, using linear interpolation for 1cm slices
# site-level variables (e.g. resdept) added to site data.frame of SPC
res <- fetchSOLUS(
  ssurgo.geom,
  depth_slices = c("0", "5", "15", "30", "60", "100", "150"),
  variables = c("sandtotal", "silttotal", "claytotal", "cec7", "resdept"),
  output_type = "prediction",
  method = "linear",
  grid = FALSE,
  samples = 10
)
#> Loading required namespace: terra
#> converting profile IDs from integer to character
# plot, truncating each profile to the predicted restriction depth
plot(trunc(res, 0, res$resdept_p), color = "claytotal_p", divide.hz = FALSE)

@brownag brownag marked this pull request as ready for review October 15, 2024 22:19
@brownag
Copy link
Member Author

brownag commented Oct 15, 2024

It seems that a decent proportion of the time (~30% of claytotal_p cells in raster extent of above example) the soil property has NA at a shallower depth than the predicted restriction depth. As far as I can tell this is the way it is presented in SOLUS.

Making assumption about a step-wise continuous property depth distribution (constant value to next depth increment, and max of 200cm) using method="step" also "resolves" the issue.

This PR is pretty functional now and ready for review. I am still open to changing around arguments and default behaviors. I want to try it on various extents a little more and make sure I have caught edge cases that might be out there


library(aqp)
#> This is aqp 2.1.0
library(terra)
#> terra 1.7.81
library(soilDB)

b <- c(-119.747629, -119.67935, 36.912019, 36.944987)

bbox.sp <- sf::st_as_sf(wk::rct(
  xmin = b[1], xmax = b[2], ymin = b[3], ymax = b[4],
  crs = sf::st_crs(4326)
))

ssurgo.geom <- soilDB::SDA_spatialQuery(
  bbox.sp,
  what = 'mupolygon',
  db = 'SSURGO',
  geomIntersection = TRUE
)


ssurgo.geom2<- merge(ssurgo.geom, get_SDA_muaggatt(mukeys = unique(ssurgo.geom$mukey)), by = "mukey")
#> single result set, returning a data.frame

# SoilProfileCollection output, using linear interpolation for 1cm slices
# site-level variables (e.g. resdept) added to site data.frame of SPC
res <- fetchSOLUS(
  ssurgo.geom,
  depth_slices = c("0", "5", "15", "30", "60", "100", "150"),
  variables = c("sandtotal", "silttotal", "claytotal", "cec7", "resdept"),
  output_type = "prediction",
  method = "linear",
  grid = NULL
)
#> converting profile IDs from integer to character
aqp::prj(res$spc) <- "EPSG:5070"

foo <- res$spc
foo$clay_na <- ifelse(is.na(foo$claytotal_p), "TRUE", "FALSE")
foo$na_depth <- minDepthOf(foo, "TRUE", hzdesgn = "clay_na", no.contact.assigned = 151)$top
foo$na_depth2 <- factor(foo$na_depth)
boxplot(foo$resdept_p ~ foo$na_depth2,
        xlab = "Depth to NA claytotal_p, cm",
        ylab = "Restriction Depth (predicted), cm")
prop.table(sort(table(foo$resdept_p > foo$na_depth, 
                      useNA = "ifany")))
#> 
#>     TRUE    FALSE 
#> 0.298437 0.701563

@brownag
Copy link
Member Author

brownag commented Oct 18, 2024

9a7e193 changes the behavior when x is a SpatRaster. Instead of using terra::as.polygons() to convert a raster to extent of interest, x is used as a template for GDAL warper. This allows you to return the SOLUS results in arbitrary CRS and resolution (matching the input object).

For example 0cm clay content, Interrupted Goode Homolosine, 800m resolution:

library(soilDB)
library(terra)
#> terra 1.7.81

data(us_states, package="spData")
b <- vect(subset(sf::st_as_sf(us_states), NAME %in% c("Arizona", "New Mexico")))

template <- rast(crs = "+proj=igh",
                 extent = ext(project(b, "+proj=igh")),
                 resolution = 800)

s <- fetchSOLUS(template, 
                depth_slices = "0", 
                variables = "claytotal", 
                output_type = "prediction")
s
#> class       : SpatRaster 
#> dimensions  : 789, 1405, 1  (nrow, ncol, nlyr)
#> resolution  : 800, 800  (x, y)
#> extent      : -12522827, -11398827, 3487889, 4119089  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=igh +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
#> source(s)   : memory
#> name        : claytotal_0_cm_p 
#> min value   :        0.1470945 
#> max value   :       47.6363792
plot(s)

@brownag
Copy link
Member Author

brownag commented Oct 23, 2024

  • Scaling factors are now also applied to full extent virtual raster results and subsets thereof via terra::scoff().
  • Added some checks for edge cases and errors

This is everything I wanted to get working in this PR.

A future PR may introduce mpspline2-based spline methods (e.g. in this branch https://github.com/ncss-tech/soilDB/tree/fetchSOLUS-mpspline) or, alternately, a way to specify custom interpolation methods via a function

@brownag brownag merged commit 98bf43e into master Oct 23, 2024
4 of 5 checks passed
@brownag brownag deleted the fetchSOLUS branch October 23, 2024 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants