Skip to content

Commit

Permalink
Merge pull request #301 from ncss-tech/esagg
Browse files Browse the repository at this point in the history
get_SDA_coecoclass: add `method="all"`
  • Loading branch information
brownag authored Oct 28, 2023
2 parents 728019a + f69cecb commit ec9e21c
Show file tree
Hide file tree
Showing 5 changed files with 225 additions and 10 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: soilDB
Type: Package
Title: Soil Database Interface
Version: 2.7.9
Version: 2.7.9.9000
Authors@R: c(person(given="Dylan", family="Beaudette", role = c("aut"), email = "[email protected]"),
person(given="Jay", family="Skovlin", role = c("aut")),
person(given="Stephen", family="Roecker", role = c("aut")),
Expand Down
6 changes: 4 additions & 2 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
# soilDB 2.7.9.9000 (2023-10-04)
- `fetchSDA_spatial()` gains `geom.src="mlrapolygon"` for obtaining Major Land Resource Area (MLRA) polygon boundaries. When using this geometry source `x` is a vector of `MLRARSYM` (MLRA Symbols).

- The geometry source is the MLRA Geographic Database v5.2 (2022) which is not (yet) part of Soil Data Access. Instead of SDA, GDAL utilities are used to read a zipped ESRI Shapefile from a remote URL: <https://www.nrcs.usda.gov/sites/default/files/2022-10/MLRA_52_2022.zip>. Therefore, most additional `fetchSDA_spatial()` arguments are _not_ currently supported for the MLRA geometry source. In the future a `mlrapolygon` table may be added to SDA (analogous to `mupolygon` and `sapolygon`), and the function will be updated accordingly at that time.
- The geometry source is the MLRA Geographic Database v5.2 (2022) which is not (yet) part of Soil Data Access. Instead of SDA, GDAL utilities are used to read a zipped ESRI Shapefile from a remote URL: <https://www.nrcs.usda.gov/sites/default/files/2022-10/MLRA_52_2022.zip>. Therefore, most additional `fetchSDA_spatial()` arguments are _not_ currently supported for the MLRA geometry source. In the future a `mlrapolygon` table may be added to SDA (analogous to `mupolygon` and `sapolygon`), and the function will be updated accordingly at that time.

- web coverage services and related raster attribute tables provided by SoilWeb (`mukey.wcs()` etc.) are now using the load-balancer URL
- Web coverage services and related raster attribute tables provided by SoilWeb (`mukey.wcs()` etc.) are now using the SoilWeb load-balancer URL

- `get_SDA_coecoclass()` gains `method="all"` for aggregating information about ecological sites and related components. The method performs a condition-based aggregation for each ecological site condition in the map unit, producing a "wide" data.frame result with as many columns as needed to portray all site conditions.

# soilDB 2.7.9 (2023-09-01)

Expand Down
32 changes: 32 additions & 0 deletions R/SSURGO_query.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
#' Query arbitrary data sources that use the SSURGO data model
#'
#' This is a simple wrapper function allowing queries to be passed to a variety of database drivers. It is assumed the database generally follows the SSURGO schema, regardless of the driver being used.
#'
#' @param x An SQL query. If `dsn` is NULL `x` is in T-SQL dialect. If `dsn` is a _character_ (file path), the SQLite dialect is used. If `dsn` is a `DBIConnection`, any SQL dialect compatible with the DBI source can be used.
#' @param dsn Default: `NULL` uses Soil Data Access remote data source via REST API. Alternately `dsn` may be a _character_ file path to an SQLite database, or a `DBIConnection` that has already been created.
#'
#' @details No processing of the query string is performed by this function, so all values of `x` must be adjusted according to the value of `dsn`.
#'
#' @return A _data.frame_, or _try-error_ on error
#' @noRd
.SSURGO_query <- function(x, dsn = NULL) {
if (is.null(dsn)) {
return(SDA_query(x))
} else {
if (inherits(dsn, 'DBIConnection')) {
return(DBI::dbGetQuery(con, x))
} else if (file.exists(dsn)) {
if (requireNamespace("RSQLite")) {
con <- try(RSQLite::dbConnect(RSQLite::SQLite(), dsn))
on.exit(DBI::dbDisconnect(con), add = TRUE)
if (!inherits(con, 'try-error')) {
return(RSQLite::dbGetQuery(con, x))
} else {
return(invisible(con))
}
}
} else {
stop("Invalid data source name: ", dsn, call. = FALSE)
}
}
}
186 changes: 181 additions & 5 deletions R/get_SDA_coecoclass.R
Original file line number Diff line number Diff line change
@@ -1,19 +1,21 @@
#' Get mapunit ecological sites from Soil Data Access
#'
#' @details When `method="Dominant Condition"` an additional field `ecoclasspct_r` is returned in the result with the sum of `comppct_r` that have the dominant condition `ecoclassid`. The component with the greatest `comppct_r` is returned for the `component` and `coecosite` level information.
#' @details When `method="Dominant Condition"` an additional field `ecoclasspct_r` is returned in the result
#' with the sum of `comppct_r` that have the dominant condition `ecoclassid`. The component with the greatest
#' `comppct_r` is returned for the `component` and `coecosite` level information.
#'
#' Note that if there are multiple `coecoclasskey` per `ecoclassid` there may be more than one record per component.
#'
#' @param method aggregation method. One of: "Dominant Component", "Dominant Condition", "None". If "None" is selected one row will be returned per component, otherwise one row will be returned per map unit.
#' @param method aggregation method. One of: `"Dominant Component"`, `"Dominant Condition"`, `"All"` or `"None"` (default). If `method="all"` multiple numbered columns represent site composition within each map unit e.g. `site1...`, `site2...`. If `method="none"` is selected one row will be returned per _component_; in all other cases one row will be returned per _map unit_.
#' @param areasymbols vector of soil survey area symbols
#' @param mukeys vector of map unit keys
#' @param WHERE character containing SQL WHERE clause specified in terms of fields in `legend`, `mapunit`, `component` or `coecosite` tables, used in lieu of `mukeys` or `areasymbols`
#' @param query_string Default: `FALSE`; if `TRUE` return a character string containing query that would be sent to SDA via `SDA_query`
#' @param ecoclasstypename If `NULL` no constraint on `ecoclasstypename` is used in the query.
#' @param ecoclassref Default: `"Ecological Site Description Database"`. If `NULL` no constraint on `ecoclassref` is used in the query.
#' @param not_rated_value Default: `"Not assigned"`
#' @param include_minors logical. Include minor components? Default: `TRUE`.
#' @param miscellaneous_areas logical. Include miscellaneous areas (non-soil components)?
#' @param include_minors logical. Include minor components? Default: `TRUE`.
#' @param threshold integer. Default: `0`. Minimum combined component percentage (RV) for inclusion of a mapunit's ecological site in wide-format tabular sumamry. Used only for `method="all"`.
#' @param dsn Path to local SQLite database or a DBIConnection object. If `NULL` (default) use Soil Data Access API via `SDA_query()`.
#' @export
get_SDA_coecoclass <- function(method = "None",
Expand All @@ -24,8 +26,30 @@ get_SDA_coecoclass <- function(method = "None",
not_rated_value = "Not assigned",
miscellaneous_areas = TRUE,
include_minors = TRUE,
threshold = 0,
dsn = NULL) {
method <- match.arg(toupper(method), c('NONE', "DOMINANT COMPONENT", "DOMINANT CONDITION"))

method <- match.arg(toupper(method), c("NONE", "ALL", "DOMINANT COMPONENT", "DOMINANT CONDITION"))

if (method == "ALL") {
if (!is.null(WHERE)) {
stop("`method='all'` does not support custom `WHERE` clauses", call. = FALSE)
}

if (isTRUE(query_string)) {
stop("`method='all'` does not support `query_string=TRUE`", call. = FALSE)
}

return(.get_SDA_coecoclass_agg(areasymbols = areasymbols,
mukeys = mukeys,
ecoclasstypename = ecoclasstypename,
ecoclassref = ecoclassref,
not_rated_value = not_rated_value,
miscellaneous_areas = miscellaneous_areas,
include_minors = include_minors,
threshold = threshold,
dsn = dsn))
}

if (!is.null(ecoclassref)) {
ecoclassref_in <- soilDB::format_SQL_in_statement(ecoclassref)
Expand Down Expand Up @@ -122,3 +146,155 @@ get_SDA_coecoclass <- function(method = "None",
res2
}

.get_SDA_coecoclass_agg <- function(areasymbols = NULL,
mukeys = NULL,
ecoclasstypename = NULL,
ecoclassref = "Ecological Site Description Database",
not_rated_value = "Not assigned",
miscellaneous_areas = TRUE,
include_minors = TRUE,
dsn = NULL,
threshold = 0) {

comppct_r <- NULL; condpct_r <- NULL; compname <- NULL; ecoclasstypename <- NULL
areasymbol <- NULL; compnames <- NULL; unassigned <- NULL;
mukey <- NULL; .N <- NULL; .SD <- NULL; .GRP <- NULL;

if (!is.null(areasymbols)) {
res0 <- do.call('rbind', lapply(areasymbols, function(x) {
.SSURGO_query(paste0(
"SELECT DISTINCT mukey, nationalmusym, muname FROM mapunit
INNER JOIN legend ON legend.lkey = mapunit.lkey
WHERE areasymbol = '", x, "'"
))
}))
idx <- makeChunks(res0$mukey, 1000)
l <- split(res0$mukey, idx)
} else {
idx <- makeChunks(mukeys, 1000)
l <- split(mukeys, idx)
res0 <- do.call('rbind', lapply(l, function(x) {
.SSURGO_query(paste0(
"SELECT DISTINCT mukey, nationalmusym, muname FROM mapunit
INNER JOIN legend ON legend.lkey = mapunit.lkey
WHERE mukey IN ", format_SQL_in_statement(x), ""
))
}))
idx <- makeChunks(res0$mukey, 1000)
l <- split(res0$mukey, idx)
}

res1 <- do.call('rbind', lapply(l, function(x) {
get_SDA_coecoclass(
mukeys = x,
method = "None",
ecoclasstypename = ecoclasstypename,
ecoclassref = ecoclassref,
not_rated_value = not_rated_value,
miscellaneous_areas = miscellaneous_areas,
include_minors = include_minors,
dsn = dsn
)
}))

res1 <- data.table::data.table(res1)[, .SD[order(comppct_r, decreasing = TRUE), ], by = "mukey"]
res2 <- data.table::data.table(subset(res1, areasymbol != "US"))

# remove FSG etc. some components have no ES assigned, but have other eco class
idx <- !res2$ecoclassref %in% c(not_rated_value, "Not assigned", "Ecological Site Description Database") &
!res2$ecoclasstypename %in% c(not_rated_value, "Not assigned", "NRCS Rangeland Site", "NRCS Forestland Site")

res2$ecoclassid[idx] <- not_rated_value
res2$ecoclassref[idx] <- not_rated_value
res2$ecoclassname[idx] <- not_rated_value
res2$ecoclasstypename[idx] <- not_rated_value

res3 <- res2[, list(
condpct_r = sum(comppct_r, na.rm = TRUE),
compnames = paste0(compname[ecoclasstypename %in% c("NRCS Rangeland Site",
"NRCS Forestland Site")],
collapse = ", "),
unassigned = paste0(compname[!ecoclasstypename %in% c("NRCS Rangeland Site",
"NRCS Forestland Site")],
collapse = ", ")
), by = c("mukey", "ecoclassid", "ecoclassname")][, rbind(
.SD[, 1:4],
data.frame(
ecoclassid = not_rated_value,
ecoclassname = not_rated_value,
condpct_r = 100 - sum(condpct_r[nchar(compnames) > 0], na.rm = TRUE),
compnames = paste0(unassigned[nchar(unassigned) > 0 & !unassigned %in% compnames],
collapse = ",")
)
), by = c("mukey")][order(mukey, -condpct_r), ]
res3 <- res3[nchar(res3$compnames) > 0,]

# could do up to max_sites, but generally cut to some minimum condition percentage `threshold`
max_sites <- suppressWarnings(max(res3[, .N, by = "mukey"]$N))
res3 <- res3[res3$condpct_r >= threshold, ]
max_sites_pruned <- suppressWarnings(max(res3[, .N, by = "mukey"]$N))
res3$condpct_r <- as.integer(res3$condpct_r)

if (max_sites > max_sites_pruned) {
message("maximum number of sites per mukey: ", max_sites)
message("using maximum number of sites above threshold (",
threshold, "%) per mukey: ", max_sites_pruned)
}

if (!is.finite(max_sites_pruned)) {
sdx <- 1
} else {
sdx <- seq(max_sites_pruned)
}

.coecoclass_long_to_wide <- function(x, group) {
res <- data.frame(grpid = group)
for (i in sdx) {
if (i > nrow(x)) {
d <- data.frame(
siten = NA_character_,
sitenname = NA_character_,
sitencompname = NA_character_,
sitenpct_r = NA_integer_,
sitenlink = NA_character_
)
} else {
d <- data.frame(
siten = ifelse(isTRUE(is.na(x$ecoclassid[i])), not_rated_value, x$ecoclassid[i]),
sitenname = ifelse(isTRUE(is.na(x$ecoclassname[i])), not_rated_value, x$ecoclassname[i]),
sitencompname = ifelse(isTRUE(is.na(x$compnames[i])), NA_character_, x$compnames[i]),
sitenpct_r = ifelse(isTRUE(is.na(x$condpct_r[i])), 0, x$condpct_r[i]),
sitenlink = ifelse(
isTRUE(x$ecoclassid[i] == not_rated_value | is.na(x$ecoclassid[i])),
NA_character_,
paste0(
"https://edit.jornada.nmsu.edu/catalogs/esd/",
substr(x$ecoclassid[i], 2, 5),
"/",
x$ecoclassid[i]
)
)
)
# sitenpdf = ifelse(isTRUE(x$ecoclassid[i] == "Not assigned"), NA_character_,
# paste0(
# "https://edit.jornada.nmsu.edu/services/descriptions/esd/",
# substr(x$ecoclassid[i], 2, 5), "/", x$ecoclassid[i], ".pdf"
# )))
}
colnames(d) <- c(
paste0("site", i), paste0("site", i, "name"), paste0("site", i, "compname"),
paste0("site", i, "pct_r"), paste0("site", i, "link")
)
res <- cbind(res, d)
}
res
}

res <- merge(data.table::data.table(res0), res3, by = "mukey", all.x = TRUE)[,
.coecoclass_long_to_wide(.SD, .GRP),
by = c("mukey", "muname", "nationalmusym"), ]

res$grpid <- NULL
res
}

9 changes: 7 additions & 2 deletions man/get_SDA_coecoclass.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit ec9e21c

Please sign in to comment.