Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_SDA_coecoclass: add method="all" #301

Merged
merged 6 commits into from
Oct 28, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: soilDB
Type: Package
Title: Soil Database Interface
Version: 2.7.9
Version: 2.7.9.9000
Authors@R: c(person(given="Dylan", family="Beaudette", role = c("aut"), email = "[email protected]"),
person(given="Jay", family="Skovlin", role = c("aut")),
person(given="Stephen", family="Roecker", role = c("aut")),
Expand Down
6 changes: 4 additions & 2 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
# soilDB 2.7.9.9000 (2023-10-04)
- `fetchSDA_spatial()` gains `geom.src="mlrapolygon"` for obtaining Major Land Resource Area (MLRA) polygon boundaries. When using this geometry source `x` is a vector of `MLRARSYM` (MLRA Symbols).

- The geometry source is the MLRA Geographic Database v5.2 (2022) which is not (yet) part of Soil Data Access. Instead of SDA, GDAL utilities are used to read a zipped ESRI Shapefile from a remote URL: <https://www.nrcs.usda.gov/sites/default/files/2022-10/MLRA_52_2022.zip>. Therefore, most additional `fetchSDA_spatial()` arguments are _not_ currently supported for the MLRA geometry source. In the future a `mlrapolygon` table may be added to SDA (analogous to `mupolygon` and `sapolygon`), and the function will be updated accordingly at that time.
- The geometry source is the MLRA Geographic Database v5.2 (2022) which is not (yet) part of Soil Data Access. Instead of SDA, GDAL utilities are used to read a zipped ESRI Shapefile from a remote URL: <https://www.nrcs.usda.gov/sites/default/files/2022-10/MLRA_52_2022.zip>. Therefore, most additional `fetchSDA_spatial()` arguments are _not_ currently supported for the MLRA geometry source. In the future a `mlrapolygon` table may be added to SDA (analogous to `mupolygon` and `sapolygon`), and the function will be updated accordingly at that time.

- web coverage services and related raster attribute tables provided by SoilWeb (`mukey.wcs()` etc.) are now using the load-balancer URL
- Web coverage services and related raster attribute tables provided by SoilWeb (`mukey.wcs()` etc.) are now using the SoilWeb load-balancer URL

- `get_SDA_coecoclass()` gains `method="all"` for aggregating information about ecological sites and related components. The method performs a condition-based aggregation for each ecological site condition in the map unit, producing a "wide" data.frame result with as many columns as needed to portray all site conditions.

# soilDB 2.7.9 (2023-09-01)

Expand Down
32 changes: 32 additions & 0 deletions R/SSURGO_query.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
#' Query arbitrary data sources that use the SSURGO data model
#'
#' This is a simple wrapper function allowing queries to be passed to a variety of database drivers. It is assumed the database generally follows the SSURGO schema, regardless of the driver being used.
#'
#' @param x An SQL query. If `dsn` is NULL `x` is in T-SQL dialect. If `dsn` is a _character_ (file path), the SQLite dialect is used. If `dsn` is a `DBIConnection`, any SQL dialect compatible with the DBI source can be used.
#' @param dsn Default: `NULL` uses Soil Data Access remote data source via REST API. Alternately `dsn` may be a _character_ file path to an SQLite database, or a `DBIConnection` that has already been created.
#'
#' @details No processing of the query string is performed by this function, so all values of `x` must be adjusted according to the value of `dsn`.
#'
#' @return A _data.frame_, or _try-error_ on error
#' @noRd
.SSURGO_query <- function(x, dsn = NULL) {
if (is.null(dsn)) {
return(SDA_query(x))
} else {
if (inherits(dsn, 'DBIConnection')) {
return(DBI::dbGetQuery(con, x))
} else if (file.exists(dsn)) {
if (requireNamespace("RSQLite")) {
con <- try(RSQLite::dbConnect(RSQLite::SQLite(), dsn))
on.exit(DBI::dbDisconnect(con), add = TRUE)
if (!inherits(con, 'try-error')) {
return(RSQLite::dbGetQuery(con, x))
} else {
return(invisible(con))
}
}
} else {
stop("Invalid data source name: ", dsn, call. = FALSE)
}
}
}
186 changes: 181 additions & 5 deletions R/get_SDA_coecoclass.R
Original file line number Diff line number Diff line change
@@ -1,19 +1,21 @@
#' Get mapunit ecological sites from Soil Data Access
#'
#' @details When `method="Dominant Condition"` an additional field `ecoclasspct_r` is returned in the result with the sum of `comppct_r` that have the dominant condition `ecoclassid`. The component with the greatest `comppct_r` is returned for the `component` and `coecosite` level information.
#' @details When `method="Dominant Condition"` an additional field `ecoclasspct_r` is returned in the result
#' with the sum of `comppct_r` that have the dominant condition `ecoclassid`. The component with the greatest
#' `comppct_r` is returned for the `component` and `coecosite` level information.
#'
#' Note that if there are multiple `coecoclasskey` per `ecoclassid` there may be more than one record per component.
#'
#' @param method aggregation method. One of: "Dominant Component", "Dominant Condition", "None". If "None" is selected one row will be returned per component, otherwise one row will be returned per map unit.
#' @param method aggregation method. One of: `"Dominant Component"`, `"Dominant Condition"`, `"All"` or `"None"` (default). If `method="all"` multiple numbered columns represent site composition within each map unit e.g. `site1...`, `site2...`. If `method="none"` is selected one row will be returned per _component_; in all other cases one row will be returned per _map unit_.
#' @param areasymbols vector of soil survey area symbols
#' @param mukeys vector of map unit keys
#' @param WHERE character containing SQL WHERE clause specified in terms of fields in `legend`, `mapunit`, `component` or `coecosite` tables, used in lieu of `mukeys` or `areasymbols`
#' @param query_string Default: `FALSE`; if `TRUE` return a character string containing query that would be sent to SDA via `SDA_query`
#' @param ecoclasstypename If `NULL` no constraint on `ecoclasstypename` is used in the query.
#' @param ecoclassref Default: `"Ecological Site Description Database"`. If `NULL` no constraint on `ecoclassref` is used in the query.
#' @param not_rated_value Default: `"Not assigned"`
#' @param include_minors logical. Include minor components? Default: `TRUE`.
#' @param miscellaneous_areas logical. Include miscellaneous areas (non-soil components)?
#' @param include_minors logical. Include minor components? Default: `TRUE`.
#' @param threshold integer. Default: `0`. Minimum combined component percentage (RV) for inclusion of a mapunit's ecological site in wide-format tabular sumamry. Used only for `method="all"`.
#' @param dsn Path to local SQLite database or a DBIConnection object. If `NULL` (default) use Soil Data Access API via `SDA_query()`.
#' @export
get_SDA_coecoclass <- function(method = "None",
Expand All @@ -24,8 +26,30 @@ get_SDA_coecoclass <- function(method = "None",
not_rated_value = "Not assigned",
miscellaneous_areas = TRUE,
include_minors = TRUE,
threshold = 0,
dsn = NULL) {
method <- match.arg(toupper(method), c('NONE', "DOMINANT COMPONENT", "DOMINANT CONDITION"))

method <- match.arg(toupper(method), c("NONE", "ALL", "DOMINANT COMPONENT", "DOMINANT CONDITION"))

if (method == "ALL") {
if (!is.null(WHERE)) {
stop("`method='all'` does not support custom `WHERE` clauses", call. = FALSE)
}

if (isTRUE(query_string)) {
stop("`method='all'` does not support `query_string=TRUE`", call. = FALSE)
}

return(.get_SDA_coecoclass_agg(areasymbols = areasymbols,
mukeys = mukeys,
ecoclasstypename = ecoclasstypename,
ecoclassref = ecoclassref,
not_rated_value = not_rated_value,
miscellaneous_areas = miscellaneous_areas,
include_minors = include_minors,
threshold = threshold,
dsn = dsn))
}

if (!is.null(ecoclassref)) {
ecoclassref_in <- soilDB::format_SQL_in_statement(ecoclassref)
Expand Down Expand Up @@ -122,3 +146,155 @@ get_SDA_coecoclass <- function(method = "None",
res2
}

.get_SDA_coecoclass_agg <- function(areasymbols = NULL,
mukeys = NULL,
ecoclasstypename = NULL,
ecoclassref = "Ecological Site Description Database",
not_rated_value = "Not assigned",
miscellaneous_areas = TRUE,
include_minors = TRUE,
dsn = NULL,
threshold = 0) {

comppct_r <- NULL; condpct_r <- NULL; compname <- NULL; ecoclasstypename <- NULL
areasymbol <- NULL; compnames <- NULL; unassigned <- NULL;
mukey <- NULL; .N <- NULL; .SD <- NULL; .GRP <- NULL;

if (!is.null(areasymbols)) {
res0 <- do.call('rbind', lapply(areasymbols, function(x) {
.SSURGO_query(paste0(
"SELECT DISTINCT mukey, nationalmusym, muname FROM mapunit
INNER JOIN legend ON legend.lkey = mapunit.lkey
WHERE areasymbol = '", x, "'"
))
}))
idx <- makeChunks(res0$mukey, 1000)
l <- split(res0$mukey, idx)
} else {
idx <- makeChunks(mukeys, 1000)
l <- split(mukeys, idx)
res0 <- do.call('rbind', lapply(l, function(x) {
.SSURGO_query(paste0(
"SELECT DISTINCT mukey, nationalmusym, muname FROM mapunit
INNER JOIN legend ON legend.lkey = mapunit.lkey
WHERE mukey IN ", format_SQL_in_statement(x), ""
))
}))
idx <- makeChunks(res0$mukey, 1000)
l <- split(res0$mukey, idx)
}

res1 <- do.call('rbind', lapply(l, function(x) {
get_SDA_coecoclass(
mukeys = x,
method = "None",
ecoclasstypename = ecoclasstypename,
ecoclassref = ecoclassref,
not_rated_value = not_rated_value,
miscellaneous_areas = miscellaneous_areas,
include_minors = include_minors,
dsn = dsn
)
}))

res1 <- data.table::data.table(res1)[, .SD[order(comppct_r, decreasing = TRUE), ], by = "mukey"]
res2 <- data.table::data.table(subset(res1, areasymbol != "US"))

# remove FSG etc. some components have no ES assigned, but have other eco class
idx <- !res2$ecoclassref %in% c(not_rated_value, "Not assigned", "Ecological Site Description Database") &
!res2$ecoclasstypename %in% c(not_rated_value, "Not assigned", "NRCS Rangeland Site", "NRCS Forestland Site")

res2$ecoclassid[idx] <- not_rated_value
res2$ecoclassref[idx] <- not_rated_value
res2$ecoclassname[idx] <- not_rated_value
res2$ecoclasstypename[idx] <- not_rated_value

res3 <- res2[, list(
condpct_r = sum(comppct_r, na.rm = TRUE),
compnames = paste0(compname[ecoclasstypename %in% c("NRCS Rangeland Site",
"NRCS Forestland Site")],
collapse = ", "),
unassigned = paste0(compname[!ecoclasstypename %in% c("NRCS Rangeland Site",
"NRCS Forestland Site")],
collapse = ", ")
), by = c("mukey", "ecoclassid", "ecoclassname")][, rbind(
.SD[, 1:4],
data.frame(
ecoclassid = not_rated_value,
ecoclassname = not_rated_value,
condpct_r = 100 - sum(condpct_r[nchar(compnames) > 0], na.rm = TRUE),
compnames = paste0(unassigned[nchar(unassigned) > 0 & !unassigned %in% compnames],
collapse = ",")
)
), by = c("mukey")][order(mukey, -condpct_r), ]
res3 <- res3[nchar(res3$compnames) > 0,]

# could do up to max_sites, but generally cut to some minimum condition percentage `threshold`
max_sites <- suppressWarnings(max(res3[, .N, by = "mukey"]$N))
res3 <- res3[res3$condpct_r >= threshold, ]
max_sites_pruned <- suppressWarnings(max(res3[, .N, by = "mukey"]$N))
res3$condpct_r <- as.integer(res3$condpct_r)

if (max_sites > max_sites_pruned) {
message("maximum number of sites per mukey: ", max_sites)
message("using maximum number of sites above threshold (",
threshold, "%) per mukey: ", max_sites_pruned)
}

if (!is.finite(max_sites_pruned)) {
sdx <- 1
} else {
sdx <- seq(max_sites_pruned)
}

.coecoclass_long_to_wide <- function(x, group) {
res <- data.frame(grpid = group)
for (i in sdx) {
if (i > nrow(x)) {
d <- data.frame(
siten = NA_character_,
sitenname = NA_character_,
sitencompname = NA_character_,
sitenpct_r = NA_integer_,
sitenlink = NA_character_
)
} else {
d <- data.frame(
siten = ifelse(isTRUE(is.na(x$ecoclassid[i])), not_rated_value, x$ecoclassid[i]),
sitenname = ifelse(isTRUE(is.na(x$ecoclassname[i])), not_rated_value, x$ecoclassname[i]),
sitencompname = ifelse(isTRUE(is.na(x$compnames[i])), NA_character_, x$compnames[i]),
sitenpct_r = ifelse(isTRUE(is.na(x$condpct_r[i])), 0, x$condpct_r[i]),
sitenlink = ifelse(
isTRUE(x$ecoclassid[i] == not_rated_value | is.na(x$ecoclassid[i])),
NA_character_,
paste0(
"https://edit.jornada.nmsu.edu/catalogs/esd/",
substr(x$ecoclassid[i], 2, 5),
"/",
x$ecoclassid[i]
)
)
)
# sitenpdf = ifelse(isTRUE(x$ecoclassid[i] == "Not assigned"), NA_character_,
# paste0(
# "https://edit.jornada.nmsu.edu/services/descriptions/esd/",
# substr(x$ecoclassid[i], 2, 5), "/", x$ecoclassid[i], ".pdf"
# )))
}
colnames(d) <- c(
paste0("site", i), paste0("site", i, "name"), paste0("site", i, "compname"),
paste0("site", i, "pct_r"), paste0("site", i, "link")
)
res <- cbind(res, d)
}
res
}

res <- merge(data.table::data.table(res0), res3, by = "mukey", all.x = TRUE)[,
.coecoclass_long_to_wide(.SD, .GRP),
by = c("mukey", "muname", "nationalmusym"), ]

res$grpid <- NULL
res
}

9 changes: 7 additions & 2 deletions man/get_SDA_coecoclass.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.