Updates to get_SDA_*
/ SOD-like methods
#185
…cleanRuleColumnName()
… names in results
Thanks @dylanbeaudette for the … We also trigger some similar ISNULL-type logic in the … (see the lines referenced below):
soilDB/R/SDA_interpretations.R Line 1155 in e8cc5d5
soilDB/R/SDA_interpretations.R Line 1169 in e8cc5d5
soilDB/R/SDA_interpretations.R Line 1176 in e8cc5d5
Here is a quick review of NULL logic for the weighted average of an interpretation rating:
library(soilDB)
library(dplyr, warn.conflicts = FALSE)
res_wtdavg <- get_SDA_interpretation("FOR - Potential Seedling Mortality",
method = "Weighted Average",
mukeys = 2424959)
res_wtdavg
#> areasymbol musym muname mukey
#> 1 CA630 8317 Beybek-Rock outcrop complex, 3 to 30 percent slopes 2424959
#> rating_FORPotentialSeedlingMortality class_FORPotentialSeedlingMortality
#> 1 0.23 Slightly limited
#> reason_FORPotentialSeedlingMortality
#> 1 Available water
No aggregation:
res_noagg <- get_SDA_interpretation("FOR - Potential Seedling Mortality",
method = "None",
mukeys = 2424959)
res_noagg
#> areasymbol musym muname mukey
#> 1 CA630 8317 Beybek-Rock outcrop complex, 3 to 30 percent slopes 2424959
#> 2 CA630 8317 Beybek-Rock outcrop complex, 3 to 30 percent slopes 2424959
#> 3 CA630 8317 Beybek-Rock outcrop complex, 3 to 30 percent slopes 2424959
#> 4 CA630 8317 Beybek-Rock outcrop complex, 3 to 30 percent slopes 2424959
#> cokey compname comppct_r rating_FORPotentialSeedlingMortality
#> 1 19586584 Beybek 45 0.0
#> 2 19586585 Millvilla 10 0.5
#> 3 19586586 Rock outcrop 35 NA
#> 4 19586587 Mollic Haploxeralfs 10 1.0
#> class_FORPotentialSeedlingMortality reason_FORPotentialSeedlingMortality
#> 1 Low <NA>
#> 2 Moderate Available water
#> 3 Not rated <NA>
#> 4 High Available water
If all component ratings are considered without removing missing values, the weighted average cannot be calculated (is NA):
res_noagg %>%
group_by(mukey) %>%
summarize(sum(comppct_r / 100 * rating_FORPotentialSeedlingMortality))
#> # A tibble: 1 x 2
#> mukey `sum(comppct_r/100 * rating_FORPotentialSeedlingMortality)`
#> <int> <dbl>
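#> 1 2424959 NA
If missing values are removed (na.rm = TRUE) while component percentages are still divided by 100, the unrated components are effectively assumed to contribute to the full 100%:
res_noagg %>%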
group_by(mukey) %>%
summarize(sum(comppct_r/100 * rating_FORPotentialSeedlingMortality, na.rm = TRUE))
#> # A tibble: 1 x 2
#> mukey `sum(comppct_r/100 * rating_FORPotentialSeedlingMortality, na.rm = TR~
#> <int> <dbl>
#> 1 2424959 0.15
Assuming 100% only makes sense for things that can be conceived of as "stocks" (on an area/volume basis) within a mapunit, where you want to "dilute" the result for miscellaneous areas/volumes that do not have soil. It does not make sense for interpretation ratings.
In contrast, when the weights are renormalized over only the rated components (comppct_r / sum(comppct_r)), the result matches the weighted average returned above:
res_noagg %>%
group_by(mukey) %>%
filter(!is.na(comppct_r) & !is.na(rating_FORPotentialSeedlingMortality)) %>%
summarize(sum(comppct_r / sum(comppct_r) * rating_FORPotentialSeedlingMortality, na.rm=TRUE))
#> # A tibble: 1 x 2
#> mukey `sum(...)`
#> <int> <dbl>
#> 1 2424959 0.231
Finally, if none of the components are rated, we obtain a value of 99 for the rating (the domain for valid values is [0, 1]). The only thing I would consider changing here is that on the R side we could convert these values >1 to NA:
res_99 <- get_SDA_interpretation("FOR - Potential Seedling Mortality",
method = "Weighted Average",
areasymbols = "CA630")
subset(res_99, rating_FORPotentialSeedlingMortality == 99.0)
#> areasymbol musym muname mukey rating_FORPotentialSeedlingMortality
#> 90 CA630 1012 Mined Land 2403709 99
#> 92 CA630 DAM Dams 2924912 99
#> 104 CA630 W Water 2462630 99
#> class_FORPotentialSeedlingMortality reason_FORPotentialSeedlingMortality
#> 90 Not Rated <NA>
#> 92 Not Rated <NA>
#> 104 Not Rated <NA>
Note: The rating returns NA when no aggregation is applied:
get_SDA_interpretation("FOR - Potential Seedling Mortality",
method = "NONE",
mukeys = 2924912)
#> areasymbol musym muname mukey cokey compname comppct_r
#> 1 CA630 DAM Dams 2924912 19586041 Dams 100
#> rating_FORPotentialSeedlingMortality class_FORPotentialSeedlingMortality
#> 1 NA Not rated
#> reason_FORPotentialSeedlingMortality
#> 1 NA
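The three weighted-average variants discussed above can be compared side by side without querying SDA. The sketch below uses base R only and hard-codes the four component records from the `method = "None"` output. The data frame `comp` and the variable names are illustrative, not part of soilDB.

```r
# Component-level records copied from the method = "None" output for mukey 2424959
comp <- data.frame(
  compname  = c("Beybek", "Millvilla", "Rock outcrop", "Mollic Haploxeralfs"),
  comppct_r = c(45, 10, 35, 10),
  rating    = c(0.0, 0.5, NA, 1.0)
)

# 1. No NA handling: one missing rating makes the whole sum NA
wa_raw <- sum(comp$comppct_r / 100 * comp$rating)

# 2. na.rm = TRUE with a fixed denominator of 100:
#    the unrated 35% (Rock outcrop) "dilutes" the result
wa_diluted <- sum(comp$comppct_r / 100 * comp$rating, na.rm = TRUE)

# 3. Drop unrated components first, then renormalize the weights
#    so they sum to 1 over the rated components only
rated <- !is.na(comp$comppct_r) & !is.na(comp$rating)
wa_renorm <- with(comp[rated, ], sum(comppct_r / sum(comppct_r) * rating))

round(c(raw = wa_raw, diluted = wa_diluted, renormalized = wa_renorm), 3)
```

Only the third variant reproduces the 0.23 rating reported by the "Weighted Average" method for this map unit.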
I propose that we use the new argument added in 2774148, "not_rated_value", to give the user control over the numeric value associated with "Not Rated" records. I suggest the default value be NA_real_.
library(soilDB)
# default will use NA_real_ for the numeric ratings of "Not rated" components
res1 <- get_SDA_interpretation("FOR - Potential Seedling Mortality",
method = "Weighted Average",
areasymbols = "CA630")
head(subset(res1, class_FORPotentialSeedlingMortality == "Not Rated"))
#> areasymbol musym muname mukey rating_FORPotentialSeedlingMortality
#> 90 CA630 1012 Mined Land 2403709 NA
#> 92 CA630 DAM Dams 2924912 NA
#> 104 CA630 W Water 2462630 NA
#> class_FORPotentialSeedlingMortality reason_FORPotentialSeedlingMortality
#> 90 Not Rated <NA>
#> 92 Not Rated <NA>
#> 104 Not Rated <NA>
# user can specify a custom not rated value, e.g. 9999
res2 <- get_SDA_interpretation("FOR - Potential Seedling Mortality",
method = "Weighted Average",
areasymbols = "CA630",
not_rated_value = 9999)
head(subset(res2, class_FORPotentialSeedlingMortality == "Not Rated"))
#> areasymbol musym muname mukey rating_FORPotentialSeedlingMortality
#> 30 CA630 1012 Mined Land 2403709 9999
#> 32 CA630 DAM Dams 2924912 9999
#> 44 CA630 W Water 2462630 9999
#> class_FORPotentialSeedlingMortality reason_FORPotentialSeedlingMortality
#> 30 Not Rated <NA>
#> 32 Not Rated <NA>
#> 44 Not Rated <NA>
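To illustrate the behavior a user-definable not-rated value implies, here is a minimal base R sketch. `wtd_avg_rating()` is a hypothetical helper written for this comment, not a soilDB function, and it is not how the SQL sent to SDA is built. It assumes the weights are renormalized over rated components and that `not_rated_value` is returned only when no component is rated.

```r
# Hypothetical helper (not part of soilDB): component-percentage weighted
# average of ratings, returning `not_rated_value` when nothing is rated
wtd_avg_rating <- function(comppct, rating, not_rated_value = NA_real_) {
  rated <- !is.na(comppct) & !is.na(rating)
  if (!any(rated)) {
    return(not_rated_value)
  }
  sum(comppct[rated] * rating[rated]) / sum(comppct[rated])
}

# A map unit with a single unrated component (e.g. "Dams", comppct_r = 100)
wtd_avg_rating(100, NA_real_)                        # default: NA_real_
wtd_avg_rating(100, NA_real_, not_rated_value = 99)  # mimics the original SQL

# A partially rated map unit is unaffected by not_rated_value
wtd_avg_rating(c(45, 10, 35, 10), c(0, 0.5, NA, 1), not_rated_value = 99)
```

With `NA_real_` as the default, a downstream check such as `is.na()` can identify unrated map units regardless of aggregation method, whereas a sentinel like 99 has to be filtered out explicitly before any arithmetic.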
- get_SDA_property(property = ...) and get_SDA_interpretation(rulename = ...): vectorization over property/rulename to work with any aggregation method. Now supports: Dominant Condition, Min/Max, Dominant Component, Weighted Average
- method = "NONE" (no aggregation)
- query_string argument (default: FALSE). Set to TRUE to skip submitting the query to SDA and return a string of the query that would have been sent, instead of a data.frame result
- MUKEY column name (and other keys) as lowercase in results (and subqueries)
- method
- get_SDA_property: Remove ISNULL(x, 0) logic that affects weighted averages in the presence of missing data / miscellaneous areas that have horizon records
- get_SDA_interpretation: review handling of NULL values and calculation of weighted averages
- not_rated_value with default value of NA_real_ -- this creates consistent, user-definable behavior for not rated values across methods/queries. For backwards compatibility with the original SQL, use not_rated_value = 99.0