`get_SDA_interpretation()`: add subrule ratings to "reason" field #308

brownag · 2023-10-02T16:44:15Z

Adds subrule ratings to "reason" fields calculated for each mrulename in a get_SDA_interpretation() query.

This helps get key information about subrules that are exported in cointerp with ruledepth > 0

TODO:

- consider whether there is a truly generic way to flatten these results 1:1 with mukey/component/mrulename, or whether that has to be left to the user because it is unique to the interpretation being queried
  - could pack an optional XML column containing arbitrarily complex information about subrules
- ~~`.interpretation_weighted_average()` needs SQLite compatible `STRING_AGG()` switch~~ (wontfix; the rest of the query is not SQLite compatible either)
- ~~order subrule "reasons" alphabetically? or at least consistently~~ (wontfix; can't use ORDER BY in the T-SQL subquery)
- some subrule reasons are hard to interpret without the subrule name
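Since ordering inside the T-SQL subquery is ruled out, consistent ordering could instead be applied client-side after the query returns. A minimal sketch, assuming the `; `-separated entry format shown in this thread; `sort_reason()` is a hypothetical helper and not part of soilDB:

```r
# Hypothetical helper (not part of soilDB): reorder the entries of each
# "reason" string so that equal sets of subrules yield identical strings.
sort_reason <- function(reason) {
  vapply(reason, function(r) {
    if (is.na(r)) return(NA_character_)
    # split at ";" only when it directly follows the ")" that closes a rating
    entries <- strsplit(r, "(?<=\\));\\s*", perl = TRUE)[[1]]
    # method = "radix" sorts in C-locale order, independent of session locale
    paste(sort(entries, method = "radix"), collapse = "; ")
  }, character(1), USE.NAMES = FALSE)
}

# Two orderings of the same entries (abbreviated from the NCCPI output below)
reasons <- c(
  'Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001)',
  'NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); Impacted soil "No limitation" (0)'
)
sorted <- sort_reason(reasons)
identical(sorted[1], sorted[2])
```

This sorts by the leading text of each entry (the subrule name when present), not by rating, and it would mis-split any subrule name that itself contains `);`.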

Will close #303

Note that the "reason" field now includes the interphr as well as interphrc values for rules with ruledepth != 0:

```r
library(soilDB)

x <- get_SDA_interpretation(rulename  = "NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
                            method     = "Dominant Component",
                            mukeys     = c("242963","242964","242965"))
x
#>    mukey    cokey areasymbol musym
#> 1 242963 23671045      IL019  152A
#> 2 242964 23670915      IL019  134A
#> 3 242965 23671016      IL019  154A
#>                                           muname compname compkind comppct_r
#> 1 Drummer silty clay loam, 0 to 2 percent slopes  Drummer   Series        94
#> 2        Camden silt loam, 0 to 2 percent slopes   Camden   Series        92
#> 3      Flanagan silt loam, 0 to 2 percent slopes Flanagan   Series        95
#>   majcompflag rating_NCCPINationalCommodityCropProductivityIndexVer30
#> 1         Yes                                                   0.826
#> 2         Yes                                                   0.917
#> 3         Yes                                                   0.899
#>   class_NCCPINationalCommodityCropProductivityIndexVer30
#> 1                             High inherent productivity
#> 2                             High inherent productivity
#> 3                             High inherent productivity
#>                                                                                                                                                                                                           reason_NCCPINationalCommodityCropProductivityIndexVer30
#> 1 Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.687); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.752); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.826)
#> 2 NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.777); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.791); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.917); Impacted soil "No limitation" (0)
#> 3  Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.734); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.76); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.899)
```

brownag · 2023-10-27T00:08:51Z

Added subrule names, along with reason class and rating. The format is `{SUBRULE} "{REASON}" ({RATING}); {SUBRULE} "{REASON}" ({RATING});`. The format itself is consistent, but it is a pain to parse if you really need those values, and the order of the subrule entries can vary from row to row.
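To illustrate what parsing this format involves, here is a minimal base R sketch. `parse_reason()` is a hypothetical helper, not a soilDB function, and it assumes each entry ends in a parenthesized rating and that the reason class is the last double-quoted string in the entry:

```r
# Hypothetical helper (not part of soilDB): split one "reason" string into
# a data.frame with one row per subrule entry.
parse_reason <- function(reason) {
  entries <- if (is.na(reason)) character(0) else
    strsplit(reason, "(?<=\\));\\s*", perl = TRUE)[[1]]
  # greedy subrule name, then the quoted reason class, then "(rating)" at the end
  pattern <- '^(.*) "([^"]*)" \\(([^()]*)\\)$'
  m <- regmatches(entries, regexec(pattern, entries))
  m <- m[lengths(m) == 4]  # keep only entries that matched the full pattern
  data.frame(
    subrule = vapply(m, `[`, character(1), 2),
    reason  = vapply(m, `[`, character(1), 3),
    # non-numeric ratings become NA instead of raising a warning
    rating  = suppressWarnings(as.numeric(vapply(m, `[`, character(1), 4))),
    stringsAsFactors = FALSE
  )
}

# One "ENG - Local Roads and Streets" reason string from this thread
res <- parse_reason(paste0(
  'Potential Frost Action > Low "Frost action" (1); ',
  'Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (0.955); ',
  'Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.375)'
))
res
```

Splitting only at a `;` that follows `)` avoids breaking subrule names that contain a semicolon mid-name, but entries without a subrule name or with unusual quoting would be dropped by this pattern.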

Could potentially add an optional argument to get_SDA_interpretation() that post-processes all of the reason fields and "widens" the data.frame result accordingly, adding one column per subrule rating, similar to the example in #303.
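The widening step could look roughly like the following base R sketch. `reason_to_ratings()` and `widen_reason()` are hypothetical names, the column naming simply strips non-alphanumeric characters, and the demo data are abbreviated from the "ENG - Local Roads and Streets" reasons in this thread; none of this is the soilDB implementation:

```r
# Hypothetical helper: one reason string -> named numeric vector of ratings
reason_to_ratings <- function(reason) {
  if (is.na(reason)) return(setNames(numeric(0), character(0)))
  entries <- strsplit(reason, "(?<=\\));\\s*", perl = TRUE)[[1]]
  m <- regmatches(entries, regexec('^(.*) "[^"]*" \\(([^()]*)\\)$', entries))
  m <- m[lengths(m) == 3]  # drop entries that do not fit the expected format
  setNames(suppressWarnings(as.numeric(vapply(m, `[`, character(1), 3))),
           vapply(m, `[`, character(1), 2))
}

# Hypothetical helper: add one numeric column per subrule found in reason_col
widen_reason <- function(x, reason_col) {
  ratings  <- lapply(x[[reason_col]], reason_to_ratings)
  subrules <- unique(unlist(lapply(ratings, names)))
  for (s in subrules) {
    newcol <- paste0("rating_", reason_col, "_", gsub("[^[:alnum:]]", "", s))
    # rows lacking this subrule get NA
    x[[newcol]] <- vapply(ratings, function(r) {
      if (s %in% names(r)) r[[s]] else NA_real_
    }, numeric(1))
  }
  x
}

x <- data.frame(
  mukey = c(242963, 242964),
  reason_demo = c(
    'Ponded > 4 hours "Ponding" (1); Potential Frost Action > Low "Frost action" (1)',
    paste0('Potential Frost Action > Low "Frost action" (1); ',
           'Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.375)')
  ),
  stringsAsFactors = FALSE
)
wide <- widen_reason(x, "reason_demo")
names(wide)
```

Because subrules differ between rows, the widened columns are sparse: a subrule that does not appear in a row's reason string is `NA` for that row.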

brownag · 2023-10-27T00:58:56Z

Added an argument to get_SDA_interpretation() called wide_reason, default FALSE. If TRUE, the result is post-processed: the string contents of the "reason_*" fields are parsed, and a new column is added for each subrule rating within each main rule.

So, now you can quickly obtain ready-to-use subrule ratings for arbitrary interps, which should adequately cover the needs from #303:

```r
library(soilDB)
x <- get_SDA_interpretation(rulename  = c("NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
                                          "AGR - Pesticide Loss Potential-Leaching",
                                          "ENG - Local Roads and Streets"),
                            method     = "Dominant Component", not_rated_value = "Not rated",
                            mukeys     = c("242963","242964","242965"), wide_reason = TRUE)
x
#>    mukey    cokey areasymbol musym
#> 1 242963 23671045      IL019  152A
#> 2 242964 23670915      IL019  134A
#> 3 242965 23671016      IL019  154A
#>                                           muname compname compkind comppct_r
#> 1 Drummer silty clay loam, 0 to 2 percent slopes  Drummer   Series        94
#> 2        Camden silt loam, 0 to 2 percent slopes   Camden   Series        92
#> 3      Flanagan silt loam, 0 to 2 percent slopes Flanagan   Series        95
#>   majcompflag rating_NCCPINationalCommodityCropProductivityIndexVer30
#> 1         Yes                                                   0.826
#> 2         Yes                                                   0.917
#> 3         Yes                                                   0.899
#>   class_NCCPINationalCommodityCropProductivityIndexVer30
#> 1                             High inherent productivity
#> 2                             High inherent productivity
#> 3                             High inherent productivity
#>                                                                                                                                                                                                           reason_NCCPINationalCommodityCropProductivityIndexVer30
#> 1 Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.687); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.752); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.826)
#> 2 NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.777); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.791); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.917); Impacted soil "No limitation" (0)
#> 3  Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.734); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.76); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.899)
#>   rating_AGRPesticideLossPotentialLeaching
#> 1                                Not rated
#> 2                                Not rated
#> 3                                Not rated
#>   class_AGRPesticideLossPotentialLeaching
#> 1                                    <NA>
#> 2                                    <NA>
#> 3                                    <NA>
#>   reason_AGRPesticideLossPotentialLeaching rating_ENGLocalRoadsandStreets
#> 1                                     <NA>                              1
#> 2                                     <NA>                              1
#> 3                                     <NA>                              1
#>   class_ENGLocalRoadsandStreets
#> 1                  Very limited
#> 2                  Very limited
#> 3                  Very limited
#>                                                                                                                                                                                                                                                                                      reason_ENGLocalRoadsandStreets
#> 1 Ponded > 4 hours "Ponding" (1); Wet, Ground Water Near the Surface (30 - 75cm) "Depth to saturated zone" (1); Potential Frost Action > Low "Frost action" (1); Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (1); Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.37)
#> 2                                                                                                          Potential Frost Action > Low "Frost action" (1); Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (0.955); Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.375)
#> 3                          Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (1); Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.894); Wet, Ground Water Near the Surface (30 - 75cm) "Depth to saturated zone" (0.746); Potential Frost Action > Low "Frost action" (0.5)
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_Impactedsoil
#> 1                                                                           0
#> 2                                                                           0
#> 3                                                                           0
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPICottonSubmodelII
#> 1                                                                                     0.001
#> 2                                                                                     0.001
#> 3                                                                                     0.001
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPISmallGrainsSubmodelII
#> 1                                                                                          0.687
#> 2                                                                                          0.791
#> 3                                                                                          0.734
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPISoybeansSubmodelI
#> 1                                                                                      0.752
#> 2                                                                                      0.777
#> 3                                                                                       0.76
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPICornSubmodelI
#> 1                                                                                  0.826
#> 2                                                                                  0.917
#> 3                                                                                  0.899
#>   rating_reason_AGRPesticideLossPotentialLeaching_Notrated
#> 1                                                       NA
#> 2                                                       NA
#> 3                                                       NA
#>   rating_reason_ENGLocalRoadsandStreets_Ponded4hours
#> 1                                                  1
#> 2                                               <NA>
#> 3                                               <NA>
#>   rating_reason_ENGLocalRoadsandStreets_WetGroundWaterNeartheSurface3075cm
#> 1                                                                        1
#> 2                                                                     <NA>
#> 3                                                                    0.746
#>   rating_reason_ENGLocalRoadsandStreets_PotentialFrostActionLow
#> 1                                                             1
#> 2                                                             1
#> 3                                                           0.5
#>   rating_reason_ENGLocalRoadsandStreets_StrengthAASHTOGroupIndexWeightedAverage25100cm
#> 1                                                                                    1
#> 2                                                                                0.955
#> 3                                                                                    1
#>   rating_reason_ENGLocalRoadsandStreets_ShrinkSwellLEPWTDAVG25100cmorBedrock
#> 1                                                                       0.37
#> 2                                                                      0.375
#> 3                                                                      0.894
```

brownag · 2023-10-27T01:15:35Z

A final consideration: soilDB:::.cleanRuleColumnName() strips non-alphanumeric characters to make a "legal" R column name. This could lead to collisions between certain subrule names.

For instance, inequalities are lost: "Ponded > 4 hours" and "Ponded < 4 hours" both simplify to the same name, "Ponded4hours". One option is to replace ">", "<", and "=" with "GT", "LT", and "EQ".
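A minimal sketch of the collision and of the proposed replacement. `clean_naive()` and `clean_keep_inequalities()` are hypothetical stand-ins written for illustration; they are not the soilDB internals and may differ from what `.cleanRuleColumnName()` actually does:

```r
# Hypothetical stand-in for stripping every non-alphanumeric character
clean_naive <- function(x) gsub("[^[:alnum:]]", "", x)

# Hypothetical variant: spell out comparison operators before stripping
clean_keep_inequalities <- function(x) {
  x <- gsub(">", "GT", x, fixed = TRUE)
  x <- gsub("<", "LT", x, fixed = TRUE)
  x <- gsub("=", "EQ", x, fixed = TRUE)
  gsub("[^[:alnum:]]", "", x)
}

x <- c("Ponded > 4 hours", "Ponded < 4 hours")
clean_naive(x)
#> [1] "Ponded4hours" "Ponded4hours"
clean_keep_inequalities(x)
#> [1] "PondedGT4hours" "PondedLT4hours"
```

Compound operators are spelled out character by character under this scheme (for example "=<" becomes "EQLT"). Names that differ only in other punctuation, such as "," versus "/", would still collide.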

Collisions appear to be rare, affecting only Texas subrules in FY24 SSURGO, but they are not impossible:

```r
library(soilDB)
x <- SDA_query("SELECT DISTINCT rulename FROM cointerp")[[1]]
#> single result set, returning a data.frame
y <- soilDB:::.cleanRuleColumnName(x)

length(x)
#> [1] 3237
length(unique(y))
#> [1] 3231

xx <- c(x[duplicated(y)], x[duplicated(y, fromLast = TRUE)])
sort(xx)
#>  [1] "AGR - Rutting Hazard =< 10,000 Pounds per Wheel (TX)"
#>  [2] "AGR - Rutting Hazard > 10,000 Pounds per Wheel (TX)"
#>  [3] "CaCO3 < 40% by Wght. Av. 0-40\" (TX)"
#>  [4] "CaCO3 > 40% by Wght. Av. 0-40\" (TX)"
#>  [5] "Excess Humus (FB, Peat, HM/MPT Surface Layer) (TX)"
#>  [6] "Excess Humus (FB/Peat/HM/MPT Surface Layer) (TX)"
#>  [7] "Flooding Occasional or greater; Duration Long,Very Long (TX)"
#>  [8] "Flooding Occasional or greater; Duration Long/Very Long (TX)"
#>  [9] "Ponding => Frequent (TX)"
#> [10] "Ponding Frequent (TX)"
#> [11] "Soil Strength (Rutting Vehicle =< 10,000 Pounds) (TX)"
#> [12] "Soil Strength (Rutting Vehicle > 10,000 Pounds) (TX)"
```

brownag · 2023-10-27T01:29:57Z

There was only one existing collision in mrulename (as opposed to the few listed above for rulename). However, the modification that adds inequalities back in will add a few characters to the column names derived from several existing mrulename values, which could be a small breaking change.

These are the affected interpretations; anyone referencing their column names will need to update those references:

AGR - Rutting Hazard > 10,000 Pounds per Wheel (TX)
GRL - NV range seeding (Wind C >= 160) (NV)
GRL - Fencing, Post Depth =<24 inches
WLF - Food Plots for Upland Wildlife < 2 Acres (TX)
AGR - Rutting Hazard =< 10,000 Pounds per Wheel (TX)
GRL - Fencing, Post Depth =<36 inches
GRL - NV range seeding (Wind C = 10) (NV)
GRL - NV range seeding (Wind C = 30) (NV)
GRL - NV range seeding (Wind C = 20) (NV)
GRL - NV range seeding (Wind C = 100) (NV)
GRL - NV range seeding (Wind C = 40) (NV)
GRL - NV range seeding (Wind C = 60) (NV)
GRL - NV range seeding (Wind C = 80) (NV)
GRL - NV range seeding (Wind C = 50) (NV)

brownag added 2 commits October 2, 2023 10:04

update cointerp list

f6f429f

Add subrule ratings to "reason" field for #303

a619cb4

brownag force-pushed the fix303 branch from c3219c9 to a619cb4 October 2, 2023 17:04

brownag added 3 commits October 2, 2023 10:11

fix SQL syntax issues with some aggregation methods

4f0531a

include subrule rating clases that start with "Not"

cc024c7

- includes many types of "not rated" - also includes a variety of other determinations that may be of general interest to interpretation users

add subrule name to reason

2b5a3f7

brownag marked this pull request as ready for review October 27, 2023 00:10

Add wide_reason argument for post-processing of "reason_*" column s…

63f9083

…ubrule rating values

get_SDA_interpretation: update simplified rule names to retain inequa…

780295f

…lity information - Updates `.cleanRuleColumnName()` - note that this will add several characters to calculated column names for several `mrulename`, most of which are state-specific

brownag merged commit 593f787 into master Oct 27, 2023
4 of 5 checks passed

brownag mentioned this pull request Oct 27, 2023

Spotty NCCPI data #303

Closed

brownag deleted the fix303 branch December 13, 2023 23:26
