get_SDA_interpretation(): add subrule ratings to "reason" field #308

Merged: brownag merged 7 commits into master from fix303 on Oct 27, 2023

brownag commented Oct 2, 2023

Adds subrule ratings to "reason" fields calculated for each mrulename in a get_SDA_interpretation() query.

This helps get key information about subrules that are exported in cointerp with ruledepth > 0

TODO:

  • consider if there is a truly generic way to flatten these results 1:1 with mukey/component/mrulename... or if that has to be left to the user/unique to the interpretation being queried
    • could pack an optional XML column containing arbitrary complexity about subrules
  • .interpretation_weighted_average() needs a SQLite-compatible STRING_AGG() switch (wontfix; the rest of the query is not SQLite compatible either)
  • order subrule "reasons" alphabetically? or at least consistently (wontfix; can't use ORDER BY in the T-SQL subquery)
  • some subrule reasons are hard to interpret without subrule name

Will close #303

Note the "reason" field now includes the interphr as well as interphrc values for rules with ruledepth != 0

library(soilDB)

x <- get_SDA_interpretation(rulename  = "NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
                            method     = "Dominant Component",
                            mukeys     = c("242963","242964","242965"))
x
#>    mukey    cokey areasymbol musym
#> 1 242963 23671045      IL019  152A
#> 2 242964 23670915      IL019  134A
#> 3 242965 23671016      IL019  154A
#>                                           muname compname compkind comppct_r
#> 1 Drummer silty clay loam, 0 to 2 percent slopes  Drummer   Series        94
#> 2        Camden silt loam, 0 to 2 percent slopes   Camden   Series        92
#> 3      Flanagan silt loam, 0 to 2 percent slopes Flanagan   Series        95
#>   majcompflag rating_NCCPINationalCommodityCropProductivityIndexVer30
#> 1         Yes                                                   0.826
#> 2         Yes                                                   0.917
#> 3         Yes                                                   0.899
#>   class_NCCPINationalCommodityCropProductivityIndexVer30
#> 1                             High inherent productivity
#> 2                             High inherent productivity
#> 3                             High inherent productivity
#>                                                                                                                                                                                                           reason_NCCPINationalCommodityCropProductivityIndexVer30
#> 1 Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.687); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.752); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.826)
#> 2 NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.777); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.791); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.917); Impacted soil "No limitation" (0)
#> 3  Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.734); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.76); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.899)

 - includes many types of "not rated"
 - also includes a variety of other determinations that may be of general interest to interpretation users

brownag commented Oct 27, 2023

Added subrule names, along with reason class and rating. The format is `{SUBRULE} "{REASON}" ({RATING}); {SUBRULE} "{REASON}" ({RATING}); ...`. The format is consistent, but a pain to parse if you really need those values. Also, the order of the subrules within the string can vary from row to row.
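
For anyone who does need the individual values, here is a minimal base R sketch of parsing one "reason" string into a long table. `parse_reason()` is a hypothetical helper, not part of soilDB, and the regular expression assumes each entry ends in a quoted reason followed by a parenthesized numeric rating. It matches entries rather than splitting on "; " because some subrule names contain a semicolon themselves.

```r
# Hypothetical helper (not part of soilDB): parse one "reason" string
# into one row per subrule. Assumes the layout
#   {SUBRULE} "{REASON}" ({RATING}); {SUBRULE} "{REASON}" ({RATING})
parse_reason <- function(reason) {
  empty <- data.frame(subrule = character(0), reason = character(0),
                      rating = numeric(0), stringsAsFactors = FALSE)
  if (is.na(reason)) return(empty)

  # lazy subrule name, quoted reason, parenthesized rating, then "; " or end
  pattern <- '(.+?) "([^"]*)" \\(([^()]*)\\)(?:; |$)'

  # first pull out each complete entry, then its capture groups
  items <- regmatches(reason, gregexpr(pattern, reason, perl = TRUE))[[1]]
  if (length(items) == 0) return(empty)
  parts <- regmatches(items, regexec(pattern, items, perl = TRUE))

  data.frame(
    subrule = vapply(parts, `[`, character(1), 2),
    reason  = vapply(parts, `[`, character(1), 3),
    rating  = as.numeric(vapply(parts, `[`, character(1), 4)),
    stringsAsFactors = FALSE
  )
}

# Example string in the same format as the ENG - Local Roads and Streets output
r <- 'Potential Frost Action > Low "Frost action" (1); Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (0.955); Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.375)'
parsed <- parse_reason(r)
print(parsed)
```

Subrule names that themselves contain a quoted phrase followed by parentheses could still confuse this pattern, so treat it as a starting point rather than a robust parser.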

Could potentially add an optional argument to get_SDA_interpretation() that post-processes all of the reason fields and "widens" the data.frame result accordingly, adding one column per subrule rating. Similar to example in #303
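
The "widening" idea could look roughly like the base R sketch below. It starts from subrule ratings that are already in long form (one row per map unit and subrule) and is not the soilDB implementation. The `long` data, the column naming scheme, and the use of `tapply()` are all assumptions for illustration.

```r
# Assumed input: subrule ratings already parsed into long form,
# one row per (mukey, subrule). Values are illustrative.
long <- data.frame(
  mukey   = c("242963", "242963", "242964", "242965"),
  subrule = c("Ponded > 4 hours", "Potential Frost Action > Low",
              "Potential Frost Action > Low", "Potential Frost Action > Low"),
  rating  = c(1, 1, 1, 0.5),
  stringsAsFactors = FALSE
)

# Build a column name per subrule by dropping non-alphanumeric characters
# (hypothetical naming scheme for this sketch)
long$column <- paste0("rating_reason_", gsub("[^[:alnum:]]", "", long$subrule))

# Cross-tabulate: rows are map units, columns are subrules;
# combinations that do not occur are filled with NA
m <- tapply(long$rating, list(long$mukey, long$column), function(v) v[1])

wide <- data.frame(mukey = rownames(m), m,
                   row.names = NULL, check.names = FALSE,
                   stringsAsFactors = FALSE)
print(wide)
```

Map units that lack a given subrule get `NA` in that column, which keeps the result 1:1 with `mukey`.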

@brownag brownag marked this pull request as ready for review October 27, 2023 00:10

brownag commented Oct 27, 2023

Added an argument to get_SDA_interpretation() called wide_reason, default FALSE. If TRUE, the function post-processes the result: it parses the string contents of the "reason_*" fields and adds a new column for each subrule rating within each main rule.

So, now you can quickly obtain ready-to-use subrule ratings for arbitrary interps, which should adequately cover needs from #303

library(soilDB)
x <- get_SDA_interpretation(rulename  = c("NCCPI - National Commodity Crop Productivity Index (Ver 3.0)", 
                                          "AGR - Pesticide Loss Potential-Leaching", 
                                          "ENG - Local Roads and Streets"),
                            method     = "Dominant Component", not_rated_value = "Not rated",
                            mukeys     = c("242963","242964","242965"), wide_reason = TRUE)
x
#>    mukey    cokey areasymbol musym
#> 1 242963 23671045      IL019  152A
#> 2 242964 23670915      IL019  134A
#> 3 242965 23671016      IL019  154A
#>                                           muname compname compkind comppct_r
#> 1 Drummer silty clay loam, 0 to 2 percent slopes  Drummer   Series        94
#> 2        Camden silt loam, 0 to 2 percent slopes   Camden   Series        92
#> 3      Flanagan silt loam, 0 to 2 percent slopes Flanagan   Series        95
#>   majcompflag rating_NCCPINationalCommodityCropProductivityIndexVer30
#> 1         Yes                                                   0.826
#> 2         Yes                                                   0.917
#> 3         Yes                                                   0.899
#>   class_NCCPINationalCommodityCropProductivityIndexVer30
#> 1                             High inherent productivity
#> 2                             High inherent productivity
#> 3                             High inherent productivity
#>                                                                                                                                                                                                           reason_NCCPINationalCommodityCropProductivityIndexVer30
#> 1 Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.687); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.752); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.826)
#> 2 NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.777); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.791); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.917); Impacted soil "No limitation" (0)
#> 3  Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.734); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.76); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.899)
#>   rating_AGRPesticideLossPotentialLeaching
#> 1                                Not rated
#> 2                                Not rated
#> 3                                Not rated
#>   class_AGRPesticideLossPotentialLeaching
#> 1                                    <NA>
#> 2                                    <NA>
#> 3                                    <NA>
#>   reason_AGRPesticideLossPotentialLeaching rating_ENGLocalRoadsandStreets
#> 1                                     <NA>                              1
#> 2                                     <NA>                              1
#> 3                                     <NA>                              1
#>   class_ENGLocalRoadsandStreets
#> 1                  Very limited
#> 2                  Very limited
#> 3                  Very limited
#>                                                                                                                                                                                                                                                                                      reason_ENGLocalRoadsandStreets
#> 1 Ponded > 4 hours "Ponding" (1); Wet, Ground Water Near the Surface (30 - 75cm) "Depth to saturated zone" (1); Potential Frost Action > Low "Frost action" (1); Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (1); Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.37)
#> 2                                                                                                          Potential Frost Action > Low "Frost action" (1); Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (0.955); Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.375)
#> 3                          Strength (AASHTO Group Index Weighted Average (25-100cm)) "Low strength" (1); Shrink-Swell (LEP WTD_AVG 25-100cm or Bedrock) "Shrink-swell" (0.894); Wet, Ground Water Near the Surface (30 - 75cm) "Depth to saturated zone" (0.746); Potential Frost Action > Low "Frost action" (0.5)
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_Impactedsoil
#> 1                                                                           0
#> 2                                                                           0
#> 3                                                                           0
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPICottonSubmodelII
#> 1                                                                                     0.001
#> 2                                                                                     0.001
#> 3                                                                                     0.001
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPISmallGrainsSubmodelII
#> 1                                                                                          0.687
#> 2                                                                                          0.791
#> 3                                                                                          0.734
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPISoybeansSubmodelI
#> 1                                                                                      0.752
#> 2                                                                                      0.777
#> 3                                                                                       0.76
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPICornSubmodelI
#> 1                                                                                  0.826
#> 2                                                                                  0.917
#> 3                                                                                  0.899
#>   rating_reason_AGRPesticideLossPotentialLeaching_Notrated
#> 1                                                       NA
#> 2                                                       NA
#> 3                                                       NA
#>   rating_reason_ENGLocalRoadsandStreets_Ponded4hours
#> 1                                                  1
#> 2                                               <NA>
#> 3                                               <NA>
#>   rating_reason_ENGLocalRoadsandStreets_WetGroundWaterNeartheSurface3075cm
#> 1                                                                        1
#> 2                                                                     <NA>
#> 3                                                                    0.746
#>   rating_reason_ENGLocalRoadsandStreets_PotentialFrostActionLow
#> 1                                                             1
#> 2                                                             1
#> 3                                                           0.5
#>   rating_reason_ENGLocalRoadsandStreets_StrengthAASHTOGroupIndexWeightedAverage25100cm
#> 1                                                                                    1
#> 2                                                                                0.955
#> 3                                                                                    1
#>   rating_reason_ENGLocalRoadsandStreets_ShrinkSwellLEPWTDAVG25100cmorBedrock
#> 1                                                                       0.37
#> 2                                                                      0.375
#> 3                                                                      0.894


brownag commented Oct 27, 2023

A final consideration: soilDB:::.cleanRuleColumnName() strips non-alphanumeric characters to make a "legal" R column name. It is possible this could lead to some collisions w/ certain subrule names...

For instance, inequalities are lost. "Ponded > 4 hours" and "Ponded < 4 hours" simplify to the same name "Ponded4hours". Could add a few things like replacing ">" "<" "=" with "GT" "LT" "EQ"...
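
A minimal sketch of that idea, assuming a stand-alone cleaner: `clean_rule_name()` below is a hypothetical stand-in, not the actual `soilDB:::.cleanRuleColumnName()`. It substitutes the comparison symbols with letter codes before stripping everything that is not alphanumeric.

```r
# Hypothetical stand-in for a column-name cleaner (not soilDB's internal
# function): encode comparison symbols before stripping other characters
clean_rule_name <- function(x) {
  x <- gsub(">", "GT", x, fixed = TRUE)
  x <- gsub("<", "LT", x, fixed = TRUE)
  x <- gsub("=", "EQ", x, fixed = TRUE)
  # drop everything that is not a letter or digit
  gsub("[^[:alnum:]]", "", x)
}

cleaned <- clean_rule_name(c("Ponded > 4 hours", "Ponded < 4 hours"))
print(cleaned)
```

Because each symbol is replaced one character at a time, a compound operator such as "=<" becomes "EQLT". Names that differ only in other punctuation (for example "," versus "/") would still collide under this scheme.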

It appears that collisions will be rare, and only for Texas subrules in FY24 SSURGO, but not impossible:

library(soilDB)
x <- SDA_query("SELECT DISTINCT rulename FROM cointerp")[[1]]
#> single result set, returning a data.frame
y <- soilDB:::.cleanRuleColumnName(x)

length(x)
#> [1] 3237
length(unique(y))
#> [1] 3231

xx <- c(x[duplicated(y)], x[duplicated(y, fromLast = TRUE)])
sort(xx)
#>  [1] "AGR - Rutting Hazard =< 10,000 Pounds per Wheel (TX)"        
#>  [2] "AGR - Rutting Hazard > 10,000 Pounds per Wheel (TX)"         
#>  [3] "CaCO3 < 40% by Wght. Av. 0-40\" (TX)"                        
#>  [4] "CaCO3 > 40% by Wght. Av. 0-40\" (TX)"                        
#>  [5] "Excess Humus (FB, Peat, HM/MPT Surface Layer) (TX)"          
#>  [6] "Excess Humus (FB/Peat/HM/MPT Surface Layer) (TX)"            
#>  [7] "Flooding Occasional or greater; Duration Long,Very Long (TX)"
#>  [8] "Flooding Occasional or greater; Duration Long/Very Long (TX)"
#>  [9] "Ponding => Frequent (TX)"                                    
#> [10] "Ponding Frequent (TX)"                                       
#> [11] "Soil Strength (Rutting Vehicle =< 10,000 Pounds) (TX)"       
#> [12] "Soil Strength (Rutting Vehicle > 10,000 Pounds) (TX)"
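
To see which original names map to the same cleaned name, the names can be grouped by their cleaned form. This is an offline sketch using a few of the rule names listed above; the `gsub()` call is an assumed approximation of the cleaning step, not the internal soilDB function.

```r
# A few rule names from the list above, plus one that does not collide
rulenames <- c(
  "Ponding => Frequent (TX)",
  "Ponding Frequent (TX)",
  "Excess Humus (FB, Peat, HM/MPT Surface Layer) (TX)",
  "Excess Humus (FB/Peat/HM/MPT Surface Layer) (TX)",
  "ENG - Local Roads and Streets"
)

# Assumed approximation of the cleaning step: keep letters and digits only
cleaned <- gsub("[^[:alnum:]]", "", rulenames)

# Group the original names by cleaned name; keep groups with more than one member
groups <- split(rulenames, cleaned)
collisions <- groups[lengths(groups) > 1]
print(collisions)
```

Each element of `collisions` is named by the shared cleaned name and holds the original names that produced it.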

…lity information

 - Updates `.cleanRuleColumnName()`
 - note that this will add several characters to calculated column names for several `mrulename`, most of which are state-specific

brownag commented Oct 27, 2023

There was only one existing collision in mrulename (as opposed to the few listed above for rulename). However, the modification to add inequalities back in will add a few characters to several existing mrulename which could be a small breaking change.

This is the list of affected interpretations that folks will need to update column name references for:

  • AGR - Rutting Hazard > 10,000 Pounds per Wheel (TX)
  • GRL - NV range seeding (Wind C >= 160) (NV)
  • GRL - Fencing, Post Depth =<24 inches
  • WLF - Food Plots for Upland Wildlife < 2 Acres (TX)
  • AGR - Rutting Hazard =< 10,000 Pounds per Wheel (TX)
  • GRL - Fencing, Post Depth =<36 inches
  • GRL - NV range seeding (Wind C = 10) (NV)
  • GRL - NV range seeding (Wind C = 30) (NV)
  • GRL - NV range seeding (Wind C = 20) (NV)
  • GRL - NV range seeding (Wind C = 100) (NV)
  • GRL - NV range seeding (Wind C = 40) (NV)
  • GRL - NV range seeding (Wind C = 60) (NV)
  • GRL - NV range seeding (Wind C = 80) (NV)
  • GRL - NV range seeding (Wind C = 50) (NV)

@brownag brownag merged commit 593f787 into master Oct 27, 2023
4 of 5 checks passed
@brownag brownag mentioned this pull request Oct 27, 2023
@brownag brownag deleted the fix303 branch December 13, 2023 23:26
Successfully merging this pull request may close these issues.

Spotty NCCPI data