Spotty NCCPI data #303

kevinwolz · 2023-09-22T17:45:34Z

I am struggling with the fact that the NCCPI data is very spotty in the database. For example:

Here I look up the most common mukey (Drummer) in Champaign County, IL, which is dominated by corn & soybeans:

soilDB::get_SDA_interpretation(rulename = RULES, 
                               method = "Dominant Component", 
                               mukeys = "242963") %>% 
  dplyr::rename(NCCPI      = rating_NCCPINationalCommodityCropProductivityIndexVer30,
                NCCPI.corn = rating_NCCPINCCPICornSubmodelI,
                NCCPI.soy  = rating_NCCPINCCPISoybeansSubmodelI,
                NCCPI.sg   = rating_NCCPINCCPISmallGrainsSubmodelII,
                NCCPI.cot  = rating_NCCPINCCPICottonSubmodelII) %>%
  dplyr::select(areasymbol, musym, mukey, cokey, compname, NCCPI, NCCPI.corn, NCCPI.soy, NCCPI.sg, NCCPI.cot)

The results provide the generic NCCPI rating, but all of the submodels come back as NA. I can understand why the cotton submodel might not be evaluated on this mukey, but the absence of the corn and soybean submodel doata surprised me.

If I generalize this call to instead search for the nationalmusym that includes that mukey, I get the same results but for many counties in Illinois that have this nationalmusym. However, I DO get one mukey in Indiana, and this one does have values for all submodels except cotton.

WHERE <- "nationalmusym IN ('2ssrz')"

soilDB::get_SDA_interpretation(rulename = RULES, 
                               method = "Dominant Component", 
                               WHERE = WHERE) %>% 
  dplyr::rename(NCCPI      = rating_NCCPINationalCommodityCropProductivityIndexVer30,
                NCCPI.corn = rating_NCCPINCCPICornSubmodelI,
                NCCPI.soy  = rating_NCCPINCCPISoybeansSubmodelI,
                NCCPI.sg   = rating_NCCPINCCPISmallGrainsSubmodelII,
                NCCPI.cot  = rating_NCCPINCCPICottonSubmodelII) %>%
  dplyr::select(areasymbol, musym, mukey, cokey, compname, NCCPI, NCCPI.corn, NCCPI.soy, NCCPI.sg, NCCPI.cot)

I've been told in the past that the presence/absence of these interpretation data is dependent on whether a particular state office "decided to run/export that interpretation". Is that true, or is there something else going on here?

Is there some status database somewhere that can show me which states have run/exported which interpretations?

I am trying to create a national map of NCCPI corn or soybean submodel values, is there any way that soilDB::get_SDA_interpretation can help me get around this?

Or perhaps my only choice would be build out these models from scratch and run them on raw SSURGO data myself? Maybe that's easier than trying to run the circuit and convince all states to export the data. Is there any soilDB support for something like this?

The text was updated successfully, but these errors were encountered:

dylanbeaudette · 2023-09-23T20:16:19Z

Hi Kevin,

Thanks for the detailed description of the situation. I believe that you are correct in assuming that the NCCPI submodel estimates are not "exported" for most states. I've asked our Leader for Soil Interpretations for additional information on the plans for NCCPI submodels on into the future. I could be that many of the regional interpretations will be adopted by additional states. I doubt that you will ever be able to create a national map of "NCCPI corn" since the models aren't generalized to the climates outside of the Midwest. I'll report back with my findings.

Otherwise, I think that you are attempting to access the data correctly.

brownag · 2023-09-23T20:44:24Z

I am struggling with the fact that the NCCPI data is very spotty in the database. For example:

Sorry to hear you are struggling, but I am glad to hear you are interested in our interpretations

The results provide the generic NCCPI rating, but all of the submodels come back as NA. ...
If I generalize this call to instead search for the nationalmusym that includes that mukey, I get the same results but for many counties in Illinois that have this nationalmusym. However, I DO get one mukey in Indiana, and this one does have values for all submodels except cotton.

I've been told in the past that the presence/absence of these interpretation data is dependent on whether a particular state office "decided to run/export that interpretation". Is that true, or is there something else going on here?

Yes, that is true. Only a handful of states (IN, OH, SC, CT, NJ, ND, RI) have the corn/soybean submodels exported at this time (September 2023). I am not sure if more states have opted to export the submodels for the upcoming October 1 refresh. NCCPI v3 (the "main model") is required national interpretation to be exported for all mapunits (see sample code below)

Is there some status database somewhere that can show me which states have run/exported which interpretations?

There is no status database other than what is in SSURGO. You can write query against cointerp table and see where the records exist for particular components within mapunits. While such a database does not currently exist, it would surely be a valuable SSURGO-derivative to create for all of our interpretations each year.

I am trying to create a national map of NCCPI corn or soybean submodel values, is there any way that soilDB::get_SDA_interpretation can help me get around this?

Or perhaps my only choice would be build out these models from scratch and run them on raw SSURGO data myself? Maybe that's easier than trying to run the circuit and convince all states to export the data. Is there any soilDB support for something like this?

get_SDA_interpretation() can only aggregate results that exist in the cointerp table (i.e. they have been exported by the states).

Running the interpretations on "SSURGO" would be difficult, or perhaps impossible, as the interpretation engine that creates the SSURGO export is using the NASIS data model which is slightly different. There have been some efforts to generalize the ability / data sources we use as input for interpretations, but so far nobody (to my knowledge) has built a working version of NCCPI outside of NASIS. It is theoretically possible but NCCPI is a very complex interpretation compared to some others we have in SSURGO.

It would be perhaps possible to create a custom NASIS export of the submodels for all mapunits in the US, but this would require the interpretations be run on the NASIS server, not the SSURGO product. Bob Dobos, National Leader for Interpretations, and main mind behind NCCPI would be the one to talk to about how to go about this.

Here is some sample code comparing NCCPI v3 v.s. submodels

library(soilDB)

RULES <- c("NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
           "NCCPI - NCCPI Corn Submodel (I)")

res <- soilDB::get_SDA_interpretation(rulename = RULES, method = "Dominant Component", 
                                      mukeys = c(242963, 164556))

# all nationalmusyms, excluding STATSGO
x0 <- SDA_query("SELECT COUNT(DISTINCT nationalmusym) FROM mapunit 
                INNER JOIN legend ON legend.lkey = mapunit.lkey AND legend.areasymbol != 'US'
                INNER JOIN component ON mapunit.mukey = component.mukey 
                AND majcompflag = 'Yes'")
#> single result set, returning a data.frame
x0
#>       V1
#> 1 239750

# all nationalmusyms, excluding STATSGO, have NCCPI v3.0
x <- SDA_query("SELECT COUNT(DISTINCT nationalmusym) FROM mapunit 
                INNER JOIN component ON mapunit.mukey = component.mukey AND majcompflag = 'Yes'
                INNER JOIN cointerp ON component.cokey = cointerp.cokey AND 
                           mrulename = 'NCCPI - National Commodity Crop Productivity Index (Ver 3.0)'")
#> single result set, returning a data.frame
x
#>       V1
#> 1 239750

# nationalmusyms with NCCPI Corn Submodel (for example)
y <- SDA_query("SELECT COUNT(DISTINCT nationalmusym) FROM mapunit 
                INNER JOIN component ON mapunit.mukey = component.mukey AND majcompflag = 'Yes'
                INNER JOIN cointerp ON component.cokey = cointerp.cokey AND 
                           mrulename = 'NCCPI - NCCPI Corn Submodel (I)'")
#> single result set, returning a data.frame
y
#>      V1
#> 1 20490

# 100% of nationalmusyms have NCCPI 
x$V1 / x0$V1
#> [1] 1

# and only 9% of nationalmusym have NCCPI corn submodel exported
y$V1 / x0$V1
#> [1] 0.08546403

# general function to see what states export a particular rule
get_states_using_cointerp <- function(rulename) {
  do.call('rbind', lapply(rulename, function(r){
    res <- SDA_query(sprintf("SELECT DISTINCT SUBSTRING(areasymbol, 0, 3) AS state FROM mapunit 
                  INNER JOIN legend ON legend.lkey = mapunit.lkey AND areasymbol != 'US'
                  INNER JOIN component ON mapunit.mukey = component.mukey AND majcompflag = 'Yes'
                  INNER JOIN cointerp ON component.cokey = cointerp.cokey AND mrulename = '%s'", r))
    res$mrulename <- r
    res
  }))
}

get_states_using_cointerp('NCCPI - NCCPI Corn Submodel (I)')
#> single result set, returning a data.frame
#>   state                       mrulename
#> 1    IN NCCPI - NCCPI Corn Submodel (I)
#> 2    OH NCCPI - NCCPI Corn Submodel (I)
#> 3    SC NCCPI - NCCPI Corn Submodel (I)
#> 4    CT NCCPI - NCCPI Corn Submodel (I)
#> 5    NJ NCCPI - NCCPI Corn Submodel (I)
#> 6    ND NCCPI - NCCPI Corn Submodel (I)
#> 7    RI NCCPI - NCCPI Corn Submodel (I)

get_states_using_cointerp('NCCPI - NCCPI Soybeans Submodel (I)')
#> single result set, returning a data.frame
#>   state                           mrulename
#> 1    IN NCCPI - NCCPI Soybeans Submodel (I)
#> 2    OH NCCPI - NCCPI Soybeans Submodel (I)
#> 3    SC NCCPI - NCCPI Soybeans Submodel (I)
#> 4    CT NCCPI - NCCPI Soybeans Submodel (I)
#> 5    NJ NCCPI - NCCPI Soybeans Submodel (I)
#> 6    ND NCCPI - NCCPI Soybeans Submodel (I)
#> 7    RI NCCPI - NCCPI Soybeans Submodel (I)

kevinwolz · 2023-09-30T07:38:58Z

Thank you all for the thorough responses! I'm disappointed that this awesome sub-models are not widely available, but I now understand what's going on.

It would be perhaps possible to create a custom NASIS export of the submodels for all mapunits in the US, but this would require the interpretations be run on the NASIS server, not the SSURGO product. Bob Dobos, National Leader for Interpretations, and main mind behind NCCPI would be the one to talk to about how to go about this.

I will reach out to Bob to see how hard this would be.

kevinwolz · 2023-10-02T14:16:30Z

Update: Bob says "Actually, the data for all the sub-models is already available of the soil data access facility. I will dust off the query and see if it still works."

Great news!

brownag · 2023-10-02T16:39:06Z

@kevinwolz

From my email:

I am thinking a bit more deeply on this now… and of course the NCCPI submodels data are part of the cointerp table! However they can’t be easily queried using the approach by mrulename that get_SDA_interpretation is currently using.
I mentioned that only a handful of states specifically export the submodels—now up to 9 after our FY24 refresh yesterday… however all the child ratings are exported along with the NCCPI for all nationalmusym
The get_SDA_interpretation() function returns ratings with a ruledepth of zero—this was built into the early queries that the function was based on.
The subrules and their ratings are available at greater depth. I will look into what is needed to generalize the function to specifically access the “deeper” rules, this would be very valuable addition to soilDB.

I looked into this. The latter point becomes a fairly difficult problem to fold in ruledepth > 0 into the various aggregation methods used for these functions.

I realized an alternate approach--which is something that I should have done a long time ago--and that is is to put the subrule ratings into the "reason" field (c3219c9). These values can then be re-extracted from the result that is 1:1 with mukey and mrulename.

This is still a bit messy but it does the job in terms of providing the component ratings alongside the main rule rating:

library(soilDB)

x <- get_SDA_interpretation(rulename  = "NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
                            method     = "Dominant Component",
                            mukeys     = c("242963","242964","242965"))

# some post-processing of the reason field to extract subrule ratings
x2 <- cbind(x, do.call('rbind', lapply(strsplit(
    x$reason_NCCPINationalCommodityCropProductivityIndexVer30,
    "; "), function(x) {
    x3 <- do.call('rbind', strsplit(gsub("([A-Za-z ]+) \\((.*)\\)", "\\1;\\2", x), ";"))
    x4 <- as.data.frame(as.list(x3[,2]))
    colnames(x4) <- paste0("rating_NCCPINationalCommodityCropProductivityIndexVer30_", soilDB:::.cleanRuleColumnName(x3[,1]))
    x4
  })))

x2
#>    mukey    cokey areasymbol musym
#> 1 242963 23671045      IL019  152A
#> 2 242964 23670915      IL019  134A
#> 3 242965 23671016      IL019  154A
#>                                           muname compname compkind comppct_r
#> 1 Drummer silty clay loam, 0 to 2 percent slopes  Drummer   Series        94
#> 2        Camden silt loam, 0 to 2 percent slopes   Camden   Series        92
#> 3      Flanagan silt loam, 0 to 2 percent slopes Flanagan   Series        95
#>   majcompflag rating_NCCPINationalCommodityCropProductivityIndexVer30
#> 1         Yes                                                   0.826
#> 2         Yes                                                   0.917
#> 3         Yes                                                   0.899
#>   class_NCCPINationalCommodityCropProductivityIndexVer30
#> 1                             High inherent productivity
#> 2                             High inherent productivity
#> 3                             High inherent productivity
#>                                   reason_NCCPINationalCommodityCropProductivityIndexVer30
#> 1 No limitation (0); Cotton (0.001); Small grains (0.687); Soybeans (0.752); Corn (0.826)
#> 2 Cotton (0.001); Soybeans (0.777); Small grains (0.791); Corn (0.917); No limitation (0)
#> 3  No limitation (0); Cotton (0.001); Small grains (0.734); Soybeans (0.76); Corn (0.899)
#>   rating_NCCPINationalCommodityCropProductivityIndexVer30_Nolimitation
#> 1                                                                    0
#> 2                                                                    0
#> 3                                                                    0
#>   rating_NCCPINationalCommodityCropProductivityIndexVer30_Cotton
#> 1                                                          0.001
#> 2                                                          0.001
#> 3                                                          0.001
#>   rating_NCCPINationalCommodityCropProductivityIndexVer30_Smallgrains
#> 1                                                               0.687
#> 2                                                               0.791
#> 3                                                               0.734
#>   rating_NCCPINationalCommodityCropProductivityIndexVer30_Soybeans
#> 1                                                            0.752
#> 2                                                            0.777
#> 3                                                             0.76
#>   rating_NCCPINationalCommodityCropProductivityIndexVer30_Corn
#> 1                                                        0.826
#> 2                                                        0.917
#> 3                                                        0.899

kevinwolz · 2023-10-15T01:30:47Z

@brownag have you had any success with this?

brownag · 2023-10-27T17:14:18Z

@kevinwolz yes, and I refined the initial solution proposed above in #308 so it should be fairly painless to access any subrule ratings for interps.

Using the new argument wide_reason=TRUE a column with the rating value is added for each subrule within rule combination. For example:

library(soilDB)

x <- get_SDA_interpretation(rulename  = "NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
                            method     = "Dominant Component",
                            mukeys     = c("242963","242964","242965"),
                            wide_reason = TRUE)
x
#>    mukey    cokey areasymbol musym
#> 1 242963 23671045      IL019  152A
#> 2 242964 23670915      IL019  134A
#> 3 242965 23671016      IL019  154A
#>                                           muname compname compkind comppct_r
#> 1 Drummer silty clay loam, 0 to 2 percent slopes  Drummer   Series        94
#> 2        Camden silt loam, 0 to 2 percent slopes   Camden   Series        92
#> 3      Flanagan silt loam, 0 to 2 percent slopes Flanagan   Series        95
#>   majcompflag rating_NCCPINationalCommodityCropProductivityIndexVer30
#> 1         Yes                                                   0.826
#> 2         Yes                                                   0.917
#> 3         Yes                                                   0.899
#>   class_NCCPINationalCommodityCropProductivityIndexVer30
#> 1                             High inherent productivity
#> 2                             High inherent productivity
#> 3                             High inherent productivity
#>                                                                                                                                                                                                           reason_NCCPINationalCommodityCropProductivityIndexVer30
#> 1 Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.687); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.752); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.826)
#> 2 NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.777); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.791); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.917); Impacted soil "No limitation" (0)
#> 3  Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.734); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.76); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.899)
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_Impactedsoil
#> 1                                                                           0
#> 2                                                                           0
#> 3                                                                           0
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPICottonSubmodelII
#> 1                                                                                     0.001
#> 2                                                                                     0.001
#> 3                                                                                     0.001
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPISmallGrainsSubmodelII
#> 1                                                                                          0.687
#> 2                                                                                          0.791
#> 3                                                                                          0.734
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPISoybeansSubmodelI
#> 1                                                                                      0.752
#> 2                                                                                      0.777
#> 3                                                                                       0.76
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPICornSubmodelI
#> 1                                                                                  0.826
#> 2                                                                                  0.917
#> 3                                                                                  0.899

brownag added a commit that referenced this issue Oct 2, 2023

Add subrule ratings to "reason" field for #303

c3219c9

brownag mentioned this issue Oct 2, 2023

get_SDA_interpretation(): add subrule ratings to "reason" field #308

Merged

4 tasks

brownag added a commit that referenced this issue Oct 2, 2023

Add subrule ratings to "reason" field for #303

a619cb4

brownag closed this as completed in #308 Oct 27, 2023

kevinwolz mentioned this issue Nov 17, 2023

get_SDA_interpretation sometimes throws warning for NCCPI #315

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spotty NCCPI data #303

Spotty NCCPI data #303

kevinwolz commented Sep 22, 2023

dylanbeaudette commented Sep 23, 2023

brownag commented Sep 23, 2023

kevinwolz commented Sep 30, 2023

kevinwolz commented Oct 2, 2023

brownag commented Oct 2, 2023 •

edited

Loading

kevinwolz commented Oct 15, 2023

brownag commented Oct 27, 2023

Spotty NCCPI data #303

Spotty NCCPI data #303

Comments

kevinwolz commented Sep 22, 2023

dylanbeaudette commented Sep 23, 2023

brownag commented Sep 23, 2023

kevinwolz commented Sep 30, 2023

kevinwolz commented Oct 2, 2023

brownag commented Oct 2, 2023 • edited Loading

kevinwolz commented Oct 15, 2023

brownag commented Oct 27, 2023

brownag commented Oct 2, 2023 •

edited

Loading