Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spotty NCCPI data #303

Closed
kevinwolz opened this issue Sep 22, 2023 · 7 comments · Fixed by #308
Closed

Spotty NCCPI data #303

kevinwolz opened this issue Sep 22, 2023 · 7 comments · Fixed by #308

Comments

@kevinwolz
Copy link

I am struggling with the fact that the NCCPI data is very spotty in the database. For example:

Here I look up the most common mukey (Drummer) in Champaign County, IL, which is dominated by corn & soybeans:

soilDB::get_SDA_interpretation(rulename = RULES, 
                               method = "Dominant Component", 
                               mukeys = "242963") %>% 
  dplyr::rename(NCCPI      = rating_NCCPINationalCommodityCropProductivityIndexVer30,
                NCCPI.corn = rating_NCCPINCCPICornSubmodelI,
                NCCPI.soy  = rating_NCCPINCCPISoybeansSubmodelI,
                NCCPI.sg   = rating_NCCPINCCPISmallGrainsSubmodelII,
                NCCPI.cot  = rating_NCCPINCCPICottonSubmodelII) %>%
  dplyr::select(areasymbol, musym, mukey, cokey, compname, NCCPI, NCCPI.corn, NCCPI.soy, NCCPI.sg, NCCPI.cot)

image

The results provide the generic NCCPI rating, but all of the submodels come back as NA. I can understand why the cotton submodel might not be evaluated on this mukey, but the absence of the corn and soybean submodel doata surprised me.

If I generalize this call to instead search for the nationalmusym that includes that mukey, I get the same results but for many counties in Illinois that have this nationalmusym. However, I DO get one mukey in Indiana, and this one does have values for all submodels except cotton.

WHERE <- "nationalmusym IN ('2ssrz')"

soilDB::get_SDA_interpretation(rulename = RULES, 
                               method = "Dominant Component", 
                               WHERE = WHERE) %>% 
  dplyr::rename(NCCPI      = rating_NCCPINationalCommodityCropProductivityIndexVer30,
                NCCPI.corn = rating_NCCPINCCPICornSubmodelI,
                NCCPI.soy  = rating_NCCPINCCPISoybeansSubmodelI,
                NCCPI.sg   = rating_NCCPINCCPISmallGrainsSubmodelII,
                NCCPI.cot  = rating_NCCPINCCPICottonSubmodelII) %>%
  dplyr::select(areasymbol, musym, mukey, cokey, compname, NCCPI, NCCPI.corn, NCCPI.soy, NCCPI.sg, NCCPI.cot)

image

I've been told in the past that the presence/absence of these interpretation data is dependent on whether a particular state office "decided to run/export that interpretation". Is that true, or is there something else going on here?

Is there some status database somewhere that can show me which states have run/exported which interpretations?

I am trying to create a national map of NCCPI corn or soybean submodel values, is there any way that soilDB::get_SDA_interpretation can help me get around this?

Or perhaps my only choice would be build out these models from scratch and run them on raw SSURGO data myself? Maybe that's easier than trying to run the circuit and convince all states to export the data. Is there any soilDB support for something like this?

@dylanbeaudette
Copy link
Member

Hi Kevin,

Thanks for the detailed description of the situation. I believe that you are correct in assuming that the NCCPI submodel estimates are not "exported" for most states. I've asked our Leader for Soil Interpretations for additional information on the plans for NCCPI submodels on into the future. I could be that many of the regional interpretations will be adopted by additional states. I doubt that you will ever be able to create a national map of "NCCPI corn" since the models aren't generalized to the climates outside of the Midwest. I'll report back with my findings.

Otherwise, I think that you are attempting to access the data correctly.

@brownag
Copy link
Member

brownag commented Sep 23, 2023

I am struggling with the fact that the NCCPI data is very spotty in the database. For example:

Sorry to hear you are struggling, but I am glad to hear you are interested in our interpretations

The results provide the generic NCCPI rating, but all of the submodels come back as NA. ...
If I generalize this call to instead search for the nationalmusym that includes that mukey, I get the same results but for many counties in Illinois that have this nationalmusym. However, I DO get one mukey in Indiana, and this one does have values for all submodels except cotton.

I've been told in the past that the presence/absence of these interpretation data is dependent on whether a particular state office "decided to run/export that interpretation". Is that true, or is there something else going on here?

Yes, that is true. Only a handful of states (IN, OH, SC, CT, NJ, ND, RI) have the corn/soybean submodels exported at this time (September 2023). I am not sure if more states have opted to export the submodels for the upcoming October 1 refresh. NCCPI v3 (the "main model") is required national interpretation to be exported for all mapunits (see sample code below)

Is there some status database somewhere that can show me which states have run/exported which interpretations?

There is no status database other than what is in SSURGO. You can write query against cointerp table and see where the records exist for particular components within mapunits. While such a database does not currently exist, it would surely be a valuable SSURGO-derivative to create for all of our interpretations each year.

I am trying to create a national map of NCCPI corn or soybean submodel values, is there any way that soilDB::get_SDA_interpretation can help me get around this?

Or perhaps my only choice would be build out these models from scratch and run them on raw SSURGO data myself? Maybe that's easier than trying to run the circuit and convince all states to export the data. Is there any soilDB support for something like this?

get_SDA_interpretation() can only aggregate results that exist in the cointerp table (i.e. they have been exported by the states).

Running the interpretations on "SSURGO" would be difficult, or perhaps impossible, as the interpretation engine that creates the SSURGO export is using the NASIS data model which is slightly different. There have been some efforts to generalize the ability / data sources we use as input for interpretations, but so far nobody (to my knowledge) has built a working version of NCCPI outside of NASIS. It is theoretically possible but NCCPI is a very complex interpretation compared to some others we have in SSURGO.

It would be perhaps possible to create a custom NASIS export of the submodels for all mapunits in the US, but this would require the interpretations be run on the NASIS server, not the SSURGO product. Bob Dobos, National Leader for Interpretations, and main mind behind NCCPI would be the one to talk to about how to go about this.

Here is some sample code comparing NCCPI v3 v.s. submodels

library(soilDB)

RULES <- c("NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
           "NCCPI - NCCPI Corn Submodel (I)")

res <- soilDB::get_SDA_interpretation(rulename = RULES, method = "Dominant Component", 
                                      mukeys = c(242963, 164556))

# all nationalmusyms, excluding STATSGO
x0 <- SDA_query("SELECT COUNT(DISTINCT nationalmusym) FROM mapunit 
                INNER JOIN legend ON legend.lkey = mapunit.lkey AND legend.areasymbol != 'US'
                INNER JOIN component ON mapunit.mukey = component.mukey 
                AND majcompflag = 'Yes'")
#> single result set, returning a data.frame
x0
#>       V1
#> 1 239750

# all nationalmusyms, excluding STATSGO, have NCCPI v3.0
x <- SDA_query("SELECT COUNT(DISTINCT nationalmusym) FROM mapunit 
                INNER JOIN component ON mapunit.mukey = component.mukey AND majcompflag = 'Yes'
                INNER JOIN cointerp ON component.cokey = cointerp.cokey AND 
                           mrulename = 'NCCPI - National Commodity Crop Productivity Index (Ver 3.0)'")
#> single result set, returning a data.frame
x
#>       V1
#> 1 239750

# nationalmusyms with NCCPI Corn Submodel (for example)
y <- SDA_query("SELECT COUNT(DISTINCT nationalmusym) FROM mapunit 
                INNER JOIN component ON mapunit.mukey = component.mukey AND majcompflag = 'Yes'
                INNER JOIN cointerp ON component.cokey = cointerp.cokey AND 
                           mrulename = 'NCCPI - NCCPI Corn Submodel (I)'")
#> single result set, returning a data.frame
y
#>      V1
#> 1 20490

# 100% of nationalmusyms have NCCPI 
x$V1 / x0$V1
#> [1] 1

# and only 9% of nationalmusym have NCCPI corn submodel exported
y$V1 / x0$V1
#> [1] 0.08546403

# general function to see what states export a particular rule
get_states_using_cointerp <- function(rulename) {
  do.call('rbind', lapply(rulename, function(r){
    res <- SDA_query(sprintf("SELECT DISTINCT SUBSTRING(areasymbol, 0, 3) AS state FROM mapunit 
                  INNER JOIN legend ON legend.lkey = mapunit.lkey AND areasymbol != 'US'
                  INNER JOIN component ON mapunit.mukey = component.mukey AND majcompflag = 'Yes'
                  INNER JOIN cointerp ON component.cokey = cointerp.cokey AND mrulename = '%s'", r))
    res$mrulename <- r
    res
  }))
}

get_states_using_cointerp('NCCPI - NCCPI Corn Submodel (I)')
#> single result set, returning a data.frame
#>   state                       mrulename
#> 1    IN NCCPI - NCCPI Corn Submodel (I)
#> 2    OH NCCPI - NCCPI Corn Submodel (I)
#> 3    SC NCCPI - NCCPI Corn Submodel (I)
#> 4    CT NCCPI - NCCPI Corn Submodel (I)
#> 5    NJ NCCPI - NCCPI Corn Submodel (I)
#> 6    ND NCCPI - NCCPI Corn Submodel (I)
#> 7    RI NCCPI - NCCPI Corn Submodel (I)

get_states_using_cointerp('NCCPI - NCCPI Soybeans Submodel (I)')
#> single result set, returning a data.frame
#>   state                           mrulename
#> 1    IN NCCPI - NCCPI Soybeans Submodel (I)
#> 2    OH NCCPI - NCCPI Soybeans Submodel (I)
#> 3    SC NCCPI - NCCPI Soybeans Submodel (I)
#> 4    CT NCCPI - NCCPI Soybeans Submodel (I)
#> 5    NJ NCCPI - NCCPI Soybeans Submodel (I)
#> 6    ND NCCPI - NCCPI Soybeans Submodel (I)
#> 7    RI NCCPI - NCCPI Soybeans Submodel (I)

@kevinwolz
Copy link
Author

Thank you all for the thorough responses! I'm disappointed that this awesome sub-models are not widely available, but I now understand what's going on.

It would be perhaps possible to create a custom NASIS export of the submodels for all mapunits in the US, but this would require the interpretations be run on the NASIS server, not the SSURGO product. Bob Dobos, National Leader for Interpretations, and main mind behind NCCPI would be the one to talk to about how to go about this.

I will reach out to Bob to see how hard this would be.

@kevinwolz
Copy link
Author

Update: Bob says "Actually, the data for all the sub-models is already available of the soil data access facility. I will dust off the query and see if it still works."

Great news!

@brownag
Copy link
Member

brownag commented Oct 2, 2023

@kevinwolz

From my email:

I am thinking a bit more deeply on this now… and of course the NCCPI submodels data are part of the cointerp table! However they can’t be easily queried using the approach by mrulename that get_SDA_interpretation is currently using.
I mentioned that only a handful of states specifically export the submodels—now up to 9 after our FY24 refresh yesterday… however all the child ratings are exported along with the NCCPI for all nationalmusym
The get_SDA_interpretation() function returns ratings with a ruledepth of zero—this was built into the early queries that the function was based on.
The subrules and their ratings are available at greater depth. I will look into what is needed to generalize the function to specifically access the “deeper” rules, this would be very valuable addition to soilDB.

I looked into this. The latter point becomes a fairly difficult problem to fold in ruledepth > 0 into the various aggregation methods used for these functions.

I realized an alternate approach--which is something that I should have done a long time ago--and that is is to put the subrule ratings into the "reason" field (c3219c9). These values can then be re-extracted from the result that is 1:1 with mukey and mrulename.

This is still a bit messy but it does the job in terms of providing the component ratings alongside the main rule rating:

library(soilDB)

x <- get_SDA_interpretation(rulename  = "NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
                            method     = "Dominant Component",
                            mukeys     = c("242963","242964","242965"))

# some post-processing of the reason field to extract subrule ratings
x2 <- cbind(x, do.call('rbind', lapply(strsplit(
    x$reason_NCCPINationalCommodityCropProductivityIndexVer30,
    "; "), function(x) {
    x3 <- do.call('rbind', strsplit(gsub("([A-Za-z ]+) \\((.*)\\)", "\\1;\\2", x), ";"))
    x4 <- as.data.frame(as.list(x3[,2]))
    colnames(x4) <- paste0("rating_NCCPINationalCommodityCropProductivityIndexVer30_", soilDB:::.cleanRuleColumnName(x3[,1]))
    x4
  })))

x2
#>    mukey    cokey areasymbol musym
#> 1 242963 23671045      IL019  152A
#> 2 242964 23670915      IL019  134A
#> 3 242965 23671016      IL019  154A
#>                                           muname compname compkind comppct_r
#> 1 Drummer silty clay loam, 0 to 2 percent slopes  Drummer   Series        94
#> 2        Camden silt loam, 0 to 2 percent slopes   Camden   Series        92
#> 3      Flanagan silt loam, 0 to 2 percent slopes Flanagan   Series        95
#>   majcompflag rating_NCCPINationalCommodityCropProductivityIndexVer30
#> 1         Yes                                                   0.826
#> 2         Yes                                                   0.917
#> 3         Yes                                                   0.899
#>   class_NCCPINationalCommodityCropProductivityIndexVer30
#> 1                             High inherent productivity
#> 2                             High inherent productivity
#> 3                             High inherent productivity
#>                                   reason_NCCPINationalCommodityCropProductivityIndexVer30
#> 1 No limitation (0); Cotton (0.001); Small grains (0.687); Soybeans (0.752); Corn (0.826)
#> 2 Cotton (0.001); Soybeans (0.777); Small grains (0.791); Corn (0.917); No limitation (0)
#> 3  No limitation (0); Cotton (0.001); Small grains (0.734); Soybeans (0.76); Corn (0.899)
#>   rating_NCCPINationalCommodityCropProductivityIndexVer30_Nolimitation
#> 1                                                                    0
#> 2                                                                    0
#> 3                                                                    0
#>   rating_NCCPINationalCommodityCropProductivityIndexVer30_Cotton
#> 1                                                          0.001
#> 2                                                          0.001
#> 3                                                          0.001
#>   rating_NCCPINationalCommodityCropProductivityIndexVer30_Smallgrains
#> 1                                                               0.687
#> 2                                                               0.791
#> 3                                                               0.734
#>   rating_NCCPINationalCommodityCropProductivityIndexVer30_Soybeans
#> 1                                                            0.752
#> 2                                                            0.777
#> 3                                                             0.76
#>   rating_NCCPINationalCommodityCropProductivityIndexVer30_Corn
#> 1                                                        0.826
#> 2                                                        0.917
#> 3                                                        0.899

@kevinwolz
Copy link
Author

@brownag have you had any success with this?

@brownag
Copy link
Member

brownag commented Oct 27, 2023

@kevinwolz yes, and I refined the initial solution proposed above in #308 so it should be fairly painless to access any subrule ratings for interps.

Using the new argument wide_reason=TRUE a column with the rating value is added for each subrule within rule combination. For example:

library(soilDB)

x <- get_SDA_interpretation(rulename  = "NCCPI - National Commodity Crop Productivity Index (Ver 3.0)",
                            method     = "Dominant Component",
                            mukeys     = c("242963","242964","242965"),
                            wide_reason = TRUE)
x
#>    mukey    cokey areasymbol musym
#> 1 242963 23671045      IL019  152A
#> 2 242964 23670915      IL019  134A
#> 3 242965 23671016      IL019  154A
#>                                           muname compname compkind comppct_r
#> 1 Drummer silty clay loam, 0 to 2 percent slopes  Drummer   Series        94
#> 2        Camden silt loam, 0 to 2 percent slopes   Camden   Series        92
#> 3      Flanagan silt loam, 0 to 2 percent slopes Flanagan   Series        95
#>   majcompflag rating_NCCPINationalCommodityCropProductivityIndexVer30
#> 1         Yes                                                   0.826
#> 2         Yes                                                   0.917
#> 3         Yes                                                   0.899
#>   class_NCCPINationalCommodityCropProductivityIndexVer30
#> 1                             High inherent productivity
#> 2                             High inherent productivity
#> 3                             High inherent productivity
#>                                                                                                                                                                                                           reason_NCCPINationalCommodityCropProductivityIndexVer30
#> 1 Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.687); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.752); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.826)
#> 2 NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.777); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.791); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.917); Impacted soil "No limitation" (0)
#> 3  Impacted soil "No limitation" (0); NCCPI - NCCPI Cotton Submodel (II) "Cotton" (0.001); NCCPI - NCCPI Small Grains Submodel (II) "Small grains" (0.734); NCCPI - NCCPI Soybeans Submodel (I) "Soybeans" (0.76); NCCPI - NCCPI Corn Submodel (I) "Corn" (0.899)
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_Impactedsoil
#> 1                                                                           0
#> 2                                                                           0
#> 3                                                                           0
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPICottonSubmodelII
#> 1                                                                                     0.001
#> 2                                                                                     0.001
#> 3                                                                                     0.001
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPISmallGrainsSubmodelII
#> 1                                                                                          0.687
#> 2                                                                                          0.791
#> 3                                                                                          0.734
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPISoybeansSubmodelI
#> 1                                                                                      0.752
#> 2                                                                                      0.777
#> 3                                                                                       0.76
#>   rating_reason_NCCPINationalCommodityCropProductivityIndexVer30_NCCPINCCPICornSubmodelI
#> 1                                                                                  0.826
#> 2                                                                                  0.917
#> 3                                                                                  0.899

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants