Add ssurgoOnDemand SQL queries #179

Merged
merged 3 commits into from
Apr 5, 2021

brownag
Member

@brownag brownag commented Apr 5, 2021

Based on https://github.com/ncss-tech/ssurgoOnDemand by @jneme910 @cferguso, and inspired by @dylanbeaudette in #178

I provisionally defined the following methods, which take areasymbol and mukey options as input (mirroring the areasymbol and "express" styles of ssurgoOnDemand).

  • get_SDA_property
  • get_SDA_interpretation
  • get_SDA_muaggatt
  • get_SDA_hydric
  • get_SDA_pmgroupname

They appear to work. I want to verify them against the Python results as a final test.
I don't plan to reinvent the wheel on these, but I think they are excellent and highly useful in their current form, and I wanted to have them readily accessible for comparisons, etc.

@brownag
Member Author

brownag commented Apr 5, 2021

Update on some quick comparisons against the ArcMap Toolbox output for areasymbol. The only apparent differences are NULL vs. NA in the MUAGGATT output.

Here are parameters:
image

Code doing comparison:

library(sf)
#> Warning: package 'sf' was built under R version 4.0.4
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(soilDB)
library(daff)
#> Warning: package 'daff' was built under R version 4.0.3
ssas <- c("CA077","CA630","CA649")

dput(list.files("E:/Geodata/soils","TEST\\.gdb", full.names = TRUE))
#> c("E:/Geodata/soils/SOD_HYDRIC_TEST.gdb", "E:/Geodata/soils/SOD_INTERP_TEST.gdb", 
#> "E:/Geodata/soils/SOD_MUAGGATT_TEST.gdb", "E:/Geodata/soils/SOD_PM_DOM_COMP_TEST.gdb", 
#> "E:/Geodata/soils/SOD_PROP_MIN_KSAT_TEST.gdb")

# I put each output table in its own GDB so it is just a matter of calling read_sf
x <- read_sf("E:/Geodata/soils/SOD_HYDRIC_TEST.gdb")
y <- soilDB::get_SDA_hydric(areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x,y)
#> Daff Comparison: 'x' vs. 'y' 
#>      AREASYMBOL MUKEY MUSYM ...

x <- read_sf("E:/Geodata/soils/SOD_INTERP_TEST.gdb")
y <- soilDB::get_SDA_interpretation(rulename = "American Wine Grape Varieties Site Desirability (Medium)",
                                    method = "Dominant Component",
                                    areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x,y)
#> Daff Comparison: 'x' vs. 'y' 
#>      ... muname MUKEY rating ...


x <- as.data.frame(read_sf("E:/Geodata/soils/SOD_MUAGGATT_TEST.gdb"))
y <- as.data.frame(soilDB::get_SDA_muaggatt(areasymbols = ssas))
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: 'x' vs. 'y' 
#>   First 6 and last 6 patch lines:
#>     ... hydgrpdcd iccdcd   iccdcdpct niccdcd  niccdcdpct ... awmmfpwwta mukey  
#> ... ... ...       ...      ...       ...      ...        ... ...        ...    
#>     ... D         3        85        4        85         ... 1          461994 
#> ->  ... D         NULL->NA 100       4        85         ... 1          461995 
#>     ... D         6        85        6        85         ... 1          461996 
#> ... ... ...       ...      ...       ...      ...        ... ...        ...    
#>     ... D         3        85        3        85         ... 1          462052 
#> ... ... ...       ...      ...       ...      ...        ... ...        ...    
#>     ... D         4        85        4        85         ... 1          463303 
#> ->  ... D         NULL->NA 100       6        75         ... 1          463304 
#> ->  ... D         NULL->NA 100       7        85         ... 1          463306 
#> ->  ... <NA>      NULL->NA 100       NULL->NA 100        ... NA         463305 
#>     ... C         4        78        4        78         ... 1          2924987
#> ... ... ...       ...      ...       ...      ...        ... ...        ...

x <- read_sf("E:/Geodata/soils/SOD_PM_DOM_COMP_TEST.gdb")
y <- soilDB::get_SDA_pmgroupname(areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: 'x' vs. 'y' 
#>      areasymbol mukey musym muname compname comppct_r pmgroupname

x <- read_sf("E:/Geodata/soils/SOD_PROP_MIN_KSAT_TEST.gdb")
y <- soilDB::get_SDA_property("Saturated Hydraulic Conductivity - Rep Value", 
                              method = "Min/Max", 
                              FUN="MIN", 
                              areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: 'x' vs. 'y' 
#>      areasymbol musym muname ...

Here is the only detected difference:
image

@brownag
Member Author

brownag commented Apr 5, 2021

NOTE: The above NULL versus NA discrepancy is probably because the tibble / list columns returned by read_sf allow NULL entries. Writing the muaggatt table from the GDB out as a CSV and reading it back in as a regular data.frame gives a result identical to get_SDA_muaggatt:

#...
write.csv(x, file="foo.csv")
x <- read.csv("foo.csv")
y <- as.data.frame(soilDB::get_SDA_muaggatt(areasymbols = ssas))
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: ‘x’ vs. ‘y’ 
#>   ---       ...                 
#> @@ X   musym ... awmmfpwwta mukey
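
To illustrate the kind of mismatch described in the note above, here is a minimal base R sketch. It assumes the GDB-derived column really is a list column with NULL entries, which was only suspected, not confirmed, in this thread. The helper null_to_na and the toy values are hypothetical and are not part of soilDB or sf.

```r
# Hypothetical helper: turn NULL entries of a list column into NA and
# flatten the result to an atomic vector. Atomic columns pass through.
null_to_na <- function(col) {
  if (!is.list(col)) return(col)
  col[vapply(col, is.null, logical(1))] <- NA
  unlist(col)
}

# Toy stand-ins (assumed values, not real SSURGO data)
gdb_col <- list("3", NULL, "6")   # list column with a NULL entry
sda_col <- c("3", NA, "6")        # atomic column with NA

# After conversion the two columns compare as identical
identical(null_to_na(gdb_col), sda_col)
```

The CSV round trip shown above achieves the same normalization as a side effect; a helper like this would only avoid writing a temporary file.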

@brownag brownag merged commit 5ca87e7 into master Apr 5, 2021
@cferguso
Member

cferguso commented Apr 6, 2021

It probably isn't apparent in the queries so I wanted to let you know...
Rather than requesting the property or interpretation by individual areasymbols, the request is made by state. If the list of user-requested areasymbols looked like 'CA001', 'CA003', 'OR001', 'OR003', 'WA001', 'WA003', the requests are made as:

areasymbol IN ('CA001', 'CA003')
areasymbol IN ('OR001', 'OR003')
areasymbol IN ('WA001', 'WA003')

rather than 6 individual queries for individual areasymbols. This gives roughly an order-of-magnitude increase in speed: when I originally made this change, CaCO3 for CONUS took 30 min. with individual areasymbol requests, whereas sending the requests by state took roughly 3 min. The concerns with this approach are SDA timeouts or too many records. I tested on TX and CA, as they are the states with the most map units, and never ran into an issue.
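
The by-state grouping described above could be sketched in base R roughly as follows. This is an illustration only, not the ssurgoOnDemand implementation; it assumes the first two characters of each areasymbol are the state code, and the variable names are made up for the example.

```r
# Example input from the comment above
areasymbols <- c("CA001", "CA003", "OR001", "OR003", "WA001", "WA003")

# Assumption: the state code is the first two characters of the areasymbol
by_state <- split(areasymbols, substr(areasymbols, 1, 2))

# Build one IN (...) predicate per state, quoting each areasymbol
where_clauses <- vapply(by_state, function(a) {
  sprintf("areasymbol IN (%s)", paste0("'", a, "'", collapse = ", "))
}, character(1))

# One predicate per state, named by state code
print(where_clauses)
```

Each element of where_clauses would then be dropped into the WHERE clause of a single request, so six areasymbols across three states become three queries.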

@brownag
Member Author

brownag commented Apr 6, 2021

Great, @cferguso, thanks for the background on those query limits! It is good to know that you can query the biggest states for a property or interp. I did not test with whole-state/multi-state queries or determine how much I could cram into a single one.

I think if you only have 6 SSAs across states the fastest would be to put all 6 SSAs into a single IN statement for just one query like:

# results for 6 SSAs in a single query
system.time(y <- soilDB::get_SDA_property("Saturated Hydraulic Conductivity - Rep Value", 
                                          method = "Min/Max", 
                                          FUN="MIN", 
                                          areasymbols = c('CA630', 'CA077', 'OR644', 'OR003', 'WA001', 'CT600')))

but that won't scale to CONUS, as you said. Interps and state sets of interps are also a different can of worms. At this point, for the soilDB methods it is up to the user to do their own chunking of areasymbols; these lower-level get_* methods won't try to do that for them.

I considered adding another input argument (in lieu of current areasymbol or mukeys) that would allow for using something more open ended like areasymbol LIKE 'CA%' but didn't go that route. In soilDB we have an easy way to get a state-wide vector of areasymbol using: get_legend_from_SDA("areasymbol LIKE 'CA%'")$areasymbol.

Iterating over state codes, getting areasymbols, and then running whole-state queries would be my suggestion to someone who was interested in running a batch of these queries on CONUS.
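
A rough sketch of that suggestion follows. run_by_state, sda_areasymbols and sda_min_ksat are hypothetical helpers written for this example, not soilDB functions. The two wrappers reuse only the soilDB calls shown earlier in this thread and need network access to SDA, so the final call is left commented out.

```r
# Hypothetical driver: one query per state, results row-bound together.
# get_areasymbols(state) returns a character vector of areasymbols;
# run_query(areasymbols) returns a data.frame.
run_by_state <- function(states, get_areasymbols, run_query) {
  results <- lapply(states, function(st) {
    ssas <- get_areasymbols(st)
    if (length(ssas) == 0) return(NULL)  # skip states with no survey areas
    run_query(ssas)
  })
  results <- Filter(Negate(is.null), results)
  do.call(rbind, results)
}

# Wrappers around the soilDB calls used above (require network access)
sda_areasymbols <- function(st) {
  soilDB::get_legend_from_SDA(sprintf("areasymbol LIKE '%s%%'", st))$areasymbol
}
sda_min_ksat <- function(ssas) {
  soilDB::get_SDA_property("Saturated Hydraulic Conductivity - Rep Value",
                           method = "Min/Max",
                           FUN = "MIN",
                           areasymbols = ssas)
}

# Not run here:
# res <- run_by_state(c("CA", "OR", "WA"), sda_areasymbols, sda_min_ksat)
```

Passing the lookup and query steps in as functions keeps the loop independent of any one property or interpretation, and makes it easy to try with stand-in functions before hitting SDA. Timeouts and record limits for the largest states would still need to be handled by the caller.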

@brownag brownag deleted the SOD-R-bindings branch April 9, 2021 23:33