Add ssurgoOnDemand SQL queries #179

brownag · 2021-04-05T16:50:54Z

Based on https://github.com/ncss-tech/ssurgoOnDemand by @jneme910 @cferguso, and inspired by @dylanbeaudette in #178

I provisionally defined the following methods, which take areasymbol and mukey options as input (mirroring the areasymbol and "express" styles of ssurgoOnDemand):

get_SDA_property
get_SDA_interpretation
get_SDA_muaggatt
get_SDA_hydric
get_SDA_pmgroupname

They appear to work. I want to verify them against the Python results as a final test.
I don't plan to reinvent the wheel on these, but I think they are excellent and highly useful in their current form, and I wanted to have them readily accessible for comparisons etc.

@jneme910


brownag · 2021-04-05T18:32:28Z

An update on some quick comparisons against the ArcMap Toolbox output for areasymbol: the only apparent differences are NULL vs. NA in the MUAGGATT output.

Here are parameters:

Code doing comparison:

library(sf)
#> Warning: package 'sf' was built under R version 4.0.4
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(soilDB)
library(daff)
#> Warning: package 'daff' was built under R version 4.0.3
ssas <- c("CA077","CA630","CA649")

dput(list.files("E:/Geodata/soils","TEST\\.gdb", full.names = TRUE))
#> c("E:/Geodata/soils/SOD_HYDRIC_TEST.gdb", "E:/Geodata/soils/SOD_INTERP_TEST.gdb", 
#> "E:/Geodata/soils/SOD_MUAGGATT_TEST.gdb", "E:/Geodata/soils/SOD_PM_DOM_COMP_TEST.gdb", 
#> "E:/Geodata/soils/SOD_PROP_MIN_KSAT_TEST.gdb")

# I put each output table in its own GDB so it is just a matter of calling read_sf
x <- read_sf("E:/Geodata/soils/SOD_HYDRIC_TEST.gdb")
y <- soilDB::get_SDA_hydric(areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x,y)
#> Daff Comparison: 'x' vs. 'y' 
#>      AREASYMBOL MUKEY MUSYM ...

x <- read_sf("E:/Geodata/soils/SOD_INTERP_TEST.gdb")
y <- soilDB::get_SDA_interpretation(rulename = "American Wine Grape Varieties Site Desirability (Medium)",
                                    method = "Dominant Component",
                                    areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x,y)
#> Daff Comparison: 'x' vs. 'y' 
#>      ... muname MUKEY rating ...


x <- as.data.frame(read_sf("E:/Geodata/soils/SOD_MUAGGATT_TEST.gdb"))
y <- as.data.frame(soilDB::get_SDA_muaggatt(areasymbols = ssas))
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: 'x' vs. 'y' 
#>   First 6 and last 6 patch lines:
#>     ... hydgrpdcd iccdcd   iccdcdpct niccdcd  niccdcdpct ... awmmfpwwta mukey  
#> ... ... ...       ...      ...       ...      ...        ... ...        ...    
#>     ... D         3        85        4        85         ... 1          461994 
#> ->  ... D         NULL->NA 100       4        85         ... 1          461995 
#>     ... D         6        85        6        85         ... 1          461996 
#> ... ... ...       ...      ...       ...      ...        ... ...        ...    
#>     ... D         3        85        3        85         ... 1          462052 
#> ... ... ...       ...      ...       ...      ...        ... ...        ...    
#>     ... D         4        85        4        85         ... 1          463303 
#> ->  ... D         NULL->NA 100       6        75         ... 1          463304 
#> ->  ... D         NULL->NA 100       7        85         ... 1          463306 
#> ->  ... <NA>      NULL->NA 100       NULL->NA 100        ... NA         463305 
#>     ... C         4        78        4        78         ... 1          2924987
#> ... ... ...       ...      ...       ...      ...        ... ...        ...

x <- read_sf("E:/Geodata/soils/SOD_PM_DOM_COMP_TEST.gdb")
y <- soilDB::get_SDA_pmgroupname(areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: 'x' vs. 'y' 
#>      areasymbol mukey musym muname compname comppct_r pmgroupname

x <- read_sf("E:/Geodata/soils/SOD_PROP_MIN_KSAT_TEST.gdb")
y <- soilDB::get_SDA_property("Saturated Hydraulic Conductivity - Rep Value", 
                              method = "Min/Max", 
                              FUN="MIN", 
                              areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: 'x' vs. 'y' 
#>      areasymbol musym muname ...

Here is the only detected difference:

brownag · 2021-04-05T18:43:15Z

NOTE: The NULL versus NA discrepancy above is probably due to tibble / list columns from read_sf allowing NULL entries. Writing the muaggatt table from the GDB out as a CSV and reading it back in as a regular data.frame gives a result identical to get_SDA_muaggatt:

#...
write.csv(x, file="foo.csv")
x <- read.csv("foo.csv")
y <- as.data.frame(soilDB::get_SDA_muaggatt(areasymbols = ssas))
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: ‘x’ vs. ‘y’ 
#>   ---       ...                 
#> @@ X   musym ... awmmfpwwta mukey
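As an alternative to the CSV round trip, the NULL entries could be converted in memory. The following is only a sketch under the assumption stated above, namely that the affected columns arrive as list columns containing NULL. The helper name `null_to_na` and the toy values are hypothetical and are not part of soilDB or sf.

```r
# Hypothetical helper: flatten a list column to a character vector,
# turning NULL (or zero-length) entries into NA.
null_to_na <- function(col) {
  vapply(col, function(v) {
    if (is.null(v) || length(v) == 0) NA_character_ else as.character(v[[1]])
  }, character(1))
}

# Toy data mimicking a table with one list column; values are illustrative only.
x <- data.frame(mukey = c(461994, 461995))
x$iccdcd <- list("3", NULL)

# Flatten every list column, but leave sf geometry columns (class "sfc") alone.
x[] <- lapply(x, function(col) {
  if (is.list(col) && !inherits(col, "sfc")) null_to_na(col) else col
})

print(x)
```

The flattened columns come back as character, so a type conversion may still be needed before comparing against the get_SDA_muaggatt result.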

cferguso · 2021-04-06T15:08:25Z

It probably isn't apparent in the queries so I wanted to let you know...
Rather than requesting the property or interpretation by individual areasymbol, the request is made by state. If the list of user-requested areasymbols looked like 'CA001', 'CA003', 'OR001', 'OR003', 'WA001', 'WA003', the requests are made as

areasymbol IN ('CA001', 'CA003')
areasymbol IN ('OR001', 'OR003')
areasymbol IN ('WA001', 'WA003')

rather than as 6 individual queries, one per areasymbol. This represents an order of magnitude increase in speed. When I originally made this change, CaCO3 by individual areasymbol took 30 min for CONUS, whereas sending the requests by state took roughly 3 min. The concerns with this approach are SDA timeouts or too many records. I tested on TX and CA, as they are the states with the most map units, and I never ran into an issue.
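The grouping described above can be sketched in a few lines of base R. This is an illustration only, not the ssurgoOnDemand implementation: the function name `state_in_clauses` is hypothetical, and it assumes the first two characters of each areasymbol are the state code.

```r
# Group area symbols by their two-letter state prefix and build one
# SQL IN clause per state (hypothetical helper, for illustration).
state_in_clauses <- function(areasymbols) {
  areasymbols <- unique(toupper(areasymbols))
  by_state <- split(areasymbols, substr(areasymbols, 1, 2))
  vapply(by_state, function(ssa) {
    sprintf("areasymbol IN (%s)", paste0("'", ssa, "'", collapse = ", "))
  }, character(1))
}

clauses <- state_in_clauses(c("CA001", "CA003", "OR001", "OR003", "WA001", "WA003"))
print(unname(clauses))
```

Each element of `clauses` is named by its state code, so the six symbols in the example collapse to three requests.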

brownag · 2021-04-06T15:53:39Z

Great, @cferguso, thanks for the background on those query limits! It is good to know that you can query the biggest states for a property or interp. I did not test with whole-state/multi-state queries or determine how much I could cram into a single one.

I think if you only have 6 SSAs across states the fastest would be to put all 6 SSAs into a single IN statement for just one query like:

# results for 6 SSAs in a single query
system.time(y <- soilDB::get_SDA_property("Saturated Hydraulic Conductivity - Rep Value", 
                                          method = "Min/Max", 
                                          FUN="MIN", 
                                          areasymbols = c('CA630', 'CA077', 'OR644', 'OR003', 'WA001', 'CT600')))

but that won't scale to CONUS, as you said. Interps and state sets of interps are also a different can of worms. At this point it is up to the user of the soilDB methods to do their own chunking of areasymbols; these lower-level get_* methods won't try to do that for them.

I considered adding another input argument (in lieu of current areasymbol or mukeys) that would allow for using something more open ended like areasymbol LIKE 'CA%' but didn't go that route. In soilDB we have an easy way to get a state-wide vector of areasymbol using: get_legend_from_SDA("areasymbol LIKE 'CA%'")$areasymbol.

Iterating over state codes, getting areasymbols, and then running whole-state queries would be my suggestion to someone who was interested in running a batch of these queries on CONUS.
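A minimal sketch of that per-state loop follows. The wrapper `query_by_state` is hypothetical and not part of soilDB. It takes the lookup and query steps as function arguments so the loop can be tried offline with stand-ins; the comments show how the soilDB calls mentioned in this thread could be plugged in, which has not been tested at CONUS scale here.

```r
# Hypothetical wrapper: one query per state, results stacked with rbind.
# With soilDB, the two function arguments could be, for example:
#   get_symbols <- function(st)
#     soilDB::get_legend_from_SDA(sprintf("areasymbol LIKE '%s%%'", st))$areasymbol
#   run_query <- function(ssa) soilDB::get_SDA_hydric(areasymbols = ssa)
query_by_state <- function(states, get_symbols, run_query) {
  res <- lapply(states, function(st) {
    ssa <- get_symbols(st)
    if (length(ssa) == 0) return(NULL)      # no survey areas for this prefix
    out <- try(run_query(ssa), silent = TRUE)
    if (inherits(out, "try-error")) {
      warning("query failed for state ", st, call. = FALSE)
      return(NULL)
    }
    out
  })
  res <- Filter(Negate(is.null), res)       # drop empty or failed states
  if (length(res) == 0) return(NULL)
  do.call(rbind, res)
}

# Offline stand-ins so the loop can be run without contacting SDA.
fake_symbols <- function(st) paste0(st, c("001", "003"))
fake_query <- function(ssa) data.frame(areasymbol = ssa, n = seq_along(ssa))

res <- query_by_state(c("CA", "OR"), fake_symbols, fake_query)
print(res)
```

A state whose query fails is skipped with a warning rather than aborting the whole batch, so the caller can retry just those states afterwards.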

Add R ssurgoOnDemand utilities based on https://github.com/ncss-tech/…

61e3fe5

…ssurgoOnDemand by @jneme910 @cferguso

brownag mentioned this pull request Apr 5, 2021

"soil data aggregation engine" vs. wrangling SQL statements or ad hoc R code #178

Closed

NEWS / DESCRIPTION + date

b7bd0f0

typo

44e83a3

brownag merged commit 5ca87e7 into master Apr 5, 2021

brownag deleted the SOD-R-bindings branch April 9, 2021 23:33
