Add ssurgoOnDemand SQL queries #179
Update on some quick comparisons against the ArcMap Toolbox output for areasymbol. The only apparent differences are NULL vs. NA in the MUAGGATT output. Code doing the comparison:
library(sf)
#> Warning: package 'sf' was built under R version 4.0.4
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(soilDB)
library(daff)
#> Warning: package 'daff' was built under R version 4.0.3
ssas <- c("CA077","CA630","CA649")
dput(list.files("E:/Geodata/soils","TEST\\.gdb", full.names = TRUE))
#> c("E:/Geodata/soils/SOD_HYDRIC_TEST.gdb", "E:/Geodata/soils/SOD_INTERP_TEST.gdb",
#> "E:/Geodata/soils/SOD_MUAGGATT_TEST.gdb", "E:/Geodata/soils/SOD_PM_DOM_COMP_TEST.gdb",
#> "E:/Geodata/soils/SOD_PROP_MIN_KSAT_TEST.gdb")
# I put each output table in its own GDB so it is just a matter of calling read_sf
x <- read_sf("E:/Geodata/soils/SOD_HYDRIC_TEST.gdb")
y <- soilDB::get_SDA_hydric(areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x,y)
#> Daff Comparison: 'x' vs. 'y'
#> AREASYMBOL MUKEY MUSYM ...
x <- read_sf("E:/Geodata/soils/SOD_INTERP_TEST.gdb")
y <- soilDB::get_SDA_interpretation(rulename = "American Wine Grape Varieties Site Desirability (Medium)",
                                    method = "Dominant Component",
                                    areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x,y)
#> Daff Comparison: 'x' vs. 'y'
#> ... muname MUKEY rating ...
x <- as.data.frame(read_sf("E:/Geodata/soils/SOD_MUAGGATT_TEST.gdb"))
y <- as.data.frame(soilDB::get_SDA_muaggatt(areasymbols = ssas))
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: 'x' vs. 'y'
#> First 6 and last 6 patch lines:
#> ... hydgrpdcd iccdcd iccdcdpct niccdcd niccdcdpct ... awmmfpwwta mukey
#> ... ... ... ... ... ... ... ... ... ...
#> ... D 3 85 4 85 ... 1 461994
#> -> ... D NULL->NA 100 4 85 ... 1 461995
#> ... D 6 85 6 85 ... 1 461996
#> ... ... ... ... ... ... ... ... ... ...
#> ... D 3 85 3 85 ... 1 462052
#> ... ... ... ... ... ... ... ... ... ...
#> ... D 4 85 4 85 ... 1 463303
#> -> ... D NULL->NA 100 6 75 ... 1 463304
#> -> ... D NULL->NA 100 7 85 ... 1 463306
#> -> ... <NA> NULL->NA 100 NULL->NA 100 ... NA 463305
#> ... C 4 78 4 78 ... 1 2924987
#> ... ... ... ... ... ... ... ... ... ...
x <- read_sf("E:/Geodata/soils/SOD_PM_DOM_COMP_TEST.gdb")
y <- soilDB::get_SDA_pmgroupname(areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: 'x' vs. 'y'
#> areasymbol mukey musym muname compname comppct_r pmgroupname
x <- read_sf("E:/Geodata/soils/SOD_PROP_MIN_KSAT_TEST.gdb")
y <- soilDB::get_SDA_property("Saturated Hydraulic Conductivity - Rep Value",
                              method = "Min/Max",
                              FUN = "MIN",
                              areasymbols = ssas)
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: 'x' vs. 'y'
#> areasymbol musym muname ...
NOTE: The above NULL versus NA discrepancy is probably because of tibble / list columns allowing for NULL entries from #...
write.csv(x, file="foo.csv")
x <- read.csv("foo.csv")
y <- as.data.frame(soilDB::get_SDA_muaggatt(areasymbols = ssas))
#> single result set, returning a data.frame
daff::diff_data(x, y)
#> Daff Comparison: ‘x’ vs. ‘y’
#> --- ...
#> @@ X musym ... awmmfpwwta mukey
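If the discrepancy really does come from list columns holding NULL entries (an assumption taken from the note above, not something verified here), an alternative to the CSV round trip is to flatten those columns in memory before calling daff::diff_data(). The helper name null_to_na is hypothetical and is not part of sf, soilDB, or daff; the toy table stands in for the real GDB output.

```r
# Hypothetical helper, not part of sf, soilDB, or daff.
# Assumes each list-column entry is either NULL or a length-one value.
null_to_na <- function(df) {
  df[] <- lapply(df, function(col) {
    # leave atomic columns and sf geometry columns untouched
    if (!is.list(col) || inherits(col, "sfc")) return(col)
    # replace NULL entries with NA, then collapse the list to a vector
    col[vapply(col, is.null, logical(1))] <- NA
    unlist(col)
  })
  df
}

# Toy table with a list column holding a NULL entry
x <- data.frame(mukey = c(461994, 461995, 461996))
x$iccdcd <- list("3", NULL, "6")

x2 <- null_to_na(x)
```

After this step the column is an ordinary character vector with NA in place of NULL, so a comparison against a plain data.frame should no longer flag those cells for that reason alone.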
It probably isn't apparent in the queries so I wanted to let you know... areasymbol IN ('CA001', 'CA003') rather than 6 individual queries for individual areasymbols. This represents an order of magnitude increase in speed. When I originally made this change, CaCO3 for individual areasymbols took 30 min. for CONUS, whereas sending the requests by state took roughly 3 min. The concerns with this are SDA timeouts or too many records. I tested on TX and CA, as they are the states with the most map units, and I never ran into an issue.
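To make the batching idea concrete, here is a minimal sketch of building one IN (...) predicate from several area symbols. The function name areasymbol_in_clause is hypothetical; it is not how ssurgoOnDemand or soilDB necessarily assemble their SQL, only an illustration of the pattern described above.

```r
# Hypothetical helper (not part of soilDB or ssurgoOnDemand): build one
# IN (...) predicate from a character vector of area symbols.
areasymbol_in_clause <- function(areasymbols) {
  stopifnot(is.character(areasymbols), length(areasymbols) > 0)
  # double any single quotes so each value stays a valid SQL string literal
  quoted <- sprintf("'%s'", gsub("'", "''", unique(areasymbols), fixed = TRUE))
  sprintf("areasymbol IN (%s)", paste(quoted, collapse = ", "))
}

clause <- areasymbol_in_clause(c("CA001", "CA003"))
# clause is "areasymbol IN ('CA001', 'CA003')"
```

One request carrying this predicate replaces one request per area symbol, which is where the speedup reported above would come from.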
Great, @cferguso, thanks for giving some sidebars on those query limits! It is good to know that you can query the biggest states for a property or interp! I did not test with whole-state/multi-state queries or determine "how much" I could cram into a single one. I think if you only have 6 SSAs across states, the fastest would be to put all 6 SSAs into a single query:
# results for 6 SSAs in a single query
system.time(y <- soilDB::get_SDA_property("Saturated Hydraulic Conductivity - Rep Value",
                                          method = "Min/Max",
                                          FUN = "MIN",
                                          areasymbols = c('CA630', 'CA077', 'OR644', 'OR003', 'WA001', 'CT600')))
but that won't scale to CONUS, as you said. Also, interps and state sets of interps are a different can of worms. At this point, for the soilDB methods it is up to the user to do their own chunking of areasymbols, these lower-level [...] I considered adding another input argument (in lieu of current [...]). Iterating over state codes, getting areasymbols, and then running whole-state queries would be my suggestion to someone who was interested in running a batch of these queries on CONUS.
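A rough sketch of that per-state chunking, for anyone who wants to try it. The names query_by_state and fake_query are hypothetical, and the grouping assumes the first two characters of an area symbol are the state code. fake_query is an offline stand-in; in practice one would pass a wrapper around a call such as soilDB::get_SDA_property(..., areasymbols = ssas).

```r
# Hypothetical helper: run `query_fun` once per state and stack the results.
# Assumes the first two characters of each area symbol are the state code.
query_by_state <- function(areasymbols, query_fun) {
  by_state <- split(areasymbols, substr(areasymbols, 1, 2))
  results <- lapply(by_state, query_fun)
  do.call(rbind, results)
}

# Offline stand-in for a real SDA request: records how many area symbols
# were sent together in each request
fake_query <- function(ssas) {
  data.frame(areasymbol = ssas, n_in_request = length(ssas))
}

res <- query_by_state(c("CA630", "CA077", "OR644", "OR003", "WA001", "CT600"),
                      fake_query)
```

With real requests, each element of by_state becomes one query, so the number of round trips equals the number of states rather than the number of survey areas. Timeouts and record limits would still need to be checked per state.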
Based on https://github.com/ncss-tech/ssurgoOnDemand by @jneme910 @cferguso, and inspired by @dylanbeaudette in #178
I defined the following methods provisionally that take areasymbol and mukey options for input (mirroring the areasymbol and "express" style for ssurgoOnDemand). It appears they work. I want to do some verification against the Python results for some final testing.
I don't have plans to reinvent the wheel on these, but think they are excellent/highly useful in current form and wanted to have them readily accessible for comparisons etc.