"soil data aggregation engine" vs. wrangling SQL statements or ad hoc R code #178

dylanbeaudette · 2021-03-31T17:17:20Z

Long-term plan: new macros or UDF to support aggregation on the fly in SQL and parity with WSS output. Contractors are working towards this, but no ETA.

Short-term plans:

SQL templates / string interpolation based on SSURGO on Demand / Soil Data Development Toolbox (python)
purely R-based solutions (slab, data.table, etc.) should match SOD / SDDT output when possible

See misc/soil-data-aggregation for stubs that can be built-upon for both styles. The (eventual) SDA-based solution will solve most problems, but R-based solutions will be required for data that hasn't been or cannot be pushed to SDA.

The text was updated successfully, but these errors were encountered:

brownag · 2021-03-31T17:19:57Z

Here is a small InterpretationEngine proof of concept involving WCS and custom data aggregation I made last week while testing the new R package.

https://github.com/ncss-tech/interpretation-engine/blob/master/misc/demo-SDA-WCS-IE-integration.R

EDIT: Images of output and 3D rendering of "Dunn1" SVI interp output

brownag · 2021-04-05T17:45:29Z

#179 adds the ssurgoOnDemand queries in get_* method style

I think we could probably get some more stub queries that target some other data categories from here
https://nasis.sc.egov.usda.gov/NasisReportsWebSite/limsreport.aspx?report_name=SDA-SQL_Library_Home

dylanbeaudette · 2021-06-02T22:37:35Z

Is there ever a valid reason to convert NULL values → 0?

Current SOD:

Changes made in #185 remove this step.

Related examples:

library(aqp)
library(soilDB)
library(sf)
library(rasterVis)
library(viridisLite)

# SJQ | Amador Co.

# Mokelumne | Sed. Rock Land
# https://casoilresource.lawr.ucdavis.edu/gmap/?loc=38.46643,-121.02565,z14
bb <- '-121.1099 38.4288,-121.1099 38.4942,-120.9728 38.4942,-120.9728 38.4288,-121.1099 38.4288'
wkt <- sprintf('POLYGON((%s))', bb)

x <- st_as_sfc(wkt)
st_crs(x) <- 4326

(mu <- mukey.wcs(aoi = x, db = 'gnatsgo'))

levelplot(mu, att = 'ID', margin = FALSE, colorkey = FALSE, col.regions = viridis)

# get unique mukeys from grid
ll <- levels(mu)[[1]]

s <- get_SDA_property(property = 'pH 1:1 water - Rep Value', method = 'Weighted Average', mukeys = ll$ID, top_depth = 0, bottom_depth = 25)

## with ISNULL(x, 0)
## !! what? 0s?
s[which(s$ph1to1h2o_r == 0), ]
z <- fetchSDA("mukey = '461915'")
z$ph1to1h2o_r

## dilution of weighted mean due to weights associated with NULL values
## document this
# another issue: weighted averages 
z <- fetchSDA("mukey = '461984'")
z$ph1to1h2o_r

brownag · 2023-12-19T23:58:44Z

I think this issue does not have any remaining action items to address and suggest we close it.

In my opinion UDFs are not a replacement for basic queries that operate on custom data sources using standard syntax.

While UDFs may provide an easy route to the "right" answers, they require documentation, maintenance, and testing of their own. I think the functionality offered would have to be very significant to warrant overhead of maintaining a separate library of these functions.

Much of my thought behind the work porting SOD/SDA queries, mostly avoiding pure R implementations in soilDB, was to provide a mapping of query string templates to specific categories of tabular results. Where a query relies on specific versions of UDFs, and the logic is not otherwise embedded in the query, it would be harder to port that query to other systems or for others to replicate.

The current approach in soilDB is a mix of the two "short-term" plans involving string templates and/or native R functions. I don't think that we need an issue in soilDB to discuss the merits of whether we should continue to develop functions like these. Since this issue was first raised, we have been addressing issues with WSS parity, bugs, etc in specific issues as they arise and I think we can now consider these to be in the "long term" plan for soilDB.

soilDB is a great place to prototype potential upgrades for SDA--for example the "MLRAPOLYGON" functionality recently added to fetchSDA_spatial() (#302) could be proposed as new table for SSURGO some day. In addition to current R/SQL query functions we could provide standard user-defined SQL functions that could be inserted into SQLite snapshot templates for in-database processing, and also new UDFs for SDA or similar web services... I think that is mostly outside scope of soilDB aside from perhaps a mechanism to list, version, or insert UDFs into databases.

In the future, converting to {dbplyr} for programmatic building of SQL could further enhance our ability to do aggregation and generalize to different database interfaces. While this is a nice idea it would of course be a significant change in terms of dependencies, would require refactoring most of the existing code, and would not work easily with the SDA web service (which is not a DBI source per se).

dylanbeaudette · 2023-12-20T17:21:03Z

Good points. I agree that this isn't really a soilDB issue and can be closed. I plan to purse the aggregation engine outside of SDA for now, as a long-term solution to this problem. SDA works well and does not need a bunch of new things bolted-on.

brownag · 2023-12-20T18:29:12Z

Great. I would love to see some of the work you have done with soil data aggregation in Postgres for SoilWeb/ISSR800 and the like made available in some reusable form.

dylanbeaudette mentioned this issue Mar 31, 2021

experimental WCS interface to gSSURGO, gNATSGO, ISSR-800 #148

Closed

brownag mentioned this issue Apr 5, 2021

Add ssurgoOnDemand SQL queries #179

Merged

brownag mentioned this issue Jun 2, 2021

Updates to get_SDA_* / SOD-like methods #185

Merged

1 task

brownag mentioned this issue Aug 6, 2021

Add get_SDA_coecoclass #203

Merged

brownag added this to the soilDB 3.0 milestone Jul 22, 2022

brownag closed this as completed Dec 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"soil data aggregation engine" vs. wrangling SQL statements or ad hoc R code #178

"soil data aggregation engine" vs. wrangling SQL statements or ad hoc R code #178

dylanbeaudette commented Mar 31, 2021 •

edited

Loading

brownag commented Mar 31, 2021 •

edited

Loading

brownag commented Apr 5, 2021

dylanbeaudette commented Jun 2, 2021

brownag commented Dec 19, 2023

dylanbeaudette commented Dec 20, 2023

brownag commented Dec 20, 2023

"soil data aggregation engine" vs. wrangling SQL statements or ad hoc R code #178

"soil data aggregation engine" vs. wrangling SQL statements or ad hoc R code #178

Comments

dylanbeaudette commented Mar 31, 2021 • edited Loading

brownag commented Mar 31, 2021 • edited Loading

brownag commented Apr 5, 2021

dylanbeaudette commented Jun 2, 2021

brownag commented Dec 19, 2023

dylanbeaudette commented Dec 20, 2023

brownag commented Dec 20, 2023

dylanbeaudette commented Mar 31, 2021 •

edited

Loading

brownag commented Mar 31, 2021 •

edited

Loading