Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"soil data aggregation engine" vs. wrangling SQL statements or ad hoc R code #178

Closed
dylanbeaudette opened this issue Mar 31, 2021 · 6 comments
Milestone

Comments

@dylanbeaudette
Copy link
Member

dylanbeaudette commented Mar 31, 2021

Long-term plan: new macros or UDF to support aggregation on the fly in SQL and parity with WSS output. Contractors are working towards this, but no ETA.

Short-term plans:

  • SQL templates / string interpolation based on SSURGO on Demand / Soil Data Development Toolbox (python)
  • purely R-based solutions (slab, data.table, etc.) should match SOD / SDDT output when possible

See misc/soil-data-aggregation for stubs that can be built-upon for both styles. The (eventual) SDA-based solution will solve most problems, but R-based solutions will be required for data that hasn't been or cannot be pushed to SDA.

@brownag
Copy link
Member

brownag commented Mar 31, 2021

Here is a small InterpretationEngine proof of concept involving WCS and custom data aggregation I made last week while testing the new R package.

https://github.com/ncss-tech/interpretation-engine/blob/master/misc/demo-SDA-WCS-IE-integration.R

EDIT: Images of output and 3D rendering of "Dunn1" SVI interp output
image
image

@brownag
Copy link
Member

brownag commented Apr 5, 2021

#179 adds the ssurgoOnDemand queries in get_* method style

I think we could probably get some more stub queries that target some other data categories from here
https://nasis.sc.egov.usda.gov/NasisReportsWebSite/limsreport.aspx?report_name=SDA-SQL_Library_Home

@dylanbeaudette
Copy link
Member Author

Is there ever a valid reason to convert NULL values → 0?

Current SOD:

Changes made in #185 remove this step.

Related examples:

library(aqp)
library(soilDB)
library(sf)
library(rasterVis)
library(viridisLite)

# SJQ | Amador Co.

# Mokelumne | Sed. Rock Land
# https://casoilresource.lawr.ucdavis.edu/gmap/?loc=38.46643,-121.02565,z14
bb <- '-121.1099 38.4288,-121.1099 38.4942,-120.9728 38.4942,-120.9728 38.4288,-121.1099 38.4288'
wkt <- sprintf('POLYGON((%s))', bb)

x <- st_as_sfc(wkt)
st_crs(x) <- 4326

(mu <- mukey.wcs(aoi = x, db = 'gnatsgo'))

levelplot(mu, att = 'ID', margin = FALSE, colorkey = FALSE, col.regions = viridis)

# get unique mukeys from grid
ll <- levels(mu)[[1]]

s <- get_SDA_property(property = 'pH 1:1 water - Rep Value', method = 'Weighted Average', mukeys = ll$ID, top_depth = 0, bottom_depth = 25)

## with ISNULL(x, 0)
## !! what? 0s?
s[which(s$ph1to1h2o_r == 0), ]
z <- fetchSDA("mukey = '461915'")
z$ph1to1h2o_r

## dilution of weighted mean due to weights associated with NULL values
## document this
# another issue: weighted averages 
z <- fetchSDA("mukey = '461984'")
z$ph1to1h2o_r

@brownag
Copy link
Member

brownag commented Dec 19, 2023

I think this issue does not have any remaining action items to address and suggest we close it.

In my opinion UDFs are not a replacement for basic queries that operate on custom data sources using standard syntax.

While UDFs may provide an easy route to the "right" answers, they require documentation, maintenance, and testing of their own. I think the functionality offered would have to be very significant to warrant overhead of maintaining a separate library of these functions.

Much of my thought behind the work porting SOD/SDA queries, mostly avoiding pure R implementations in soilDB, was to provide a mapping of query string templates to specific categories of tabular results. Where a query relies on specific versions of UDFs, and the logic is not otherwise embedded in the query, it would be harder to port that query to other systems or for others to replicate.

The current approach in soilDB is a mix of the two "short-term" plans involving string templates and/or native R functions. I don't think that we need an issue in soilDB to discuss the merits of whether we should continue to develop functions like these. Since this issue was first raised, we have been addressing issues with WSS parity, bugs, etc in specific issues as they arise and I think we can now consider these to be in the "long term" plan for soilDB.

soilDB is a great place to prototype potential upgrades for SDA--for example the "MLRAPOLYGON" functionality recently added to fetchSDA_spatial() (#302) could be proposed as new table for SSURGO some day. In addition to current R/SQL query functions we could provide standard user-defined SQL functions that could be inserted into SQLite snapshot templates for in-database processing, and also new UDFs for SDA or similar web services... I think that is mostly outside scope of soilDB aside from perhaps a mechanism to list, version, or insert UDFs into databases.

In the future, converting to {dbplyr} for programmatic building of SQL could further enhance our ability to do aggregation and generalize to different database interfaces. While this is a nice idea it would of course be a significant change in terms of dependencies, would require refactoring most of the existing code, and would not work easily with the SDA web service (which is not a DBI source per se).

@dylanbeaudette
Copy link
Member Author

Good points. I agree that this isn't really a soilDB issue and can be closed. I plan to purse the aggregation engine outside of SDA for now, as a long-term solution to this problem. SDA works well and does not need a bunch of new things bolted-on.

@brownag
Copy link
Member

brownag commented Dec 20, 2023

Great. I would love to see some of the work you have done with soil data aggregation in Postgres for SoilWeb/ISSR800 and the like made available in some reusable form.

@brownag brownag closed this as completed Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants