diff --git a/docs/CONDUCT.html b/docs/CONDUCT.html index 7640460..5048159 100644 --- a/docs/CONDUCT.html +++ b/docs/CONDUCT.html @@ -55,7 +55,7 @@
diff --git a/docs/articles/dbhydroR.html b/docs/articles/dbhydroR.html index d7e4265..32b4ee0 100644 --- a/docs/articles/dbhydroR.html +++ b/docs/articles/dbhydroR.html @@ -29,7 +29,7 @@ @@ -73,7 +73,7 @@vignettes/dbhydroR.Rmd
dbhydroR.Rmd
install.packages("dbhydroR")
Otherwise, the GitHub version of dbhydroR
can be installed by running the following command in the R
console:
devtools::install_github("ropensci/dbhydroR")
Once installed, the package can be loaded using the following command:
-library(dbhydroR)
Once installed, the package can be loaded using the following command:
+Water quality data can be retrieved using the get_wq
function, which takes four required arguments: a station ID, a test name, and the minimum and maximum dates of a date range. Station IDs can be located on the SFWMD Station Maps or the Google Earth kmz file. An abbreviated list of available test names can be found in the appendix to this document, while a full listing can be found at the DBHYDRO metadata page. Dates must be specified in YYYY-MM-DD format (e.g. 2015-02-26). The following set of examples retrieves measurements between March 2011 and May 2012. They can be run from the R console by issuing the command:
example(get_wq)
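Since dates must be given as YYYY-MM-DD strings, it may help to check them before building a query. The helper below is a minimal sketch using only base R; `is_iso_date` is a hypothetical name and is not part of dbhydroR.

```r
# Hypothetical helper (not part of dbhydroR): returns TRUE for strings that
# are written exactly as YYYY-MM-DD and correspond to a real calendar date.
is_iso_date <- function(x) {
  # The pattern check rejects loosely formatted input such as "2015-2-26";
  # as.Date() returns NA for impossible dates such as "2015-02-30".
  grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

date_min <- "2011-03-01"
date_max <- "2012-05-01"

# Both endpoints should be valid, and the range should not be reversed.
dates_ok <- all(is_iso_date(c(date_min, date_max))) &&
  as.Date(date_min) <= as.Date(date_max)
```

If `dates_ok` is `TRUE`, the two strings can be passed as `date_min` and `date_max` in the calls shown in this section.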
One variable at one station
+get_wq(station_id = "FLAB08", date_min = "2011-03-01",
+ date_max = "2012-05-01", test_name = "CHLOROPHYLLA-SALINE")
One variable at multiple stations
+get_wq(station_id = c("FLAB08","FLAB09"), date_min = "2011-03-01",
+ date_max = "2012-05-01", test_name = "CHLOROPHYLLA-SALINE")
One variable at a wildcard station
+get_wq(station_id = c("FLAB0%"), date_min = "2011-03-01",
+ date_max = "2012-05-01", test_name = "CHLOROPHYLLA-SALINE")
Multiple variables at multiple stations
+get_wq(station_id = c("FLAB08","FLAB09"), date_min = "2011-03-01",
+ date_max = "2012-05-01", test_name = c("CHLOROPHYLLA-SALINE",
+ "SALINITY"))
By default, get_wq
returns a cleaned output. First, the cleaning function converts the raw output from native DBHYDRO long format (each piece of data on its own row) to wide format (each site x variable combination in its own column) using the reshape2 package (Wickham 2007). Next, the extra columns associated with QA flags, LIMS, and District receiving are removed. Finally, row entries associated with QA field blanks, which are used to check on potential sources of contamination, are removed. Setting the raw
flag to TRUE
will force get_wq to retain information on QA field blanks as well as the other QA fields. An example query that retains this information and the original long formatting is shown below.
raw_wq <- get_wq(station_id = "FLAB08", date_min = "2011-03-01",
- date_max = "2011-05-01", test_name = "CHLOROPHYLLA-SALINE",
- raw = TRUE)
raw_wq <- get_wq(station_id = "FLAB08", date_min = "2011-03-01",
+ date_max = "2011-05-01", test_name = "CHLOROPHYLLA-SALINE",
+ raw = TRUE)
This raw data can then be cleaned using the clean_wq function:
-clean_wq(raw_wq)
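The long-to-wide conversion described above can be illustrated on a toy table. This is only a sketch: it uses base R `reshape()` rather than the reshape2 package the cleaning function relies on, the measurement values are made up for illustration, and the column names (`date`, `station`, `test`, `value`, `series`) are assumptions that do not reflect the actual DBHYDRO output.

```r
# Toy long-format table: one measurement per row (values are invented).
long <- data.frame(
  date    = as.Date(c("2011-03-01", "2011-03-01", "2011-04-01", "2011-04-01")),
  station = c("FLAB08", "FLAB09", "FLAB08", "FLAB09"),
  test    = "CHLOROPHYLLA-SALINE",
  value   = c(1.2, 0.8, 1.5, 0.9),
  stringsAsFactors = FALSE
)

# Build one key per site x variable combination.
long$series <- paste(long$station, long$test, sep = "_")

# Spread the keys into columns: one row per date, one column per series.
wide <- reshape(long[, c("date", "series", "value")],
                idvar = "date", timevar = "series", direction = "wide")

# reshape() prefixes the new columns with "value."; drop that prefix.
names(wide) <- sub("^value\\.", "", names(wide))
```

The result has one row per sampling date and one column per station and test combination, which is the shape the cleaned output is described as having.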
Hydrologic time series data can be retrieved using the get_hydro
function. The first task to accomplish prior to running get_hydro
is to identify one or more dbkeys, each of which corresponds to a unique site x variable time series. This can be done beforehand using the get_dbkey
function, the SFWMD Station Maps or the DBHYDRO Browser. One useful strategy for finding desired dbkeys is to run the get_dbkey
function interactively using progressively narrower search terms. For example, suppose we are interested in daily average wind data at Joe Bay but we have no alphanumeric dbkey
. Initially we could run get_dbkey
with the detail.level
set to “summary”.
get_dbkey(stationid = "JBTS", category = "WEATHER", param = "WNDS",
- detail.level = "summary")
Our search returns two results but only one of them has a daily average (DA) measurement frequency. We can verify the remaining attributes of our likely dbkey by setting the freq
parameter to “DA” and the detail.level
parameter to “full”.
get_dbkey(stationid = "JBTS", category = "WEATHER", param = "WNDS",
- freq = "DA", detail.level = "full")
get_dbkey(stationid = "JBTS", category = "WEATHER", param = "WNDS",
+ freq = "DA", detail.level = "full")
This exact dbkey can only be returned reliably by specifying all of the get_dbkey
parameters applicable to the “WEATHER” category.
get_dbkey(stationid = "JBTS", category = "WEATHER", param = "WNDS",
- freq = "DA", stat = "MEAN", recorder = "CR10", agency = "WMD",
- detail.level = "dbkey")
get_dbkey(stationid = "JBTS", category = "WEATHER", param = "WNDS",
+ freq = "DA", stat = "MEAN", recorder = "CR10", agency = "WMD",
+ detail.level = "dbkey")
Now that we have our dbkey in hand, we can use it as input to get_hydro
. In addition to a dbkey, we must specify a date range. Dates must be entered in YYYY-MM-DD format (e.g. 2015-02-26).
get_hydro(dbkey = "15081",
- date_min = "2013-01-01", date_max = "2013-02-02")
Alternatively, we can specify a set of arguments in our call to get_hydro
that will be passed to get_dbkey
on-the-fly. Use caution when using this strategy as complex stationid/category/parameter combinations can easily cause errors or return unexpected results. It is good practice to pre-screen your parameter values using get_dbkey
.
get_hydro(date_min = "2013-01-01", date_max = "2013-02-02",
- stationid = "JBTS", category = "WEATHER", param = "WNDS",
- freq = "DA", stat = "MEAN", recorder = "CR10", agency = "WMD")
get_hydro(date_min = "2013-01-01", date_max = "2013-02-02",
+ stationid = "JBTS", category = "WEATHER", param = "WNDS",
+ freq = "DA", stat = "MEAN", recorder = "CR10", agency = "WMD")
The contents of multiple data streams can be returned by specifying multiple dbkeys or entering on-the-fly get_dbkey
queries that return multiple dbkeys.
get_hydro(dbkey = c("15081", "15069"), date_min = "2013-01-01",
- date_max = "2013-02-02")
get_hydro(date_min = "2013-01-01", date_max = "2013-02-02",
- category = "WEATHER", stationid = c("JBTS", "MBTS"),
- param = "WNDS", freq = "DA", stat = "MEAN")
get_hydro(date_min = "2013-01-01", date_max = "2013-02-02",
+ category = "WEATHER", stationid = c("JBTS", "MBTS"),
+ param = "WNDS", freq = "DA", stat = "MEAN")
More get_hydro
examples including queries of other category
values (“SW”, “GW”, and “WQ”) can be viewed by issuing the following commands from the R
console:
example(get_dbkey)
-example(get_hydro)
By default, get_hydro
returns a cleaned output. First, the cleaning function clean_hydro
converts the raw output from native DBHYDRO long format (each piece of data on its own row) to wide format (each site x variable combination in its own column) using the reshape2 package (Wickham 2007). Next, extra columns associated with measurement location (longitude/latitude), frequency, and QA flags are removed. Setting the raw
flag to TRUE
will force get_hydro
to retain the original formatting and metadata fields. An example query that retains this information and the original long formatting is shown below.
raw_data <- get_hydro(date_min = "2013-01-01", date_max = "2013-02-02",
- stationid = "JBTS", category = "WEATHER", param = "WNDS",
- freq = "DA", stat = "MEAN", recorder = "CR10", agency = "WMD", raw = TRUE)
-
-clean_hydro(raw_data)
Wickham, Hadley. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.
+Wickham, Hadley. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.
dbhydroR
provides scripted access to the South Florida Water Management District’s DBHYDRO database which holds over 35 million hydrologic and water quality records from the Florida Everglades and surrounding areas.
install.packages("devtools") # Requires RTools if using Windows
NEWS.md
- get_hydro causing data parsing failure in all cases (#16)
get_dbkey was incorrectly processing data headers
get_hydro() now resolves multiple matching of on-the-fly dbkeys to the one with the longest period of record.
getwq(), gethydro(), and getdbkey() are now deprecated in favor of get_wq(), get_hydro(), get_dbkey().
getdbkey() is no longer limited to < 100 results
gethydro() to guess missing column names of instantaneous data
get_dbkey(category, stationid = NA, param = NA, freq = NA,
- longest = FALSE, stat = NA, recorder = NA, agency = NA, strata = NA,
- detail.level = "summary", ...)
+ longest = FALSE, stat = NA, recorder = NA, agency = NA,
+ strata = NA, detail.level = "summary", ...)