# How to format csv file for use as time series object with ts() command #2268
I had a quick look and see that bad data greatly outnumber good data, viz.

```r
library(oce)
f <- "~/Downloads/testFile.csv"
d <- read.csv(f, header = TRUE)
d$time <- as.POSIXct(d$TIME, format = "%m/%d/%Y %H:%M", tz = "UTC")
oce.plot.ts(d$time, d$temp, type = "p")
badTime <- is.na(d$time)
table(badTime)
bad <- which(badTime)[1]
d[bad + seq(-10, 10), ]  # inspect the rows around the first bad time
```
Here's some more code, and the graph. I always start by examining the data in as raw a form as possible, before deciding how to reduce it further. Past what I have now, the rest is just:

```r
library(oce)
f <- "~/Downloads/testFile.csv"
d <- read.csv(f, header = TRUE)
d$time <- as.POSIXct(d$TIME, format = "%m/%d/%Y %H:%M", tz = "UTC")
badTime <- is.na(d$time)
bad <- badTime | is.na(d$temp)  # do this for all variables of interest
d <- d[!bad, ]
png("oce2268.png")
oce.plot.ts(d$time, d$temp, type = "o", pch = 20, cex = 0.5)
```
Hi Bob,

Thanks for submitting this here! Aside from having @dankelley give advice, I generally find that it can be helpful to others to have a discussion that involves "advice" in an open forum like this rather than in emails (which I can never find again).

I agree with @dankelley that much of the data in that file appears to be empty rows for some reason, but there are also lots of "missing" data even within the series. The below plot shows the times as a function of index.

You mentioned that this is data from a Slocum glider, which both @dankelley and I have quite a bit of experience working with (see our not-yet-on-CRAN glider package). Glider data is pretty complicated, so I think it would help to know what it is you're trying to do with it. For example, gliders usually collect data by diving, so the "time series" data is in fact a sort of mix between a time series and depth profiles. Looking at your csv, I don't see any column for "depth" or "pressure", so it's hard to know the right way to bin the data to get something useful. I'd also point out that casting into a `ts` object has some caveats for unevenly sampled data like this.

Just doing what I did in Issue #2264 gives something like the below. Note that there are some subtleties here, because of the NAs that get returned from the binning:

```r
library(oce)
d <- read.csv("testFile.csv", header = TRUE)
d$time <- as.POSIXct(d$TIME, format = "%m/%d/%Y %H:%M", tz = "UTC")
badTime <- is.na(d$time)
bad <- badTime | is.na(d$temp)  # do this for all variables of interest
dc <- d[!bad, ]
## bin average in monthly chunks
tbreaks <- seq(min(dc$time), max(dc$time), by = "1 month")
Tb <- binMean1D(dc$time, dc$temp, tbreaks)
Tbsd <- binApply1D(dc$time, dc$temp, tbreaks, sd, na.rm = TRUE)
oce.plot.ts(dc$time, dc$temp)
lines(Tb$xmids, Tb$result, lwd = 3, col = 2)
# polygon(c(Tb$xmids, rev(Tb$xmids)), c(Tb$result + Tbsd$result, rev(Tb$result)),
#     col = rgb(0, 0, 0, 0.25), border = NA)
# polygon(c(Tb$xmids, rev(Tb$xmids)), c(Tb$result - Tbsd$result, rev(Tb$result)),
#     col = rgb(0, 0, 0, 0.25), border = NA)
errorbars(Tbsd$xmids, Tb$result, 0, Tbsd$result, style = 1, col = 2, lwd = 2)
```
Hi Clark and Dan,

The data example I sent is a snippet from a much larger 9-year dataset created from multiple glider missions from the OOI Pioneer Array New England region that have been merged and then sorted by time. The original depth-resolved glider data were processed to calculate euphotic-zone depths and averages for properties over that depth. Each row of data in the file I sent you is the depth-averaged value over the euphotic zone for a single ascent or descent of the glider at that latitude/longitude; so, for example, the 'clh' column holds the average values for chlorophyll a within the euphotic zone. (The euphotic-zone depth column was arbitrarily removed from this dataset to save space.)

The objective here is to develop a composite 2D picture of the bio-optical conditions across the PA site at monthly time steps, hence my need to obtain an average over a month. Averaging over a month will also create a dataset with even time intervals, which can then be cast as a time series object and decomposed to detect seasonal variability and long-term trends.

The many missing lines of data are those glider profiles that failed the minimum requirements for accurate determination of the euphotic-zone depth, for reasons such as intermittent cloud cover, excess wave focussing at the ocean's surface, or perhaps because the profile was taken at night, when there is no sunlight to assess the euphotic zone. There are also gaps in the time series because of glider or instrument failures, so those data were not available from the glider DAC. The script Clark sent gives a solution for ridding the file of all these NAs.

The question I have now is: how do I take the monthly bin-average (the red line in Clark's plot) and turn it into a time series object for further analysis, i.e. decomposition to look for seasonality and trend?
Thanks for the info. I had plotted the lon-lat data and see interesting sampling protocols. Maybe I'm missing something here. To get a time series, use the `ts()` function on the monthly means.
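A minimal base-R sketch of turning monthly bin means into a `ts` object (the numbers below are made up for illustration; in this thread they would come from `Tb$result` in Clark's binning code):

```r
# Hypothetical monthly means, standing in for Tb$result from the binning step
monthlyMean <- c(5.1, 4.8, 6.0, 8.2, 11.5, 14.9,
                 17.3, 18.0, 16.2, 12.4, 9.1, 6.3)

# ts() wants evenly spaced values: give start = c(year, month) and
# frequency = 12 for monthly data
T12 <- ts(monthlyMean, start = c(2021, 1), frequency = 12)

frequency(T12)  # 12
# With two or more full years of data, decompose(T12) or
# stl(T12, s.window = "periodic") would split the series into
# seasonal, trend, and remainder components.
```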
Thanks Clark and Dan. That is very helpful. Using a glider 'swarm' is an experimental approach I'm taking to look at the bio-optics on the New England shelf. The profiler moorings would seem like the likelier candidate for a platform, except that most of those profilers come only to within ca. 25 m of the surface, whereas the gliders most of the time come up to within a few meters. Since sunlight attenuates exponentially with depth, much of the euphotic zone would be below the shallowest sampling depth if I used the profilers. The trouble with the gliders is spatial biasing, but I'm trying to minimize this by averaging over large areas and time spans. I've attached my poster from OSM2024, where I presented these data for the 2021 glider year. Happy to talk to you more about it if you're interested.
Quick Q on the poster: I'm a bit mixed up on the notation regarding MLD. Are you saying you find the depth where sigma-theta is 0.03 kg/m^3 higher than the surface value? I've always found it tricky to think of MLD (hence some discussion in my 2018 book) because any definition that seems good in one place/season seems to be not so good at another place/season.
Yes, it is the depth at which sigma-theta increases by 0.03 kg/m^3 from the surface. I would rather know the turbulent depth (sensu Franks 2015), but given that all we have is T and S from the CTDs, a hydrographic mixed-layer depth is all we can get from the gliders. We also have buoyancy frequency as a higher-vertical-resolution measure of density stratification. But the MLD does at least follow the expected changes with season, so that is mildly comforting.
~Bob
Robert D. Vaillancourt, Ph.D.
Professor of Oceanography
106 Brossman Hall
Millersville University
Millersville, PA 17551
Dear reporter, Do you think this issue (as defined by its title) has been addressed? If so, please close it. If not, please add a comment explaining what remains to be done. We use open issues as a sort of "to do" list for the project. Thanks! PS. This is a standardized reply.
The original question seems to have been answered, and discussion finished about 3 weeks ago, so I will close this issue now. Reminder: we normally ask that reporters close issues, but developers may do it if discussion has stalled on a matter that has apparently been resolved.
Hi All.
I am trying to re-format my time-series csv file in order to be able to create a time series object using the ts() function. A small portion of my csv file is attached here; it contains oceanographic data collected using a Webb glider. The data were collected at uneven time intervals and so must be wrangled into a new time series by windowing. For this time series I would use a one-month window. I have been trying to do this according to section 5.9.4.3 on windowing methods in Dan Kelley's book, but the cut command does not work for me with datetime-formatted data, and when I convert the times to numeric the code no longer works correctly.
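For what it's worth, base R's `cut()` does have a POSIXt method that accepts breaks such as `"1 month"`, so the windowing step can be sketched without converting times to numeric (the times and temperatures below are invented for illustration):

```r
# Invented example times and values, standing in for the glider data
times <- as.POSIXct(c("2021-01-05 10:00", "2021-01-20 12:00",
                      "2021-02-03 08:00"), tz = "UTC")
temp <- c(5.0, 6.0, 7.0)

# cut.POSIXt bins the times into calendar months
monthBin <- cut(times, breaks = "1 month")

# tapply then averages any variable within each month
monthlyMean <- tapply(temp, monthBin, mean, na.rm = TRUE)
monthlyMean  # Jan: 5.5, Feb: 7.0
```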
On a related note, is there an error in the example code in Dan Kelley's book? I believe the ceiling and floor commands in the following line of code should be switched. This is how it appears on page 154:

```r
C <- cut(y, breaks = seq(ceiling(min(y)), floor(max(y)), 5))
```
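A small numeric check of that floor/ceiling question, using made-up data `y`: with the breaks as printed, the extreme values can fall outside the break range, which `cut()` turns into NAs; swapping floor and ceiling widens the breaks to cover the full range.

```r
y <- c(0.5, 7.2, 14.9)  # made-up data

# Breaks as printed in the book: seq(1, 14, 5) = 1, 6, 11,
# so 0.5 and 14.9 fall outside and cut() returns NA for them
asPrinted <- cut(y, breaks = seq(ceiling(min(y)), floor(max(y)), 5))

# With floor and ceiling swapped: seq(0, 15, 5) = 0, 5, 10, 15,
# which covers every value of y
swapped <- cut(y, breaks = seq(floor(min(y)), ceiling(max(y)), 5))

sum(is.na(asPrinted))  # 2
sum(is.na(swapped))    # 0
```

(Note that even with the swap, the last break lands on the range's endpoint only when the span happens to divide evenly by the step, so it is worth checking for NAs after any `cut()`.)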
-Thanks,
Bob.
testFile.csv