Clarification of time stamps #297
Comments
Well, yes and no (no and yes?) -- the problem is that the tools are limited -- I'm not aware of any (commonly used) tools that do UTC with leap seconds [*]. Without the tools, you cannot correctly calculate a time delta that crosses leap-second boundaries. (note that there was a proposal not too long ago to define a UTC-pretending-leap-seconds-don't-exist calendar for CF which didn't pass.) Anyway, I agree that we could find some more clear language, but this is my take: We have gotten away with this for this long because the vast majority of use cases for CF don't require second-level precision -- so it just doesn't matter. For those applications that DO require seconds level precision -- I think the choices are:
A TAI calendar was proposed in cf-convention/vocabularies#62 -- but that was closed due to inactivity, and the fact that the issue was partly addressed by other changes -- though it still seems to me to be a gap -- particularly for folks that may need to store data that is already in TAI.
NOTE: I think it would be good to add a recommendation to use epochs close to your time as general advice anyway -- and maybe something about data type. For example, the FVCOM model defaults to using single precision hours since 1970 -- which loses second precision before you get to 2020 :-( -- you'd think that would be an obvious
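As a rough illustration of the data-type point, the following Python sketch estimates the resolution of a single-precision "hours since 1970" value near the start of 2020. The helper name `float32_spacing` is hypothetical, and the sketch assumes positive, finite values only; it is not taken from FVCOM or any CF tool.

```python
import struct
from datetime import date

def float32_spacing(x: float) -> float:
    """Gap between x (rounded to float32) and the next larger float32 value.

    Assumes x is positive and finite.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current

# Hours between 1970-01-01 and 2020-01-01, ignoring leap seconds.
hours_1970_to_2020 = (date(2020, 1, 1) - date(1970, 1, 1)).days * 24

spacing_hours = float32_spacing(float(hours_1970_to_2020))
spacing_seconds = spacing_hours * 3600.0
print(f"float32 resolution near 2020: {spacing_seconds} s")
```

Under these assumptions, adjacent representable values near 2020 are well over a minute apart, so a reference time closer to the data, or a wider data type, is needed for second-level precision.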
Which I think is the opposite of what it says now, which is:
[I don't see any reason for this restriction -- if it's a valid UTC time, why shouldn't it be representable?]
[again -- why not??]
[This one is important -- as a leap second does not exist in some (most) calendars, and much software would choke on it, excluding it would make sense]
[This seems the opposite of what we should want -- I'm trying to wrap my head around why we would choose, essentially, "undetermined" over "precise" ?] As a way to talk about (and I know that has been hashed out many times in the past, but for clarity right now:
So: it seems we could define:
And: Converting from the numerical values to a human-readable timestamp in a particular calendar is up to post-processing software that may or may not correctly capture leap seconds, etc. Which leads to: the actual time coordinate should be considered monotonically continuous and correct. In practice: if someone uses software that converts from, e.g. a UTC timestamp to the "time-delta since epoch" incorrectly (or imprecisely), then my point (2) may not hold -- but I think that should be considered an error (imprecision) in the data, not an expected result. [* which confused me a bit -- yes, leap seconds are intractable, as we don't know when they might occur in the future, but we DO know when they occurred in the past, and a library that would raise a warning when used for future times would be quite doable, I would think .. but I digress] |
Thanks for the quick and quite extensive reply. Let me start with our use case: We are (mis)using CF to store sensor level 1 data (e.g. airborne spectrometer at-sensor radiance and navigation data e.g. lat, lon, roll, ...). To synchronize these data, microsecond accuracy is required. We have the same issue when comparing satellite or airborne observations with ground-based measurements, where a deviation of >10 seconds can be an issue. Currently we are using the GPS epoch (in UTC), which is fine if used consistently (at nanosecond resolution the first int64 overflow will occur in more than 200 years and is thus someone else's problem™). As you mentioned correctly, though, a reference time stamp closer to the data is often preferable, e.g. to reduce a microsecond counter from int64 to int32. However, this is where my problems start, because if I have to specify the new epoch in UTC, then I have to take the leap second difference between GPS epoch and the new epoch into account to be CF compliant. This increases complexity without adding any benefit (beyond CF compliance) in my opinion. As a solution I consequently support the introduction of a TAI calendar, which would allow me to specify all times in TAI and thus free me from ever having to think about leap seconds again (until someone requests a data product in UTC). This would also integrate seamlessly with most date/time libraries (e.g. numpy.datetime64), because they mostly do not support leap seconds and are thus optimally suited to handle TAI. Out of curiosity: Why is the deviation from UTC such a controversial issue for the CF community, if the difference is (has been?) of negligible practical importance for the majority of users? |
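For reference, the counter ranges mentioned above can be estimated with a short Python sketch. It ignores leap seconds, uses a Julian year of 365.25 days for the rough year count, and takes 1980-01-06 as the GPS epoch; the variable names are illustrative only.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1
INT32_MAX = 2**31 - 1

SECONDS_PER_YEAR = 365.25 * 86400  # Julian year, rough estimate only

# Range of a signed 64-bit nanosecond counter, in years.
int64_ns_years = INT64_MAX / 1e9 / SECONDS_PER_YEAR

# Range of a signed 32-bit microsecond counter, in minutes.
int32_us_minutes = INT32_MAX / 1e6 / 60

# Approximate overflow date of an int64 nanosecond counter started at the
# GPS epoch (leap seconds are not taken into account here).
gps_epoch = datetime(1980, 1, 6)
overflow = gps_epoch + timedelta(seconds=INT64_MAX // 10**9)

print(f"int64 nanoseconds span about {int64_ns_years:.1f} years")
print(f"int32 microseconds span about {int32_us_minutes:.1f} minutes")
print(f"int64 ns counter from the GPS epoch overflows in {overflow.year}")
```

Under these assumptions, a 32-bit microsecond counter covers only about half an hour, so the reference time has to sit very close to the data for that reduction to work.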
+1 on a TAI calendar -- I have no use case, but
Good question -- IMHO the issue is that UTC is, well, "universal". Computers keep time in UTC, and thus instruments report it, so it's the default, unavoidable, familiar, and everyone (thinks) they know what it is. So any deviation from using UTC is scary :-) |
I support adding a TAI calendar to CF. @claashk presents a clear use-case where the distinction between UTC and TAI is relevant, and the simplest and best way to represent it would be with the standard epoch+offset form using a real-world calendar that doesn't have leap-seconds. The addition of the calendar is very straightforward, though it sounds like we'll need to do some work to update the language quoted in the first comment. My thought is that it would make sense to move it to a discussion of the UTC calendar specifically, since as far as I'm aware, that's the only calendar that has leap-seconds. (I will also say that personally, I think leap-seconds were a terrible idea that should never have been implemented, so anything that supports and increases awareness of alternatives like TAI that don't include leap-seconds is a benefit to the community.) |
Thanks for raising the issue, @claashk. Since you have a clear use-case for TAI, I too support its introduction to CF. We have discussed it before, but we didn't have a definite use-case. As you say, issue 148 was extremely long and eventually inconclusive. Out of curiosity, you ask, "Why is the deviation from UTC such a controversial issue for the CF community?" It's a reasonable question, but I fear that, if I try to answer it, for some reason that no-one can understand it will lead instantly to an immensely long debate in which we all get confused! 😄 However, it's worth trying, because we haven't revisited this since we made quite a lot of changes in version 1.9 to clarify this part of the document. I believe the key thing is that a number with a Of course, the encoding is the obvious and convenient one: it's the elapsed time since the reference time, in all cases except for leap-seconds in the real world. (Luckily, model calendars don't have leap-seconds.) In Sect 4.4 we say (simplified slightly for the sake of argument):
The example comes from the UDUNITS manual. With this
You questioned this bullet point from Section 4.4.1
That is listed as one of the consequences of leap seconds not being counted in any existing CF calendar. I see it's confusing without the context. We could clarify it e.g. as
The TAI calendar will not have that problem, because there are no leap seconds. The time coordinate will always equal the elapsed time, like in model calendars. I agree with @sethmcg that it will be straightforward to add TAI. I think we would avoid some complications if we did not allow the TAI calendar to represent any dates before TAI began i.e. not "proleptic". Would that make sense? Is it 1st January 1958? |
I think prohibiting proleptic TAI makes sense, since we don't have any data from before 1958 whose accuracy is at the level that we'd need it (and I have a hard time imagining it will come into existence), and for lower-accuracy data proleptic-gregorian will suffice. With regard to the point about time coordinate not exactly equaling the actual interval length, I think that's actually not correct. I think it would be truer to say that the length of the interval associated with a reference and a datetime depends on which calendar you're using. For the UTC calendar (only), the length of the interval cannot be calculated correctly (at an accuracy level of seconds) without reference to a list of leap-seconds that have been inserted. So given the time coordinate It's not that the length of the interval is indeterminate. It's that if you're using a calendar that doesn't have a fixed relationship between interval length and time units, and you don't take that into account when calculating datetimes from time coordinates, you're wrong. And because UTC has leap-seconds, people doing that with UTC time coordinates are often wrong and don't know it. |
Dear @sethmcg I agree with both of these statements of yours:
However, I disagree with your summary, "With regard to the point about time coordinate not exactly equaling the actual interval length, I think that's actually not correct." This must be an example of the strange phenomenon that this subject causes, where we all misunderstand one another! Maybe in this case I was vague about "time coordinate". By that phrase, I meant the number alone i.e. the element of the time coordinate variable. Do you agree with this more detailed version:
Best wishes Jonathan |
@JonathanGregory - I think I don't agree, although it's possible that I'm missing a subtlety of your point. (And my apologies for the length of what follows.) In my mental representation, the elapsed duration has primacy. The time coordinate is exactly what the units string says it is: it tells you how many units of time have elapsed between the reference and the coordinate value, and then the calendar tells you how to convert that into a datetime. Consequently, it's in some sense an error to use a time coordinate in units that do not have fixed length for the calendar. You can't say Likewise, we can't use units of months for most calendars, because the months have different lengths (although it would be okay for a 360-day calendar, where all months are exactly 30 days long). And following the same logic, it would be wrong to use any unit longer than Now, I say that it is in a sense an error, because there's another way to look at it: you can still meaningfully communicate time coordinates using a unit that has some variability, but your precision is limited by that variability. It is meaningful to say "years since 1900" when your calendar has leap years, but if you do, you can't use smaller units than a year. I can say that 100 years have elapsed between 1900 and 2000, but I can't say that 36525 days have elapsed in that interval - because I don't know the date to that level of precision. For all calendars except UTC, there's a constant relationship between days, hours, minutes, and seconds, so you can express a time accurately down to the second using any of those units. But if your calendar is UTC and you have a time coordinate in units longer than seconds, you only know the time to the nearest minute. (And arguably not even that well, given that 27 leap seconds have been inserted and TAI is now 37 seconds ahead of UTC; really you only know it to the nearest deka-minute.) So it's not really wrong to use units that vary in length, it just implicitly limits the precision of your time coordinates.
And that means that we should be truncating our reference dates to that level of precision. Now, having said all that, I realize that this viewpoint doesn't quite match what the CF standard currently says in section 4.4.1. I think that that's a defect in the standard, and we should rewrite it to say that time coordinates are exactly what their unit strings say they are, and that units + calendar may limit the precision of your time coordinates. Because I think that's how everybody uses them in practice, and we should adjust the standard to match in order to minimize confusion and error. I was going to say that probably nobody has been correctly recording times in UTC, but on a re-read, the way things are currently specified, I think the situation we have is that the So now I've argued myself around to saying that we don't need to define a TAI calendar after all, we just need to clarify in the spec that the |
Hello - nice to see this getting aired again.
I recall from issue 148 (haven't checked - not enough time right!) that one of the use cases was that some satellite instrument times are recorded as correct timestamp strings ( |
Dear @sethmcg Thanks for writing out your views on this in such detail. That's helpful. We partly agree, and partly disagree, it seems. I agree with what you say about units which don't have fixed length. That's why Sect 4.4 recommends that I agree that day, hour and minute are also units of variable length, when referring to UTC. Most people would understand "exactly one day after 2300 on 30th June 1992" to mean "2300 on 1st July 1992". That's what UDUNITS thinks:
and also what Linux
Both of those programs define a day with a fixed length of 24×60×60=86400 seconds, but actually in UTC there were 86401 seconds between those two dates. I believe that the overwhelming majority of existing CF-netCDF datasets which refer to events in the real world have nonetheless encoded date-times without taking account of leap seconds. The intention is that You write
Of course that is a legitimate and reasonable view, but it's not the CF convention for the
The time coordinate value is primarily an encoded date-time, not an elapsed duration. These two things are the same except when a leap second intervenes. The convention has some awkward consequences (listed in Sect 4.4.1), but I think it's the right choice because in practice this is what people and software assume the convention means. Therefore I think we do need to define the TAI calendar in CF. In practice the CF If there is a use-case for it, we could also define a UTC calendar, which would encode UTC as it should be (as you advocate), taking account of leap seconds. In that calendar, the bullet-point list in Sect 4.4.1 would not apply. Best wishes Jonathan |
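To make the 86400 versus 86401 point concrete, here is a minimal Python sketch contrasting leap-second-unaware arithmetic with an elapsed-time count. The names `LEAP_MIDNIGHTS`, `naive_seconds` and `elapsed_seconds` are hypothetical, and the table is deliberately partial: it lists only the two leap seconds mentioned in this thread, so it is not suitable for real data.

```python
from datetime import datetime

# Partial, illustrative table: UTC midnights that were immediately preceded
# by an inserted leap second (23:59:60). A real implementation would need
# the complete published list.
LEAP_MIDNIGHTS = [datetime(1992, 7, 1), datetime(2017, 1, 1)]

def naive_seconds(start: datetime, end: datetime) -> float:
    """Difference as computed by leap-second-unaware software."""
    return (end - start).total_seconds()

def elapsed_seconds(start: datetime, end: datetime) -> float:
    """Elapsed seconds between two UTC date-times, using the partial table."""
    extra = sum(1 for midnight in LEAP_MIDNIGHTS if start < midnight <= end)
    return naive_seconds(start, end) + extra

start = datetime(1992, 6, 30, 23, 0, 0)
end = datetime(1992, 7, 1, 23, 0, 0)
print(naive_seconds(start, end))    # encoded date-time difference
print(elapsed_seconds(start, end))  # elapsed duration in UTC
```

The first number corresponds to the "encoded date-time" reading of the time coordinate, the second to the "elapsed duration" reading; the two differ only when a leap second falls inside the interval.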
As we all know, these discussions can get VERY long -- so I suggest:
Maybe that should be moved to a discussion -- I'm working on starting one now. |
OK -- started a discussion here: |
@ChrisBarker-NOAA I'll argue this in a longer response, but I think that the TAI calendar already exists and is named "standard". If that's the case, do we want to have TAI as a second name for it (like the |
I'm no expert, but yes, that appears to be the case -- though it would be better to restrict TAI to post-1972 (or thereabouts). So yes, this may be a clarification, rather than an addition. But I do think, one way or another, that "TAI" should be specifically referenced! |
now that I'm thinking about it -- the "standard" calendar has a LONG definition, so even if it's pretty much the same thing, adding something like this to the calendar list: tai Alternatively, we could tack on a shorter version of that to the "standard" calendar definition. I think it's time for a PR or new issue if someone wants to get that going ... |
You've convinced me that TAI and standard aren't quite the same over in the new discussion topic (#304); I'll keep any discussion of TAI vs standard vs UTC over there. I think we should create an independent TAI definition that doesn't rely on the standard definition, and I agree that we're now at the point where we have sufficient consensus to start an issue for it. I don't think it needs to be much longer than what you've got above; we just add something like "Under TAI, minutes are always exactly 60 seconds long, hours are always exactly 60 minutes long, and days are always exactly 24 hours long. Months and years follow the proleptic-Gregorian calendar." And although it was established in 1972, is there any reason it couldn't be extended backwards following proleptic-Gregorian rules? |
It certainly could -- but does anyone do that, or need that? If so, it would be proleptic-tai, I suppose. I can't imagine there's a use-case -- my tendency is not to introduce something that isn't standard and no one has a use case, but it also wouldn't hurt, |
Thanks for asking your question, @claashk. Has the question been answered, as far as it can be for the moment? As you see, you have reignited the discussion, now carrying on in #304. 😄 Feel free to join in! It's likely that it will lead to a proposal to add the TAI calendar. I will add the FAQ label to this question, to remind us at some point to make sure that TAI is discussed in the FAQ. |
Thank you very much for the comprehensive and constructive discussion. The proposal of a tai calendar sounds good to me and would solve our problems regarding this issue. |
Thank you, @claashk. I will close this issue now, because we are discussing a definite proposal (in conventions issue 542) to include the TAI calendar, among other calendar issues. |
I am a little confused regarding the specification of times (mostly UTC vs TAI) in CF. Especially the following sentence is not clear to me after reading it multiple times (no pun intended):
It is important to realise that a time coordinate value does not necessarily exactly equal the actual length of the interval of time between the reference date/time and the date/time it represents.
In my understanding a CF time coordinate value 1 with units "seconds since t_ref", would represent the date/time t = "1 second since t_ref", where 1 s is the actual length of the interval and t_ref is the reference date/time. Do I understand the standard correctly, when I assume that in this notation t - t_ref is not guaranteed to equal one second?
More specifically, the standard seems to suggest that reference time specifications are UTC by default. As UTC introduced a leap second after 2016-12-31, does CF guarantee, that "10 seconds since 2016-12-31 23:59:59" equals "8 seconds since 2017-01-01 00:00:00" or is this not the case? This question is e.g. relevant when comparing times specified relative to different epochs (e.g. GNSS time vs Unix time stamps) with sub-second accuracy.
I am aware of the lengthy discussion regarding CF-Issue cf-convention/vocabularies#62 but as a user I find the current wording hard to understand. It appears to me, that the current intentions are to specify
I am not sure, whether both requirements can be combined in a consistent fashion.
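The two expressions in the question can be decoded with leap-second-unaware software, which is what most date/time libraries provide. The following Python sketch uses the standard library's datetime module; the helper name `decode` is hypothetical and is not part of any CF tool.

```python
from datetime import datetime, timedelta

def decode(value_seconds: float, reference: datetime) -> datetime:
    """Decode a "seconds since <reference>" value without leap seconds."""
    return reference + timedelta(seconds=value_seconds)

a = decode(10, datetime(2016, 12, 31, 23, 59, 59))
b = decode(8, datetime(2017, 1, 1, 0, 0, 0))

print(a)  # 2017-01-01 00:00:09
print(b)  # 2017-01-01 00:00:08
print((a - b).total_seconds())  # 1.0
```

Decoded this way, the two values land one second apart. If the values are instead read as elapsed seconds in UTC, the leap second 2016-12-31 23:59:60 is counted, and both expressions denote 2017-01-01 00:00:08.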