-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Average time #211
Average time #211
Conversation
Only a question ... do we want to remove |
@cehbrecht @agstephens I think we said to remove it at some point, I can do that in a new PR if we are ready to remove it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ellesmith88,
Thanks for putting this PR together. There is one issue for us to resolve before we press ahead - and it's not a problem with your code, it might be a problem with xarray
and the decisions it makes.
@@ -217,3 +217,110 @@ def test_aux_variables(): | |||
) | |||
|
|||
assert "do_i_get_written" in result[0].variables | |||
|
|||
|
|||
def test_average_over_years(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ellesmith88 this is all looking good and the tests seem to cover everything we'd need.
There is one issue that we might need to consider philosophically (about what xarray
is doing):
In the test test_average_over_years()
:
- the input data starts on 16/12/2005 and ends 16/12/2299.
- the output data starts on 01/01/2005 and ends on 01/01/2299
In CF terms, we can think of the new time array as being a series of time periods rather than time points (i.e. time point 1 represents the entire year of 2005). In which case, the expected output array would typically be defined with a time
axis and time_bounds
bounds array.
And typically the first time point would be:
time[0] = 2005-07-01 (i.e. halfway(ish) through the year)
time_bounds[0] = [2005-01-01, 2299-12-31]
However, that is not a definitive rule. I think the first thing to do is to test the values of the bounds. If the bounds come out correctly then we can probably just accept what xarray
is doing.
Please can you extend this test (and some of the others) to include a check for bounds values for time in a way that is logical.
Note that xarray
has all these different frequencies:
https://github.com/pydata/xarray/blob/main/xarray/coding/cftime_offsets.py#L605-L608
I'm not sure if the YearEnd
and YearBegin
options might be relevant to this issue in some way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agstephens having a look at the output arrays, it seems that there are no time bounds.
For example, from the test test_average_over_years()
, the output dataset is:
<xarray.Dataset>
Dimensions: (time: 295, lat: 2, bnds: 2, lon: 2)
Coordinates:
* time (time) object 2005-01-01 00:00:00 ... 2299-01-01 00:00:00
height float64 1.5
* lat (lat) float64 -90.0 35.0
* lon (lon) float64 0.0 187.5
Dimensions without coordinates: bnds
Data variables:
lat_bnds (time, lat, bnds) float64 dask.array<chunksize=(1, 2, 2), meta=np.ndarray>
lon_bnds (time, lon, bnds) float64 dask.array<chunksize=(1, 2, 2), meta=np.ndarray>
tas (time, lat, lon) float32 dask.array<chunksize=(1, 2, 2), meta=np.ndarray>
The frequencies I'm using for calculating the time average are:
month: 1MS
year: 1AS
and I think the 'S' indicates using the start of the year or start of the month for resampling. I think this is probably what we want it to do, but I might be confused!
Other options include one such as AS-JUN, but I think this means the average is calculated over June - May rather than over Jan - Dec.
There are options in xarray resample (https://xarray.pydata.org/en/stable/generated/xarray.Dataset.resample.html) to change the label for the output times, but only to the left or right of the interval, so this isn't very helpful.
I can't see anything else about changing the label/ indicating the time period each time point covers, but will have another look to see what other options there might be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ellesmith88, thanks for looking into this.
I wonder if they have looked into this within the CDS code. Please can you get in touch with James (ECMWF) to ask for his thoughts on this issue. I wonder if they have come across and tackled this problem - or whether they are just happy to accept what xarray
produces.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ellesmith88 Just looking at your work so far on this. I would propose that we might want to re-use a create_time_bounds(...)
function in other code (in future). So please can you move out that functionality into a separate function. I think it might get used outside of clisops.core
and clisops.ops
so you could put them in a new module: clisops.utils.time_utils
.
Pull Request Checklist:
What kind of change does this PR introduce?:
Adds
average_time
inclisops.core
andclisops.ops
and tests.Does this PR introduce a breaking change?:
No