-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]: More Sophisticated Bounds Handling in Temporal Averaging Operations #594
Comments
I opened up PR #601 to implement the documentation updates suggested in your alternative solution. |
Just referencing this comment, which provides further discussion of this issue. |
FYI – @tomvothecoder – this is a pentad dataset that this issue would effect:
|
@tomvothecoder – I put together a notebook on this – maybe we could schedule a time to go over it? Or should I post it? Or just start a PR? |
@pochedls You can start a new PR and commit it temporarily there, I'll analyze it, then we can go over it if needed. Thanks for starting work on this! |
@tomvothecoder - I've prototyped some new functionality in a new branch. It is not anywhere near complete – should I still do a PR (that way I could comment on it to help orient you)? FYI: this is the prototype example of producing a monthly average from a pentad dataset with correct weights: import xcdat as xc
fn = '/p/user_pub/climate_work/pochedley1/msu/netcdf/uah_6.1-pentad_tlt_197901-202412.nc'
ds = xc.open_dataset(fn)
ds = ds.spatial.average('tlt')
dsa = ds.temporal.compute_monthly_average('tlt') |
Yes, a PR would be helpful to get things rolling. Thanks! |
Is your feature request related to a problem?
xcdat temporal averaging operations currently bin data by the labelled time point with the weights derived from the difference in the time bounds. This works for most conventional climate data: a timepoint of
2020-01-16 12:00
with bounds of[2020-01-01 00:00, 2020-02-01 00:00]
would be given 31 days of weight in January (e.g., in creating an annual average or climatology), which is correct.There are reasonable instances where this wouldn't work. Imagine pentad data with a time point of
2020-02-02 12:00
with bounds of[2020-01-31 00:00, 2020-02-05 00:00]
. This time point should be given one day of weight in January and four days of weight in February. The current algorithm (e.g., for monthly averaging) assigns all five days of weight in February (the labelled time point).Describe the solution you'd like
Weights should be assigned based on the time period that they fall into. This would mean that a given time point can contribute to averages in more than one time interval
Describe alternatives you've considered
Solutions for the time being would be to update documentation to note that:
get_time_bounds
generally assumes data with frequencies of annual, monthly, daily, or sub-daily (I thought of this while writing this issue)Additional context
I'm not sure cdms / cdutil covers this case; it would be helpful to determine what cdutil does.
This seems like it could be challenging issue to address in general and might require a major refactor of the logic use for existing temporal averaging calculations.
The text was updated successfully, but these errors were encountered: