
Automatic parsing of dimension names leads to duplicated 'Ti' dimensions #468

Open · jamesafranke opened this issue Nov 7, 2024 · 10 comments · May be fixed by #479
Labels: bug, enhancement, help wanted

Comments

@jamesafranke commented Nov 7, 2024

Hi! Love the package!

using YAXArrays, Zarr
ds = open_dataset(zopen("gs://weatherbench2/datasets/keisler/2020-360x181.zarr"))

leads to:
[Screenshot: the opened dataset, showing two dimensions both labelled Ti]

The original dataset has a time and a prediction_timedelta (forecast horizon) dimension. I cannot find where these are getting automatically parsed into Ti; maybe it happens somewhere within DimensionalData.jl. Is there any way to override this behavior?

[Screenshot: the original dataset's coordinates, including time and prediction_timedelta]
@jamesafranke changed the title from "Automatic parsing of dimension names leads to duplicated duplicated 'Ti' dimensions" to "Automatic parsing of dimension names leads to duplicated 'Ti' dimensions" on Nov 7, 2024
@jamesafranke (Author) commented Nov 7, 2024

Never mind. I was able to rename one of the Ti axes for a single cube:

temp = renameaxis!(ds.temperature, :Ti=>:Ti2)

[Screenshot: the temperature cube with the second time axis renamed to Ti2]

Since having two time axes is somewhat rare, it is probably fine to just do it this way. 😄

@Balinus (Contributor) commented Nov 8, 2024

I think it is still a problem that the parsing gives the same name to two dimensions. I also still believe that renaming "time" (the standard dimension name in 99% of climate and weather datasets) to "Ti" is not good practice for interoperability with other packages, which often hardcode "time" (sometimes via PyCall to use Python libraries).

@lazarusA (Collaborator) commented Nov 8, 2024

See rafaqz/DimensionalData.jl#499.

As a side note, I usually use Dim{:time} in my workflows 😄. Maybe we should do this on the YAX side and not fall back to Ti.
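
A minimal sketch of that workflow with made-up data, using only the documented Dim{:name} constructor and DimArray from DimensionalData:

using DimensionalData, Dates

# Keep the original names via Dim{:time} and Dim{:prediction_timedelta}
# instead of relying on the special Ti dimension type.
t    = Dim{:time}(DateTime(2020, 1, 1):Day(1):DateTime(2020, 1, 10))
lead = Dim{:prediction_timedelta}(1:4)
A    = DimArray(rand(10, 4), (t, lead))

# Keyword indexing by the original names still works:
A[time=1, prediction_timedelta=2]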

@jamesafranke (Author)

@lazarusA thanks!

@Balinus (Contributor) commented Nov 11, 2024

Another example of this behaviour leading to strange results (or, more specifically, to a misinterpretation of what the data is).

Coordinates saved to disk:

Coordinates:
  * lat            (lat) float32 1kB 10.12 10.38 10.62 ... 73.88 74.12 74.38
  * lead           (lead) int64 32B 1 4 7 10
  * lon            (lon) float32 2kB -169.9 -169.6 -169.4 ... -50.38 -50.12
  * quantiles      (quantiles) float64 152B 0.05 0.1 0.15 0.2 ... 0.85 0.9 0.95
  * forecast_date  (forecast_date) datetime64[ns] 4kB 1981-01-15 ... 2024-03-15

Coordinates read by YAXArrays:

(↓ quantiles Sampled{Float64} 0.05:0.05:0.95 ForwardOrdered Regular Points,
  → lon       Sampled{Float32} -169.875f0:0.25f0:-50.125f0 ForwardOrdered Regular Points,
  ↗ lat       Sampled{Float32} 10.125f0:0.25f0:74.375f0 ForwardOrdered Regular Points,
  ⬔ lead      Sampled{Int64} 1:3:10 ForwardOrdered Regular Points,
  ◩ Ti        Sampled{DateTime} [1981-01-15T00:00:00, …, 2024-03-15T00:00:00] ForwardOrdered Irregular Points)

Hence, forecast_date is inferred as the time dimension, which is OK as a first approximation. However, using Ti removes the meaning of forecast_date (which may not correspond to the meaning of a specific date).

@lazarusA (Collaborator)

OK, definitely a bug. Any pointers, @felixcremer? We need to update the logic for this.

@lazarusA added the bug, enhancement, and help wanted labels on Nov 11, 2024
@meggart (Member) commented Nov 28, 2024

The logic is basically in this line:

DD.Ti(tsteps[offs+1:end])

This means that whenever something is encoded as a CF-convention time dimension, it gets the label Ti.

We could change this line to

DD.rebuild(DD.name2dim(axname), tsteps[offs+1:end])

and then dimension names would remain untouched. I know we did this at the beginning of our transition to DD, but then ran into problems elsewhere that I cannot remember; maybe @felixcremer can. In light of the issues brought up here I completely agree that we should change the logic of when, and whether at all, to parse things as Ti, X and Y.
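
For illustration only, a minimal sketch of the two behaviours with made-up values (tsteps, offs and axname stand in for the variables named above; nothing here reflects YAXArrays internals beyond the two lines quoted):

using DimensionalData, Dates
const DD = DimensionalData

tsteps = DateTime(2020, 1, 1):Day(1):DateTime(2020, 1, 5)  # made-up time steps
offs   = 0
axname = :forecast_date

# Current behaviour: any CF time axis becomes Ti, dropping the on-disk name.
dim_current = DD.Ti(tsteps[offs+1:end])

# Proposed behaviour: keep the on-disk name (here Dim{:forecast_date}).
dim_proposed = DD.rebuild(DD.name2dim(axname), tsteps[offs+1:end])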

@meggart (Member) commented Nov 28, 2024

Ideally, I think the user should be able to decide whether they want the re-labelling of the dimensions (so that the arrays work better with plot recipes etc.) or not. Maybe this could even happen in an extra step. One suggestion would be to make something like this possible:

ds = open_dataset(path) # returns lon, lat, time etc...
ds_labeled = fixspacetimedims(ds)

where the (to be written) fixspacetimedims function searches for occurrences of lon, lat and time in all their spelling variants and replaces the dimension types (maybe even following user-specified rules) while making sure we only end up with a single X, Y and Ti.
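
Purely as an illustration, a rough sketch of what such a helper might look like for a single cube. The function body, the alias table and the use of renameaxis! are assumptions, not an existing YAXArrays API, and renaming to :X/:Y/:Ti alone may not be enough to also change the dimension type:

using YAXArrays, DimensionalData

# Hypothetical table of spelling variants for the spatial and temporal dims.
const DIM_ALIASES = Dict(
    :X  => ("lon", "longitude", "x"),
    :Y  => ("lat", "latitude", "y"),
    :Ti => ("time", "t", "valid_time"),
)

function fixspacetimedims(c)
    dimnames = DimensionalData.name.(DimensionalData.dims(c))
    for (canonical, aliases) in DIM_ALIASES
        # Dimension names of the cube that match a known spelling variant.
        hits = [d for d in dimnames if lowercase(String(d)) in aliases]
        # Only relabel when the match is unambiguous, so we never end up
        # with two dimensions carrying the same canonical name.
        length(hits) == 1 || continue
        c = renameaxis!(c, hits[1] => canonical)
    end
    return c
end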

@Balinus (Contributor) commented Nov 28, 2024

I think having the option to relabel is a good suggestion. It would be nice to not lose the plot recipes and still keep the original dimension names. Perhaps open_dataset could take a keyword argument with some keep_original_name logic?

Alternatively, is it possible to specify, at the plot recipe level, which dimensions to use as X, Y and Ti?

Something like:

Makie.contourf(ds, X=:longitude, Y=:latitude)

Not sure what is best or easiest.

@felixcremer (Member)

I am wondering whether we could keep the dim name but make it a subtype of Ti if we detect it as a CF time dimension.
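
One possible shape of that idea, sketched with DimensionalData's @dim macro; ForecastDate is a made-up name here, and since Ti itself is a concrete type the closest existing hook would be the abstract TimeDim supertype rather than Ti itself:

using DimensionalData
using DimensionalData.Dimensions: TimeDim

# Hypothetical: a dimension that keeps its own name but is still
# recognisable as a time dimension via the TimeDim supertype.
@dim ForecastDate TimeDim "forecast_date"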

@lazarusA linked a pull request on Dec 12, 2024 that will close this issue.