Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certain cyclic slices fail when subsetting in 3.16.2 vs 3.16.1 #774

Closed
mattjbr123 opened this issue May 24, 2024 · 4 comments · Fixed by #782
Closed

Certain cyclic slices fail when subsetting in 3.16.2 vs 3.16.1 #774

mattjbr123 opened this issue May 24, 2024 · 4 comments · Fixed by #782
Labels
bug Something isn't working
Milestone

Comments

@mattjbr123
Copy link
Contributor

mattjbr123 commented May 24, 2024

Changes introduced in v3.16.2 in this commit as part of #760 has caused calls to the subset command to fail when the subset produces indices for (at least?) one cyclic axis in the form of a slice which doesn't satisfy the condition
step > 0 and start < 0 and stop > 0 (e.g. slice(-3, 0, 1)) [ CASE 1a ]
OR
step < 0 and start > 0 and stop < 0 (e.g. slice(0, 2, -1)) [ CASE 1b ]
or does satisfy the condition:
step > 0 and start < 0 and stop == -size (e.g. slice(-1,-3,1), size=3) [ CASE 2a ]
OR
step < 0 and stop < 0 and start == -size (e.g. slice(-3, -1, -1), size=3) [ CASE 2b ]

This arises from the following new code in cf.functions (the rest of the modifications to cf.functions in this commit are just refactors as far as I can tell):

    if not (
        (step > 0 and start < 0 and stop > 0)
        or (step < 0 and start > 0 and stop < 0)
    ):
        raise IndexError(f"{index!r} is not a cyclic slice")

in cf.functions.normalize_slice. This function determines whether or not a slice is cyclic, and these lines are designed to catch the cyclic cases after 'normalization' of the slice's start and stop indices by previous lines of the function. In cases 1a and 1b the normalization is skipped as the slices don't satisfy the conditions, in cases 2a and 2b, normalization is performed modifying the start or stop indices of the slice to 0. In both cases an IndexError is now raised by the function causing further modification of the slice and generation of the 'roll' variable in cf.functions.parse_indices to be skipped, whereas in v3.16.1 this modification would still take place, e.g. changing a slice of (-3, 0, 1) to (0, 3, 1) and generating roll = {:-3}
If the condition is modified to

    if not (
        (step > 0 and start < 0 and stop >**=** 0)
        or (step < 0 and start >**=** 0 and stop < 0)
    ):
        raise IndexError(f"{index!r} is not a cyclic slice")

the function correctly determines the slice to be cyclic and thus the slice is modified appropriately.
Without this modification, an object with a dimension of size 0 is returned when __getitem__ is called on a field's Data object when __getitem__ is called on a field as part of the cf.field.subset process, and thus the following error is raised in cf.field.__getitem__:

IndexError: Indices [array([0, 1, 2, 3]), slice(-3, 0, 1)] result in a subspaced shape of (4, 0), but can't create a subspace of Field that has a size 0 axis

I have run the same example in a cf v3.16.1 and v3.16.2 environment, with debugging enabled and some additional print statements thrown in for good-measure:

# Create example field
example = cf.Field()

axis_ = example.set_construct(cf.DomainAxis(4))
example.set_construct(
    cf.DimensionCoordinate(
        properties={
            "standard_name": 'latitude',
            "units": 'degrees_north',
            "axis": 'Y',
        },
        data=cf.Data([-67.5, -22.5, 22.5, 67.5]),
        bounds=cf.Bounds(data=cf.Data(
        np.array([[-90., -45.],
               [-45., 0.],
               [0., 45.],
               [45., 90.]]))),
    ),
    axes=axis_,
)

axis_ = example.set_construct(cf.DomainAxis(3))
example.set_construct(
    cf.DimensionCoordinate(
        properties={
            "standard_name": 'longitude',
            "units": 'degrees_east',
            "axis": 'X',
        },
        data=cf.Data([-120., 0., 120.]),
        bounds=cf.Bounds(data=cf.Data(
        np.array([[-180., -60.],
               [-60., 60.],
               [60., 180.]]))),
    ),
    axes=axis_,
)

example.set_data(cf.Data(np.zeros((4,3))), axes=('Y', 'X'))```


kwargs = {
    'longitude': cf.wi(-120, 120),
    'latitude': cf.wi(-67.5, 67.5),
}

Try with cf v3.16.2

example_subset = example.subspace(**kwargs)

field:
Field: 
-------
Data            : (latitude(4), longitude(3))
Dimension coords: latitude(4) = [-67.5, ..., 67.5] degrees_north
                : longitude(3) = [-120.0, 0.0, 120.0] degrees_east
config:
()
data_axes:
('domainaxis0', 'domainaxis1')
  parsed       = {('domainaxis1',): [(('domainaxis1',), 'dimensioncoordinate1', <CF DimensionCoordinate: longitude(3) degrees_east>, <CF Query: (wi [-120.0, 120.0])>, 'longitude')], ('domainaxis0',): [(('domainaxis0',), 'dimensioncoordinate0', <CF DimensionCoordinate: latitude(4) degrees_north>, <CF Query: (wi [-67.5, 67.5])>, 'latitude')]}
  unique_axes  = {'domainaxis1', 'domainaxis0'}
  n_axes       = 2
  item_axes    = ('domainaxis1',)
  keys         = ('dimensioncoordinate1',)
  1 1-d constructs: (<CF DimensionCoordinate: longitude(3) degrees_east>,)
  axis         = 'domainaxis1'
  value        = <CF Query: (wi [-120.0, 120.0])>
  identity     = 'longitude'
  1-d CASE 2:
    index      = slice(-3, 0, 1)
    ind        = None
  create_mask  = False
  item_axes    = ('domainaxis0',)
  keys         = ('dimensioncoordinate0',)
  1 1-d constructs: (<CF DimensionCoordinate: latitude(4) degrees_north>,)
  axis         = 'domainaxis0'
  value        = <CF Query: (wi [-67.5, 67.5])>
  identity     = 'latitude'
  1-d CASE 3:
    index      = [0 1 2 3]
    ind        = None
  create_mask  = False
  indices      = {'indices': {'domainaxis0': array([0, 1, 2, 3]), 'domainaxis1': slice(-3, 0, 1)}, 'mask': {}}
domain_indices:
{'indices': {'domainaxis0': array([0, 1, 2, 3]), 'domainaxis1': slice(-3, 0, 1)}, 'mask': {}}
axis_indices:
{'domainaxis0': array([0, 1, 2, 3]), 'domainaxis1': slice(-3, 0, 1)}
indices:
[array([0, 1, 2, 3]), slice(-3, 0, 1)]
indices function return:
(array([0, 1, 2, 3]), slice(-3, 0, 1))
Indices:
(array([0, 1, 2, 3]), slice(-3, 0, 1))
Field.__getitem__
    input indices = (array([0, 1, 2, 3]), slice(-3, 0, 1))
    shape    = (4, 3)
    indices  = [array([0, 1, 2, 3]), slice(-3, 0, 1)]
    indices2 = (array([0, 1, 2, 3]), slice(-3, 0, 1))
    findices = [array([0, 1, 2, 3]), slice(-3, 0, 1)]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_870/3059849535.py in ?()
----> 1 field_subset = directions.subspace(**kwargs)

~/miniconda3/envs/unifhytest/lib/python3.12/site-packages/cf/subspacefield.py in ?(self, *config, **kwargs)
    355         except (ValueError, IndexError) as error:
    356             if test:
    357                 return False
    358 
--> 359             raise error
    360         else:
    361             if test:
    362                 return True

~/miniconda3/envs/unifhytest/lib/python3.12/site-packages/cf/subspacefield.py in ?(self, *config, **kwargs)
    355         except (ValueError, IndexError) as error:
    356             if test:
    357                 return False
    358 
--> 359             raise error
    360         else:
    361             if test:
    362                 return True

~/miniconda3/envs/unifhytest/lib/python3.12/site-packages/cf/field.py in ?(self, indices)
    428 
    429         new_data = data[tuple(findices)]
    430 
    431         if 0 in new_data.shape:
--> 432             raise IndexError(
    433                 f"Indices {findices!r} result in a subspaced shape of "
    434                 f"{new_data.shape}, but can't create a subspace of "
    435                 f"{self.__class__.__name__} that has a size 0 axis"

IndexError: Indices [array([0, 1, 2, 3]), slice(-3, 0, 1)] result in a subspaced shape of (4, 0), but can't create a subspace of Field that has a size 0 axis

Try with cf v3.16.1.

Ignore the fact that the other axis in indices is now a dask array instead of a np.array or list, I don't think that is relevant.

example_subset = example.subspace(**kwargs)
field:
Field: 
-------
Data            : (latitude(4), longitude(3))
Dimension coords: latitude(4) = [-67.5, ..., 67.5] degrees_north
                : longitude(3) = [-120.0, 0.0, 120.0] degrees_east
config:
()
data_axes:
('domainaxis0', 'domainaxis1')
  parsed       = {('domainaxis1',): [(('domainaxis1',), 'dimensioncoordinate1', <CF DimensionCoordinate: longitude(3) degrees_east>, <CF Query: (wi [-120, 120])>, 'longitude')], ('domainaxis0',): [(('domainaxis0',), 'dimensioncoordinate0', <CF DimensionCoordinate: latitude(4) degrees_north>, <CF Query: (wi [-67.5, 67.5])>, 'latitude')]}
  unique_axes  = {'domainaxis0', 'domainaxis1'}
  n_axes       = 2
  item_axes    = ('domainaxis1',)
  keys         = ('dimensioncoordinate1',)
  1 1-d constructs: (<CF DimensionCoordinate: longitude(3) degrees_east>,)
  axis         = 'domainaxis1'
  value        = <CF Query: (wi [-120, 120])>
  identity     = 'longitude'
  1-d CASE 2:
    index      = slice(-3, 0, 1)
    ind        = None
  create_mask  = False
  item_axes    = ('domainaxis0',)
  keys         = ('dimensioncoordinate0',)
  1 1-d constructs: (<CF DimensionCoordinate: latitude(4) degrees_north>,)
  axis         = 'domainaxis0'
  value        = <CF Query: (wi [-67.5, 67.5])>
  identity     = 'latitude'
  1-d CASE 3:
    index      = dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>
    ind        = None
  create_mask  = False
  indices      = {'indices': {'domainaxis0': dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, 'domainaxis1': slice(-3, 0, 1)}, 'mask': {}}
domain_indices:
{'indices': {'domainaxis0': dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, 'domainaxis1': slice(-3, 0, 1)}, 'mask': {}}
axis_indices:
{'domainaxis0': dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, 'domainaxis1': slice(-3, 0, 1)}
indices:
[dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(-3, 0, 1)]
indices function return:
(dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(-3, 0, 1))
Indices:
(dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(-3, 0, 1))
Field.__getitem__
    input indices = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(-3, 0, 1))
    shape    = (4, 3)
    **##### NOTE CHANGED SLICE IN INDICES FROM HERE ON**
    indices  = [dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(0, 3, 1)]
    indices2 = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(-3, 0, 1))
    findices = [dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>, slice(0, 3, 1)]
Computing data shape: Performance may be degraded
    dice = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>,)
DimensionCoordinate.__getitem__: shape    = (4,)
DimensionCoordinate.__getitem__: indices2 = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>,)
DimensionCoordinate.__getitem__: indices  = [dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>]
DimensionCoordinate.__getitem__: findices = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>,)
Computing data shape: Performance may be degraded
DimensionCoordinate.__getitem__: findices for bounds = (dask.array<and_, shape=(4,), dtype=bool, chunksize=(4,), chunktype=numpy.ndarray>,)
Computing data shape: Performance may be degraded
    dice = (slice(0, 3, 1),)
DimensionCoordinate.__getitem__: shape    = (3,)
DimensionCoordinate.__getitem__: indices2 = (slice(0, 3, 1),)
DimensionCoordinate.__getitem__: indices  = [slice(0, 3, 1)]
DimensionCoordinate.__getitem__: findices = (slice(0, 3, 1),)
DimensionCoordinate.__getitem__: findices for bounds = (slice(0, 3, 1),)

example_subset
<CF Field: (latitude(4), longitude(3))>

There may be other cases I've missed, but I thought this a suitable place to stop digging and ask for input!! I can see why the cases highlighted are not considered cyclic...
A cyclic variable x = [0,1,2] with a slice of (-3, 0, 1) should produce [0,1,2], which happens in v3.16.1 as expected but not in v3.16.2 as explained.

The outputs from the cf.environment calls are:

For v3.16.2:
Platform: Linux-3.10.0-1160.114.2.el7.x86_64-x86_64-with-glibc2.34
HDF5 library: 1.14.3
netcdf library: 4.9.2
udunits2 library: /home/users/mattjbr/miniconda3/envs/unifhytest/lib/libudunits2.so.0
esmpy/ESMF: 8.6.1
Python: 3.12.3
dask: 2024.4.2
netCDF4: 1.6.5
psutil: 5.9.8
packaging: 24.0
numpy: 1.26.4
scipy: 1.13.0
matplotlib: not available
cftime: 1.6.3
cfunits: 3.3.7
cfplot: not available
cfdm: 1.11.1.0
cf: 3.16.2

For v3.16.1:
Platform: Linux-3.10.0-1160.114.2.el7.x86_64-x86_64-with-glibc2.34
HDF5 library: 1.14.3
netcdf library: 4.9.2
udunits2 library: /home/users/mattjbr/miniconda3/envs/unifhytest2/lib/libudunits2.so.0
esmpy/ESMF: 8.6.1
Python: 3.12.3
dask: 2024.5.0
netCDF4: 1.6.5
psutil: 5.9.8
packaging: 24.0
numpy: 1.26.4
scipy: 1.13.0
matplotlib: not available
cftime: 1.6.3
cfunits: 3.3.7
cfplot: not available
cfdm: 1.11.1.0
cf: 3.16.1

@davidhassell
Copy link
Collaborator

Hi @mattjbr123 - thanks for the detailed report! I'll look into it ...
David

@davidhassell
Copy link
Collaborator

Hi Matt - I've had a good look, and agree with you. Your suggested changed change seems to make sense, and all of the current unit tests pass with it in-place. Would you like to submit a PR with the fix (including a change log entry and unit test in cf/test/test_functions.functionTest.test_normalize_slice)?

Thanks,
David

@mattjbr123
Copy link
Contributor Author

Excellent, thanks David, will do

mattjbr123 added a commit to mattjbr123/cf-python that referenced this issue Jun 7, 2024
mattjbr123 added a commit to mattjbr123/cf-python that referenced this issue Jun 7, 2024
@mattjbr123
Copy link
Contributor Author

Done - #782

@davidhassell davidhassell added this to the NEXT VERSION milestone Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants