Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected result with cf.wi and cf.cellwi #30

Closed
bnlawrence opened this issue Oct 25, 2024 · 3 comments
Closed

Unexpected result with cf.wi and cf.cellwi #30

bnlawrence opened this issue Oct 25, 2024 · 3 comments
Assignees
Labels
bug Something isn't working cf/cfa related

Comments

@bnlawrence
Copy link
Owner

I am trying to subset a manifest file using the bounds that are stored in the database.

The particular test that is manifesting the issue will fail with the pytest -s -x tests/realistic in the within branch.

The code which is causing the error can be found in cfs/db/cfa_tools.py

def get_quark_field(instance, start_date, end_date):
    """ Given a CFS manifest field (i.e. an instance of the Manifest class 
    found in the django models.py) and a pair of bounding dates, return
    a CF field instance with the information needed to construct a 
    subspace manifest.
    : instance : A CFS manifest instance
    : start_date : A date in a cf.Data instance
    : end_date : A date in a cf.Data instance.
    : output : A CF field instance with the correct bounds, and 
    a set of fragment ids in the field data.
    """

    fld = cf.Field(properties={'long_name':'fragments'})
    
    fragments = np.array([f.id for f in instance.fragments.files.all()])
    T_axis= fld.set_construct(cf.DomainAxis(fragments.size))
    fld.set_data(fragments, axes=T_axis)

    binary_stream = io.BytesIO(instance.bounds)
    binary_stream.seek(0)
    bounds = np.load(binary_stream)

    timedata = np.mean(bounds, axis=1)
    
    dimT = cf.DimensionCoordinate(properties={'standard_name':'time',
                    'units':cf.Units(instance.units,calendar=instance.calendar)},
                data = timedata,
                bounds = cf.Bounds(data=bounds)
                )
    bounds_range(dimT.bounds)
    fld.set_construct(dimT, axes=T_axis)
    nf, nt = len(fragments), len(timedata)
    if nf != nt:
        raise RuntimeError(
            'Number of of manifest fragments ({nf}) not equal to time data length ({nt}).')
    print('Subspacing using cellwi to ',start_date, end_date)
    quark = fld.subspace(time=cf.cellwi(start_date, end_date))
    dimT = quark.dimension_coordinate('T')
    bounds_range(dimT.bounds)
    return quark

This code, when run in the test yields this output (before failing the assertion in the error below):

... 
Bounds range  [[1950-01-01 06:00:00, 1950-02-01 00:00:00]] 360_day [[2014-12-01 06:00:00, 2015-01-01 00:00:00]] 360_day
Subspacing using cellwi to  1950-01-01 06:00:00 360_day 1970-12-10 00:00:00 360_day
Bounds range  [[1950-01-01 06:00:00, 1950-02-01 00:00:00]] 360_day [[1970-11-01 06:00:00, 1970-12-01 00:00:00]] 360_day
...

We are expected it to grab the next cell.

The relevant test failure is this one:

 def test_quarks(django_dependencies):
        test_db, ignore , ignore = django_dependencies
        vars = test_db.variable.all()
        v = vars[0]
        td = v.time_domain
    
        #check get sensibele results from trying to get the same thing
        starting = cf.Data(td.starting, units=cf.Units(td.units, calendar=td.calendar))
        ending = cf.Data(td.ending, units=cf.Units(td.units, calendar=td.calendar))
        manifest = v.in_manifest
        assert manifest is not None
        quark = test_db.manifest.make_quark(manifest, starting, ending)
        assert quark.id == manifest.id
        assert quark.is_quark is False
    
        #ok, now see if we can get a subset
        ending = cf.Data(10., units=cf.Units(
            'days since 1970-11-30',calendar='360_day'))
        quark = test_db.manifest.make_quark(manifest, starting, ending)
        assert quark.id != manifest.id
        assert quark.is_quark is True
        print(quark)
        expected = 'cn134a_999_6hr_u_pt_cordex__197012-197012.nc(unavailable)'
        fragments = list(quark.fragments.files.all())
>       assert expected == fragments[-1]
E       AssertionError: assert 'cn134a_999_6hr_u_pt_cordex__197012-197012.nc(unavailable)' == <File: cn134a_999_6hr_u_pt_cordex__197011-197011.nc(unavailable)>

which occurs because it's not getting to the next cell where that fragment file should have been present.

@davidhassell
Copy link
Collaborator

davidhassell commented Oct 30, 2024

Hi Bryan,

... 
Bounds range  [[1950-01-01 06:00:00, 1950-02-01 00:00:00]] 360_day [[2014-12-01 06:00:00, 2015-01-01 00:00:00]] 360_day
Subspacing using cellwi to  1950-01-01 06:00:00 360_day 1970-12-10 00:00:00 360_day
Bounds range  [[1950-01-01 06:00:00, 1950-02-01 00:00:00]] 360_day [[1970-11-01 06:00:00, 1970-12-01 00:00:00]] 360_day
...

But this looks right to me ... You asked for cells that lie completely inside 1950-01-01 06:00 to 1970-12-10 00:00, and that's what you got I realize, however, that the "completely" part of this was not made wholly clear :)

If you inspect the query you see that it's just a convenience function for creating a mode complicated Query object:

>>> print(cf.cellwi(start_date, end_date))
<CF Query: [lower_bounds(ge 1950-01-01 06:00:00) & upper_bounds(le 1970-12-10 00:00:00)]>

We can create a new "cell overlaps" query as follows, that should do what you want (edited):

>>> q1 = cf.ge(start_date, attr='upper_bounds')
>>> q2 = cf.le(end_date, attr='lower_bounds')
>>> cell_overlaps = q1 & q2
>>> cell_overlaps
<CF Query: [upper_bounds(ge 1950-01-01 06:00:00) & lower_bounds(le 1970-12-10 00:00:00)]>

and then

quark = fld.subspace(time=cell_overlaps(start_date, end_date))

I shall create this as a new cf.cell_overlaps function.

@davidhassell
Copy link
Collaborator

cf.cell_overlaps will be in the next version of cf-python: NCAS-CMS/cf-python#824

@bnlawrence
Copy link
Owner Author

Great. For the moment I have stuffed an explicit version into my code, and will replace it when the new CF comes out. The explicit version passes my tests!

bnlawrence pushed a commit that referenced this issue Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cf/cfa related
Projects
None yet
Development

No branches or pull requests

2 participants