
Update test fixtures #454

Closed
wants to merge 6 commits

Conversation

maxrjones
Member

cc @TomNicholas and @sharkinsspatial

The dependencies are all the same between the passing and hanging workflows, so I think this was actually caused by a change in #420. We should probably revert that PR if fixing the issue might take a while.

This is the only test that uses the netcdf4_inlined_ref fixture, so I think it's probably something going on there, but I haven't looked into it at all yet.


codecov bot commented Feb 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.64%. Comparing base (c461b33) to head (133eea4).
Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #454   +/-   ##
=======================================
  Coverage   88.64%   88.64%           
=======================================
  Files          28       28           
  Lines        1752     1752           
=======================================
  Hits         1553     1553           
  Misses        199      199           

@maxrjones
Member Author

Pretty weird one - I can reproduce using act with the Python 3.11 environment, but not locally. I wonder if it's an OS-specific issue, since we only use Ubuntu for the tests. I'm not sure how to debug within the Docker container. @chuckwondo do you have experience with that?

@TomNicholas
Member

Thanks for looking into this @maxrjones!

I clearly shouldn't have merged the PR; feel free to revert it.

@chuckwondo
Contributor

Pretty weird one - I can reproduce using act with the Python 3.11 environment, but not locally. I wonder if it's an OS-specific issue, since we only use Ubuntu for the tests. I'm not sure how to debug within the Docker container. @chuckwondo do you have experience with that?

Sorry, I'm not following. I see that this build failed, but it shows as canceled, and I cannot see why: https://github.com/zarr-developers/VirtualiZarr/actions/runs/13580434648/job/37998176988

Where are you seeing something hanging? What are you doing to reproduce via act?

@chuckwondo
Contributor

chuckwondo commented Mar 1, 2025

@maxrjones, I pulled the latest from main (containing #420 merged) and ran act -j test --matrix python-version:3.11, and it succeeded. Am I missing something, or was there perhaps a transient failure in the network tests that was causing whatever "hanging" you're referring to?

@maxrjones
Member Author

maxrjones commented Mar 1, 2025

I think this is an issue with file locking due to this parameterized fixture writing to the same file. I may have a fix within the next half hour; otherwise I'll reach out for help again.

Sorry for causing confusion @chuckwondo - I cancelled the workflow manually after half an hour, since it should only take a couple of minutes.

I am convinced this is not strictly a network issue, based on having also seen the test hang (i.e., run for more than 500% of the usual time) using act -j test. It may be necessary to run multiple matrix options for the file locking to occur, and even then it might be intermittent.
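For reference, a minimal sketch of the kind of fix being described here, assuming a pytest parameterized fixture (the fixture name and data are hypothetical, not the actual netcdf4_inlined_ref fixture):

```python
import pytest
import xarray as xr


@pytest.fixture(params=["json", "parquet"])
def netcdf_fixture_path(request, tmp_path_factory):
    # tmp_path_factory returns a fresh directory on every call, so each
    # parameterization writes to its own file instead of all params
    # sharing one path (the suspected file-locking trigger).
    path = tmp_path_factory.mktemp("fixtures") / f"data_{request.param}.nc"
    xr.Dataset({"air": ("time", [1.0, 2.0, 3.0])}).to_netcdf(path)
    return path
```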

@maxrjones changed the title from "Mark hanging test as xfail" to "Update test fixtures" Mar 1, 2025
@chuckwondo
Contributor

A couple of ignorable suggestions, so I've approved in case you choose to ignore them.

@chuckwondo
Contributor

I think this is an issue with file locking due to this parameterized fixture writing to the same file. I may have a fix within the next half hour; otherwise I'll reach out for help again.

Sorry for causing confusion @chuckwondo - I cancelled the workflow manually after half an hour, since it should only take a couple of minutes.

I am convinced this is not strictly a network issue, based on having also seen the test hang (i.e., run for more than 500% of the usual time) using act -j test. It may be necessary to run multiple matrix options for the file locking to occur, and even then it might be intermittent.

Thanks for clarifying!

@chuckwondo
Contributor

@maxrjones, I don't think this is the issue.

I see the current build hanging at this point:

virtualizarr/tests/test_readers/test_kerchunk.py::test_open_virtual_dataset_existing_kerchunk_refs[json] PASSED [ 81%]
virtualizarr/tests/test_readers/test_kerchunk.py::test_open_virtual_dataset_existing_kerchunk_refs[invalid] PASSED [ 82%]

The other build hung similarly:

virtualizarr/tests/test_readers/test_kerchunk.py::test_open_virtual_dataset_existing_kerchunk_refs[json] PASSED [ 82%]
virtualizarr/tests/test_readers/test_kerchunk.py::test_open_virtual_dataset_existing_kerchunk_refs[invalid] PASSED [ 82%]
virtualizarr/tests/test_readers/test_kerchunk.py::test_open_virtual_dataset_existing_kerchunk_refs[parquet] PASSED [ 82%]

These tests don't even use the fixture in question.

@maxrjones
Member Author

maxrjones commented Mar 1, 2025

@maxrjones, I don't think this is the issue.

I see the current build hanging at this point:

virtualizarr/tests/test_readers/test_kerchunk.py::test_open_virtual_dataset_existing_kerchunk_refs[json] PASSED [ 81%]
virtualizarr/tests/test_readers/test_kerchunk.py::test_open_virtual_dataset_existing_kerchunk_refs[invalid] PASSED [ 82%]

The other build hung similarly:

virtualizarr/tests/test_readers/test_kerchunk.py::test_open_virtual_dataset_existing_kerchunk_refs[json] PASSED [ 82%]
virtualizarr/tests/test_readers/test_kerchunk.py::test_open_virtual_dataset_existing_kerchunk_refs[invalid] PASSED [ 82%]
virtualizarr/tests/test_readers/test_kerchunk.py::test_open_virtual_dataset_existing_kerchunk_refs[parquet] PASSED [ 82%]

These tests don't even use the fixture in question.

Yes, I shouldn't have been using such a shot-in-the-dark approach. I think the changes in this PR are good, but not directed at the problem.

I opened #455 as a more targeted change, but there are so many warnings about closing the CachingFileManager on an open file that I don't expect it to actually help 😕. I think it might be necessary to do a full evaluation of where files are opened and closed to figure out if it's truly a problem.
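As an illustration of what that evaluation might enforce, a hedged sketch of the explicit-close pattern (the function is a placeholder for illustration, not VirtualiZarr's actual reader):

```python
import xarray as xr


def read_attrs(path: str) -> dict:
    # Opening inside a `with` block guarantees the handle is released
    # on exit, even if parsing raises, so no open file survives to
    # trigger CachingFileManager close-on-open warnings later.
    with xr.open_dataset(path) as ds:
        return {name: dict(var.attrs) for name, var in ds.variables.items()}
```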

It's also possible that it's a network issue with the xarray tutorial dataset. Maybe it's worth saving a copy locally and seeing if the test still hangs. As hooked as I am on the problem, I'm out of time to work on it more right now.
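If it is the network, one way to take that variable out of play is to cache the tutorial dataset once and point the test at the local file; a sketch assuming the test fetches it via xarray's tutorial loader:

```python
import xarray as xr

# cache=True stores the download under cache_dir, so later test runs
# read from disk instead of hitting the network.
ds = xr.tutorial.open_dataset("air_temperature", cache=True, cache_dir="./data")
ds.to_netcdf("./data/air_temperature_local.nc")  # pinned local copy for the test
```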

@sharkinsspatial
Collaborator

@maxrjones Thanks for investigating this. Apologies, I wasn't able to reproduce this locally in a few attempts, and I hadn't considered using act (nice idea, @maxrjones 👍), so I thought it might be an ephemeral CI issue, but apparently not. Regardless, I think improving our file-handle hygiene is a good idea going forward.

@TomNicholas
Member

TomNicholas commented Mar 2, 2025

I think it might be necessary to do a full evaluation of where files are opened and closed to figure out if it's truly a problem.

I think once we have the ManifestStore (#427) we will be able to reason about this much more clearly. We can then basically change open_virtual_dataset to

  1. open the file
  2. create ManifestArrays for all variables by reading metadata from the file
  3. close the file
  4. optionally later try to load some of the ManifestArrays using the ManifestStore, but at that point the reading of data will be implemented using the same internals as a read-only FsspecStore/ObjectStore, which presumably/hopefully manages file handles responsibly.

The logic you changed in #455 will then no longer even need to exist.
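A hedged sketch of that flow under the assumptions above (build_manifest_arrays is a placeholder for the metadata-parsing step, not the eventual ManifestStore API):

```python
def build_manifest_arrays(f) -> dict:
    # Placeholder: parse metadata/byte ranges into ManifestArray-like
    # objects without loading any chunk data.
    header = f.read(8)  # metadata only; no data chunks are read
    return {"header": header}


def open_virtual_dataset(path: str) -> dict:
    # Steps 1-3: the file handle exists only inside the `with` block
    # and is closed before anything else happens.
    with open(path, "rb") as f:
        arrays = build_manifest_arrays(f)
    # Step 4 happens later: any actual loads go through the
    # ManifestStore, whose reads are delegated to a read-only
    # ObjectStore that manages its own handles.
    return arrays
```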

@maxrjones
Member Author

Superseded by #460

@maxrjones closed this Mar 3, 2025