Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Cannot use parallel for checks #436

Open
3 tasks done
tuanpham96 opened this issue Feb 15, 2024 · 11 comments
Open
3 tasks done

[Bug]: Cannot use parallel for checks #436

tuanpham96 opened this issue Feb 15, 2024 · 11 comments
Labels
category: bug errors in the code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users

Comments

@tuanpham96
Copy link

tuanpham96 commented Feb 15, 2024

What happened?

I've been having issues with running the inspector in parallel.

Doing this would be fine

nwbinspector data/nwb --config dandi --report-file-path inspector-results.txt

But using --n-jobs would tend to end in error

nwbinspector data/nwb --config dandi --report-file-path inspector-results.txt --n-jobs 8

The traceback is below:

concurrent.futures.process._RemoteTraceback:                                                                                
"""
Traceback (most recent call last):
  File "/users/lab/anaconda/nwb/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'register_check.<locals>.register_check_and_auto_parse.<locals>.auto_parse_some_output'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/users/lab/anaconda/nwb/bin/nwbinspector", line 8, in <module>
    sys.exit(inspect_all_cli())
             ^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/nwbinspector/nwbinspector.py", line 280, in inspect_all_cli
    messages = list(
               ^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/site-packages/nwbinspector/nwbinspector.py", line 454, in inspect_all
    for message in future.result():
                   ^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/users/lab/anaconda/nwb/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/lab/anaconda/nwb/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'register_check.<locals>.register_check_and_auto_parse.<locals>.auto_parse_some_output'

Operating System

Linux (not sure the exact OS, running on university cluster).

Python Version

Python 3.11.4

Were you streaming with ROS3?

No

Package Versions

hdmf                          3.11.0
numpy                         1.26.2
nwbinspector                  0.4.33
pandas                        2.1.3
pynwb                         2.5.0

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • Have you ensured this bug was not already reported?
  • To the best of your ability, have you ensured this is a bug within the code that checks the NWBFile, rather than a bug in the NWBFile reader (e.g., PyNWB or MatNWB)?
@tuanpham96 tuanpham96 added the category: bug errors in the code or code behavior label Feb 15, 2024
@CodyCBakerPhD
Copy link
Contributor

Quite strange - we have tests for parallel usage in the API (https://github.com/NeurodataWithoutBorders/nwbinspector/blob/dev/tests/test_inspector.py#L212); and technically those leverage the same configuration wrapper via select mode rather than dandi

Tried adding a new API test for this call in #437, and I suppose I should add a CLI invocation and report saving just for good measure

Are you able by change to reproduce this in any way minimally? Would you be willing to share the NWB files in question or if on DANDI, point me towards the dandiset? Also can you share the results of the non-parallel report? (Wondering if it might be some specific check that in particular causes this, and that check is not in one of the base files we use for testing)

@CodyCBakerPhD
Copy link
Contributor

Interesting; the CLI invocation reproduces it! https://github.com/NeurodataWithoutBorders/nwbinspector/actions/runs/7910884792/job/21594182388?pr=437

But the API usage doesn't... hmm

Will dig into this

@CodyCBakerPhD
Copy link
Contributor

Though even more curiously seems to only affect linux and mac 😅

@tuanpham96
Copy link
Author

I tried to reproduce with simple test nwb files. Apparently the --config dandi is the cause.

import numpy as np
from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile, TimeSeries, NWBHDF5IO

num_files = 10

for i in range(num_files):
    start_time = datetime(2024, 1, 2, 3, 4, 5, tzinfo=tzlocal())
    name = f"test_{i:03d}"
    
    nwbfile = NWBFile(
        session_description=name,
        identifier=name,
        session_start_time=start_time,
    )

    ts = TimeSeries(
        name=f"ts{i}",
        data=np.random.randn(10, 200),
        unit="-",
        timestamps=np.arange(10),
    )
    nwbfile.add_acquisition(ts)

    with NWBHDF5IO(name + '.nwb', "w") as io:
        io.write(nwbfile)

Running with this is fine:

 nwbinspector . --n-jobs 4

But this would throw the pretty much the same error:

nwbinspector . --config dandi --n-jobs 4

Browsing the code, my guess is it has to do something with how the functions in available_checks are copied when a config is passed, but couldn't find a workaround.

@tuanpham96
Copy link
Author

tuanpham96 commented Feb 15, 2024

Apparently, not using copy_check would be fine, although I'm not sure whether that is valid.

If I change this line:

mapped_check = copy_check(check=check)

to just

 mapped_check = check

things would work fine.

@CodyCBakerPhD
Copy link
Contributor

But is the full testing suite OK with that change?

@CodyCBakerPhD
Copy link
Contributor

BTW I'm having trouble reproducing this locally on my Windows and Linux systems, hence a bit of delay on getting to this

@tuanpham96
Copy link
Author

But is the full testing suite OK with that change?

I haven't run the tests yet. I'll try to run them later today or tomorrow on my local machine.

@tuanpham96
Copy link
Author

I tried running the tests on your PR branch locally on my machine (Linux) without copy_check, and I have weird results with test_inspector.py

Testing on that file would have the *parallel* tests pass, but fail on *importance_threshold* tests

pytest -v tests/test_inspector.py                                       
================================================================ test session starts =================================================================
platform linux -- Python 3.10.10, pytest-8.0.0, pluggy-1.4.0 -- 
cachedir: .pytest_cache
plugins: cov-4.1.0
collected 20 items                                                                                                                                   

tests/test_inspector.py::TestInspector::test_command_line_on_directory_matches_file PASSED                                                     [  5%]
tests/test_inspector.py::TestInspector::test_command_line_organizes_levels PASSED                                                              [ 10%]
tests/test_inspector.py::TestInspector::test_command_line_runs_cli_only PASSED                                                                 [ 15%]
tests/test_inspector.py::TestInspector::test_command_line_runs_cli_only_parallel PASSED                                                        [ 20%]
tests/test_inspector.py::TestInspector::test_command_line_runs_saves_json PASSED                                                               [ 25%]
tests/test_inspector.py::TestInspector::test_command_line_saves_report PASSED                                                                  [ 30%]
tests/test_inspector.py::TestInspector::test_inspect_all PASSED                                                                                [ 35%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile PASSED                                                                            [ 40%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_dandi_config PASSED                                                               [ 45%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_importance_threshold_as_importance FAILED                                         [ 50%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_importance_threshold_as_string FAILED                                             [ 55%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_manual_iteration PASSED                                                           [ 60%]
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_manual_iteration_stop PASSED                                                      [ 65%]
tests/test_inspector.py::TestInspector::test_iterable_check_function PASSED                                                                    [ 70%]
tests/test_inspector.py::TestDANDIConfig::test_command_line_runs_parallel_with_config_and_report PASSED                                        [ 75%]
tests/test_inspector.py::TestDANDIConfig::test_inspect_nwbfile_dandi_config_critical_only_entire_registry PASSED                               [ 80%]
tests/test_inspector.py::TestDANDIConfig::test_inspect_nwbfile_dandi_config_violation_and_above_entire_registry PASSED                         [ 85%]
tests/test_inspector.py::TestCheckUniqueIdentifiersPass::test_check_unique_identifiers_pass PASSED                                             [ 90%]
tests/test_inspector.py::TestCheckUniqueIdentifiersFail::test_check_unique_identifiers_fail PASSED                                             [ 95%]
tests/test_inspector.py::test_dandi_config_in_vitro_injection PASSED                                                                           [100%]

But then re-doing with only the failed tests seem to pass.

pytest -v tests/test_inspector.py -k test_inspect_nwbfile_importance_threshold                                                                    
================================================================ test session starts =================================================================
platform linux -- Python 3.10.10, pytest-8.0.0, pluggy-1.4.0 -- /home/penguinsfly/Documents/work/fleischmann-lab-brown/general/dev-tools/contrib/nwbin
spector/.venv/bin/python                                                                                                                              
cachedir: .pytest_cache                                                                                                                               
rootdir: /home/penguinsfly/Documents/work/fleischmann-lab-brown/general/dev-tools/contrib/nwbinspector                                                
plugins: cov-4.1.0                                                                                                                                    
collected 20 items / 18 deselected / 2 selected                                                                                                       
                                                                                                                                                      
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_importance_threshold_as_importance PASSED                                         [ 50%] 
tests/test_inspector.py::TestInspector::test_inspect_nwbfile_importance_threshold_as_string PASSED                                             [100%]

Do you know why?

@CodyCBakerPhD
Copy link
Contributor

CodyCBakerPhD commented Feb 15, 2024

Yeah, that's the reason the copy function exists in the first place. It's kind of messy to go into detail

Will try to think of broader workaround. For now I recommend running either without parallel and with config (slowest but most accurate), or without config and with parallel (recommended for very large number of files)

Note that the DANDI config just elevates certain subject related items to a high enough level to prevent validation - so you could theoretically just cross-reference the output against this file

@tuanpham96
Copy link
Author

Gotcha. Thanks for the suggestions!

@stephprince stephprince added the priority: medium non-critical problem and/or affecting only a small set of NWB users label Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: bug errors in the code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users
Projects
None yet
Development

No branches or pull requests

3 participants