Filename error (Docker) #41

ncbss · 2024-10-11T17:57:23Z

Hi there,
Thanks for all your work on developing DLMUSE! I am trying to use it for some analysis in the lab and, during initial testing of the Docker container, an error occurred. Please see below.

Here's the command I used on Docker (version: 4.34.2 (167172)) on my MacBook Pro (Apple M1 Pro, Sequoia 15.0):

docker run -it --name DLMUSE_inference --rm \
    --mount type=bind,source=/Users/narlonsilva/Desktop/test-nichart/input,target=/input,readonly \
    --mount type=bind,source=/Users/narlonsilva/Desktop/test-nichart/output,target=/output \
    --platform linux/amd64 cbica/nichart_dlmuse:1.0.1-cuda11.8 \
    -d cpu

Here's the error:

Arguments:
Namespace(in_data='/input', out_dir='/output', device='cpu', clear_cache=False, help=False)

Detected 1 images ...
Number of valid images is 1 ...
------------------------
   Reorient images
Out file exists, skip reorientation ...
------------------------
   Apply DLICV
Running DLICV
Renaming dic is saved to /output/temp_working_dir/s2_dlicv/renamed_image/renaming.json
Loading the model...
perform_everything_on_device=True is only supported for cuda devices! Setting this to False
There are 1 cases in the source folder
I am process 0 out of 1 (max process ID is 0, we start counting with 0!)
There are 1 cases that I would like to predict

Predicting case_ 000:
perform_everything_on_device: False
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 27/27 [41:57<00:00, 93.22s/it]
sending off prediction to background worker for resampling and export
done with case_ 000
Bus error
Rename dlicv out file
------------------------
   Apply DLICV mask
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/nibabel/loadsave.py", line 100, in load
    stat_result = os.stat(filename)
                  ^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/output/temp_working_dir/s2_dlicv/mni_t1w_DLICV.nii.gz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/NiChart_DLMUSE", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/NiChart_DLMUSE/__main__.py", line 113, in main
    run_pipeline(in_data, out_dir, device)
  File "/opt/conda/lib/python3.11/site-packages/NiChart_DLMUSE/dlmuse_pipeline.py", line 97, in run_pipeline
    apply_mask_img(df_img, in_dir, in_suff, mask_dir, mask_suff, out_dir, out_suff)
  File "/opt/conda/lib/python3.11/site-packages/NiChart_DLMUSE/MaskImage.py", line 159, in apply_mask_img
    mask_img(in_img, in_mask, out_img)
  File "/opt/conda/lib/python3.11/site-packages/NiChart_DLMUSE/MaskImage.py", line 77, in mask_img
    nii_mask = nib.load(mask_img)
               ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/nibabel/loadsave.py", line 102, in load
    raise FileNotFoundError(f"No such file or no access: '{filename}'")
FileNotFoundError: No such file or no access: '/output/temp_working_dir/s2_dlicv/mni_t1w_DLICV.nii.gz'

Thanks for your help!

AlexanderGetka-cbica · 2024-10-11T19:38:57Z

Hi @ncbss , thanks for your interest in DLMUSE!

Admittedly, the images currently up on Docker Hub are not in a thoroughly tested state yet, and don't necessarily reflect the latest version. Nevertheless, we haven't seen this bus error before -- at a glance everything else is a downstream error of that.

We will do some research on this issue and get back to you, probably with an updated image for you to pull.

Tagging @spirosmaggioros , our local Mac expert. Could you try to replicate this using Docker on your Mac?

spirosmaggioros · 2024-10-11T20:20:11Z

Hi @ncbss, thank you for pointing out this issue. We have the same chip, so I'm fairly sure I know what is going on. nnUNetv2 uses 3D operations, and the M1 chip is not very good at performing them. I also believe that M1 Macs do not emulate the x86 environment very well. I don't work on the Docker versions, so I had never tested this, but now that I have, I see the same problem. What is happening is that the workers are failing in the background (notice the Bus error), and then the output files aren't present in the next step.

Also, the Docker image has not been updated with the improvements I made in my latest PRs; inference is now parallelized, which might help with this. Give us, and specifically @AlexanderGetka-cbica, a moment to figure out whether we can fix that.
Thanks once again for your feedback! Hope we find a solution soon.
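
The failure mode described above, where a background worker dies and the next step then fails on a missing file, is easier to diagnose with an explicit existence check between steps. The following is a minimal Python sketch, not the NiChart_DLMUSE implementation; `require_outputs` is a hypothetical helper name chosen for illustration.

```python
from pathlib import Path


def require_outputs(paths, step_name):
    """Raise a descriptive error if any expected output of a step is missing.

    Hypothetical helper for illustration; not part of NiChart_DLMUSE.
    """
    missing = [str(p) for p in map(Path, paths) if not p.is_file()]
    if missing:
        raise RuntimeError(
            f"Step '{step_name}' did not produce: {', '.join(missing)}. "
            "A worker process may have crashed (for example with a bus error)."
        )
```

Called right after the DLICV step with the expected mask path, a check like this would report the crashed step directly instead of surfacing later as a FileNotFoundError from nibabel.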

AlexanderGetka-cbica · 2024-10-11T21:05:30Z

Hi @ncbss , I just pushed a version to Docker Hub under the following tag:
cbica/nichart:1.0.4-default

Can you give this a try and see if it works for you? Of note, there are multiple options that might be useful. You can append them after the -d cpu part (by the way, since you are on Mac, maybe you can try -d mps for improved performance.)

-c sets the number of cores used for parallelization of the whole pipeline (default: 4). Setting this higher might result in faster inference; setting it to 1 will minimize the amount of resources consumed.

--dlmuse_args "-nps 1 -npp 1" --dlicv_args "-nps 1 -npp 1" will cause all the various resampling/export steps to use only one worker thread, minimizing resource consumption and the risk of out-of-memory or similar errors, which we have previously observed to cause failures in this step. It will be slightly slower though.

To get inference working at a bare minimum, I suggest starting with values of 1 for all of the above, then gradually increasing them until you find what works best for your system. If you could report back on this, that would be very helpful for us, too.
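
As a rough illustration of the "start at 1, then increase" approach, here is a minimal Python sketch that assembles the `docker run` argument list from the flags quoted in this thread. The function name, its parameters, and the placeholder paths are assumptions made for the example. The `readonly_input` switch defaults to off because a later comment in this thread reports that the 1.0.4-default image fails when /input is mounted read-only.

```python
import shlex


def build_docker_command(input_dir, output_dir, image, device="cpu", cores=1,
                         nps=1, npp=1, ipc_host=False, readonly_input=False):
    """Assemble a `docker run` argument list using only flags quoted in this thread.

    Hypothetical helper for illustration; paths and defaults are placeholders.
    """
    worker_args = f"-nps {nps} -npp {npp}"
    input_mount = f"type=bind,source={input_dir},target=/input"
    if readonly_input:
        input_mount += ",readonly"
    cmd = [
        "docker", "run", "-it", "--rm",
        "--mount", input_mount,
        "--mount", f"type=bind,source={output_dir},target=/output",
        "--platform", "linux/amd64",
    ]
    if ipc_host:
        # Docker options must come before the image name.
        cmd.append("--ipc=host")
    # Everything after the image name is passed to the pipeline itself.
    cmd += [image, "-d", device, "-c", str(cores),
            "--dlmuse_args", worker_args, "--dlicv_args", worker_args]
    return cmd


if __name__ == "__main__":
    cmd = build_docker_command("/path/to/input", "/path/to/output",
                               "cbica/nichart_dlmuse:1.0.4-default")
    print(shlex.join(cmd))
```

Printing the command rather than running it makes it easy to inspect before raising `cores`, `nps`, or `npp` one at a time.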

spirosmaggioros · 2024-10-11T21:16:35Z

Just to save you time following Alex's response: only the new M3 chip supports 3D convolution (which nnUNetv2 performs), so you can't use MPS to run NiChart DLMUSE. Unfortunately, only CUDA offers a faster option. You can take a look at nnUNet's documentation.

I use a VM with A100 GPUs to run it.

spirosmaggioros · 2024-10-25T20:42:20Z

Is this resolved?

AlexanderGetka-cbica · 2024-10-28T00:09:53Z

Hi @ncbss, I just found another potential solution since I encountered this in my own environment. Try passing --ipc=host to the docker run command (to be clear, this should go before the image name).
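
One common reason `--ipc=host` helps is that PyTorch worker processes exchange data through shared memory, and Docker's default /dev/shm is only 64 MiB; a process that exhausts it can die with a bus error. Whether that is the cause in this case is an assumption, not something confirmed in this thread. A minimal Python sketch for checking the shared-memory size from inside a container follows; the helper names and the 1024 MiB threshold are arbitrary choices for illustration.

```python
import shutil


def shm_total_mib(path="/dev/shm"):
    """Return the total size of the filesystem mounted at `path`, in MiB."""
    return shutil.disk_usage(path).total / (1024 * 1024)


def shm_looks_small(path="/dev/shm", minimum_mib=1024):
    """True if the shared-memory mount is below an arbitrary threshold."""
    return shm_total_mib(path) < minimum_mib
```

Calling `shm_looks_small()` inside the container, with and without `--ipc=host`, shows whether the flag actually changed the available shared memory.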

ncbss · 2024-10-28T00:12:49Z

Thank you all for your help! I'm testing this out tonight and will report back ASAP.

ncbss · 2024-10-28T00:35:12Z

Hi again!

Testing the new container cbica/nichart_dlmuse:1.0.4-default, when I run the command below:

docker run -it --name DLMUSE_inference --rm \
    --mount type=bind,source=/Users/narlonsilva/Desktop/test-nichart/input/,target=/input,readonly \
    --mount type=bind,source=/Users/narlonsilva/Desktop/test-nichart/output,target=/output \
    --platform linux/amd64 \
    --ipc=host cbica/nichart_dlmuse:1.0.4-default \
    -d cpu

I get this error:

Arguments:
Namespace(in_data='/input', out_dir='/output', device='cpu', cores=4, clear_cache=False, dlmuse_args='', dlicv_args='', help=False)

mkdir: cannot create directory ‘/input/split_1’: Read-only file system
cp: cannot create regular file '/input/split_1': Read-only file system
rm: cannot remove '/output/split_*': No such file or directory
rm: cannot remove '/input/split_*': No such file or directory

If I remove the readonly flag, I get this error:

Arguments:
Namespace(in_data='/input', out_dir='/output', device='cpu', cores=4, clear_cache=False, dlmuse_args='', dlicv_args='', help=False)

rm: cannot remove '/output/split_*': No such file or directory

This is what my working directory looks like prior to running the code above:

input      output     runmuse.sh

./input:
T1w.nii.gz

./output:

AlexanderGetka-cbica · 2024-10-28T00:46:22Z

Thanks for the detailed reporting!

@spirosmaggioros it seems this is relevant to your parallelization code, but in retrospect we should probably avoid writing anything to the input dir. Let's discuss the approach tomorrow.

@ncbss You might be able to try the previous version you tried, but with the --ipc=host fix mentioned above. I should have noted earlier that this does alter the security profile of Docker, so just be aware of that (some discussion on this here: https://stackoverflow.com/questions/38907708/docker-ipc-host-and-security ). We'll be in touch about an updated container, thanks for your patience.

spirosmaggioros · 2024-10-28T11:07:06Z

@ncbss The issue here is that you have only one file, while the default number of cores used for data splitting is 4. To process a single file, you have to set "--cores 1", since parallelization can't use more than one core for one file. It was my mistake not to take this into consideration; I didn't expect single NIfTI files. I will update this for the newer PyPI version. Thanks for noticing.

Quick update: I fixed the issue, and the fix will be merged soon. Until then, if you have fewer than 4 files, please set --cores to 1.
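
The mismatch between the core count and the number of input files can be avoided by clamping the number of splits to the number of files. The Python sketch below shows one way to do that; it is not the fix merged for this issue, and both function names are hypothetical.

```python
def effective_cores(requested_cores, n_files):
    """Clamp the number of parallel splits to the number of input files."""
    if n_files < 1:
        raise ValueError("no input files found")
    return max(1, min(requested_cores, n_files))


def split_files(files, requested_cores):
    """Distribute files round-robin so that no split is empty."""
    n = effective_cores(requested_cores, len(files))
    return [files[i::n] for i in range(n)]
```

With a single T1w.nii.gz and the default of 4 cores, `split_files` produces one split containing that file rather than four splits, three of them empty.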

spirosmaggioros · 2024-10-30T11:45:10Z

We will close this for now, as the latest commit fixes this issue. If any other issues appear, we will reopen it.

spirosmaggioros added the bug and docker labels Oct 11, 2024

spirosmaggioros assigned spirosmaggioros and AlexanderGetka-cbica Oct 11, 2024

spirosmaggioros mentioned this issue Oct 29, 2024

Fixing issues with parallelization #47

Merged

spirosmaggioros linked a pull request Oct 29, 2024 that will close this issue

Fixing issues with parallelization #47

Merged

spirosmaggioros closed this as completed in #47 Oct 30, 2024
