HDF5: infinite loop error on Setonix (using singularity/3.8.6-mpi) #668
Comments
After digging further into the Pawsey doco I found this https://pawsey.org.au/technical-newsletter/ (see 13 March 2023 entry):
I guess I'm about to install UW2 from source on Setonix... Would you have a step-by-step recipe at hand for this specific Cray machine? I found the one you put together for Magnus a few years back.
Hey Gilly, Yeah, this is an ongoing issue that we have raised with Setonix on several occasions. For now we are stuck with bare-metal builds on Setonix. I will upload some instructions for it later today.
Hey Gilly,
Hi Jules, I have been off grid for the past couple of weeks and am back in the office now. If you have a recipe at hand for the install I would love to give it a go!
Hi Gilly, https://support.pawsey.org.au/documentation/display/US/Containers+changes
Hello guys,
I've installed the latest UW2 container on Setonix (Pawsey Center) using Singularity and it went quite smoothly 👍
There are 2 versions of Singularity available on Setonix: 1) singularity/3.8.6-nompi and 2) singularity/3.8.6-mpi
I first ran a test job in serial using the singularity/3.8.6-nompi module and all went well.
But when I try to run the same test job in parallel using the singularity/3.8.6-mpi module, I get an error message (related to HDF5, AFAICT) when the code tries to write the step 0 outputs (whether on one or on multiple ranks).
Below is the stdout returned when running the singularity/3.8.6-mpi version on a single core:
I suspect this is a Singularity problem and not a UW2 problem... are you familiar with this type of error?
I can report it to the Pawsey Center Helpdesk if you confirm this is a Singularity problem.
Cheers
Guillaume