Add convenience functions to set preferences #88
Conversation
Looks like the new P4est_jll destroyed something 😬

julia> using Pkg
julia> Pkg.add(["MPIPreferences", "P4est_jll", "MPI"])
[...]
julia> using MPIPreferences
julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation identified
│ libmpi = "libmpi"
│ version_string = "Open MPI v4.1.0, package: Debian OpenMPI, ident: 4.1.0, repo rev: v4.1.0, Dec 18, 2020\0"
│ impl = "OpenMPI"
│ version = v"4.1.0"
└ abi = "OpenMPI"
┌ Info: MPIPreferences changed
│ binary = "system"
│ libmpi = "libmpi"
│ abi = "OpenMPI"
│ mpiexec = "mpiexec"
│ preloads = Any[]
└ preloads_env_switch = nothing

and after restarting:

julia> using P4est_jll
julia> using MPI
[ Info: Precompiling MPI [da04e1cc-30fd-572f-bb4f-1f8673147195]
julia> MPI.Init()
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[saola6:24126] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
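A quick way to check which libmpi actually ended up in the process is a diagnostic sketch like the following (assuming MPI.jl ≥ v0.20 for MPI.versioninfo(); run it before MPI.Init()):

julia> using MPI, Libdl
julia> MPI.versioninfo()  # prints the MPIPreferences settings and the MPI library MPI.jl will use
julia> filter(contains("mpi"), Libdl.dllist())  # paths of all MPI-related shared libraries already loaded

If the artifact OpenMPI and the system OpenMPI both show up in the loaded libraries, that would explain the opal_init failure above.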
It's probably because of a mismatch between the OpenMPI versions (my local one and the one used in the OpenMPI build of P4est_jll), right?
OK, the same happens with HDF5_jll, so that's probably not the problem...
There are some really weird issues, e.g. for ubuntu-latest - P4EST_CUSTOM_MPI_CUSTOM - Julia 1.
Co-authored-by: Hendrik Ranocha <[email protected]>
Looks like the wrong OpenMPI version is used. After setting all preferences (MPIPreferences and the P4est preference) and restarting the REPL on Roci:

julia> using P4est
shell> ompi_info
Package: Debian OpenMPI
Open MPI: 4.0.3
Open MPI repo revision: v4.0.3
Open MPI release date: Mar 03, 2020
Open RTE: 4.0.3
Open RTE repo revision: v4.0.3
Open RTE release date: Mar 03, 2020
OPAL: 4.0.3
OPAL repo revision: v4.0.3
OPAL release date: Mar 03, 2020
MPI API: 3.1.0
Ident string: 4.0.3
Prefix: /home/jlampert/.julia/artifacts/6f1138d9f7bcc8575ac98bb3ccbc47505c718c80
Configured architecture: x86_64-pc-linux-gnu
Configure host: lcy01-amd64-020
Configured by: buildd
Configured on: Wed Apr 15 13:14:35 UTC 2020
Configure host: lcy01-amd64-020
[...]

which means that it still uses the OpenMPI version from the artifact and not the system OpenMPI version. On the other hand, after performing the same steps with HDF5.jl:

julia> using HDF5
shell> ompi_info
Package: Debian OpenMPI
Open MPI: 4.0.3
Open MPI repo revision: v4.0.3
Open MPI release date: Mar 03, 2020
Open RTE: 4.0.3
Open RTE repo revision: v4.0.3
Open RTE release date: Mar 03, 2020
OPAL: 4.0.3
OPAL repo revision: v4.0.3
OPAL release date: Mar 03, 2020
MPI API: 3.1.0
Ident string: 4.0.3
Prefix: /usr
Configured architecture: x86_64-pc-linux-gnu
Configure host: lcy01-amd64-020
Configured by: buildd
Configured on: Wed Apr 15 13:14:35 UTC 2020
Configure host: lcy01-amd64-020
[...]

With HDF5.jl, the system OpenMPI installation is used (note the Prefix: /usr above).
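For reference, the preferences can also be set manually with Preferences.jl; a sketch, assuming the preference keys are "libp4est" and "libsc" (the key names and the library paths below are placeholders, not confirmed by this discussion):

julia> using Preferences, P4est
julia> set_preferences!(P4est, "libp4est" => "/usr/lib/libp4est.so"; force = true)  # key name assumed
julia> set_preferences!(P4est, "libsc" => "/usr/lib/libsc.so"; force = true)        # key name assumed
# Restart Julia afterwards so the new preferences take effect at precompilation.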
The kind of failure for windows-latest - P4EST_JLL_MPI_DEFAULT - Julia 1 also happens in our CI tests of Trixi.jl. Do you have an idea what's causing this? If not, I would suggest rolling back the P4est_jll PR to hotfix CI for now.
We're seeing other, likely p4est-related issues also coming up for libtrixi (e.g., here). I wonder if something is severely broken with the latest JLL, though I can't understand what and why. However, if this persists, we might need to consider reverting the JLL, stripping OpenMPI from its build options, fixing it locally first, and then recreating the JLL with OpenMPI enabled.
Yes, please. Can we proceed as follows to fix this?
OK, I'll revert the P4est_jll build in JuliaPackaging/Yggdrasil#7510.
Codecov Report
Additional details and impacted files:

@@            Coverage Diff             @@
##             main      #88      +/-   ##
==========================================
- Coverage   16.31%   16.18%    -0.13%
==========================================
  Files           3        3
  Lines        1521     1533       +12
==========================================
  Hits          248      248
- Misses       1273     1285       +12

Flags with carried forward coverage won't be shown.
Thanks!
Adds the functions set_library_p4est! and set_library_sc! to conveniently set the preferences for custom libraries.
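A hypothetical usage sketch of the new convenience functions (the exact signatures are assumptions based on the description above; the paths are placeholders):

julia> using P4est
julia> P4est.set_library_p4est!("/path/to/libp4est.so")  # hypothetical signature; path is a placeholder
julia> P4est.set_library_sc!("/path/to/libsc.so")        # hypothetical signature; path is a placeholder
# A restart of Julia is then required for the changed preferences to be picked up.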