
Add convenience functions to set preferences #88

Merged

Conversation

JoshuaLampert
Member

Adds the functions set_library_p4est! and set_library_sc! to conveniently set the preferences for custom libraries.
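
As a rough illustration (not necessarily the actual implementation in this PR), such a convenience function could be built on the Preferences.jl API. The preference key "libp4est" and the validation shown here are assumptions:

```julia
using Preferences, UUIDs

# UUID of P4est.jl (it also appears in the precompilation logs below).
const P4EST_UUID = UUID("7d669430-f675-4ae7-b43e-fab78ec5a902")

# Store `path` as the custom p4est library in LocalPreferences.toml.
# The key name "libp4est" is an assumption for illustration.
function set_library_p4est!(path; force = true)
    isfile(path) || error("$path does not exist")
    set_preferences!(P4EST_UUID, "libp4est" => path; force)
    @info "Restart Julia for the new preference to take effect."
end
```

An analogous set_library_sc! would set the corresponding preference for libsc. In both cases the preference only takes effect after a restart, since it is read at precompilation time.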

@JoshuaLampert
Member Author

Looks like the new P4est_jll destroyed something 😬

julia> using Pkg

julia> Pkg.add(["MPIPreferences", "P4est_jll", "MPI"])
[...]
julia> using MPIPreferences

julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation identified
│   libmpi = "libmpi"
│   version_string = "Open MPI v4.1.0, package: Debian OpenMPI, ident: 4.1.0, repo rev: v4.1.0, Dec 18, 2020\0"
│   impl = "OpenMPI"
│   version = v"4.1.0"
└   abi = "OpenMPI"
┌ Info: MPIPreferences changed
│   binary = "system"
│   libmpi = "libmpi"
│   abi = "OpenMPI"
│   mpiexec = "mpiexec"
│   preloads = Any[]
└   preloads_env_switch = nothing

and after restarting:

julia> using P4est_jll
julia> using MPI
[ Info: Precompiling MPI [da04e1cc-30fd-572f-bb4f-1f8673147195]

julia> MPI.Init()
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[saola6:24126] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

Without using P4est_jll it works (of course), and it also works with P4est_jll v2.8.1+2. I think that's the problem behind at least some of the CI failures. Any ideas?

@JoshuaLampert
Member Author

It's probably because of a mismatch between the OpenMPI versions (my local one and the one used in the OpenMPI build of P4est_jll), right?
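
One way to narrow this down (a debugging sketch, not part of the PR) is to check which libmpi the session actually resolves, e.g. via Libdl:

```julia
using Libdl

# Resolve libmpi the same way dlopen would and print its path;
# this shows whether the system OpenMPI or a Julia artifact wins.
handle = Libdl.dlopen("libmpi"; throw_error = false)
if handle === nothing
    println("libmpi not found on the library search path")
else
    println(Libdl.dlpath(handle))
end
```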

@JoshuaLampert
Member Author

Ok, the same happens with HDF5_jll. So that's probably not the problem...

@ranocha
Member

ranocha commented Oct 7, 2023

There are some really weird issues:

mpiexec: Error: unknown option "-n"

for ubuntu-latest - P4EST_CUSTOM_MPI_CUSTOM - Julia 1 and

  ✗ P4est
  19 dependencies successfully precompiled in 13 seconds

ERROR: The following 1 direct dependency failed to precompile:

P4est [7d669430-f675-4ae7-b43e-fab78ec5a902]

Failed to precompile P4est [7d669430-f675-4ae7-b43e-fab78ec5a902] to "C:\\Users\\runneradmin\\.julia\\compiled\\v1.9\\P4est\\jl_16C0.tmp".

[1108] signal (22): SIGABRT
in expression starting at D:\a\P4est.jl\P4est.jl\src\LibP4est.jl:10

for windows-latest - P4EST_JLL_MPI_DEFAULT - Julia 1

Co-authored-by: Hendrik Ranocha <[email protected]>
@JoshuaLampert
Member Author

JoshuaLampert commented Oct 7, 2023

Looks like the wrong OMPI version is used. After setting all preferences (MPIPreferences and the P4est preference) and restarting the REPL on Roci:

julia> using P4est

shell> ompi_info
                 Package: Debian OpenMPI
                Open MPI: 4.0.3
  Open MPI repo revision: v4.0.3
   Open MPI release date: Mar 03, 2020
                Open RTE: 4.0.3
  Open RTE repo revision: v4.0.3
   Open RTE release date: Mar 03, 2020
                    OPAL: 4.0.3
      OPAL repo revision: v4.0.3
       OPAL release date: Mar 03, 2020
                 MPI API: 3.1.0
            Ident string: 4.0.3
                  Prefix: /home/jlampert/.julia/artifacts/6f1138d9f7bcc8575ac98bb3ccbc47505c718c80
 Configured architecture: x86_64-pc-linux-gnu
          Configure host: lcy01-amd64-020
           Configured by: buildd
           Configured on: Wed Apr 15 13:14:35 UTC 2020
          Configure host: lcy01-amd64-020
[...]

which means that it still uses the OpenMPI version from the artifact and not the system OpenMPI version. On the other hand, after performing the same steps with HDF5:

julia> using HDF5

shell> ompi_info
                 Package: Debian OpenMPI
                Open MPI: 4.0.3
  Open MPI repo revision: v4.0.3
   Open MPI release date: Mar 03, 2020
                Open RTE: 4.0.3
  Open RTE repo revision: v4.0.3
   Open RTE release date: Mar 03, 2020
                    OPAL: 4.0.3
      OPAL repo revision: v4.0.3
       OPAL release date: Mar 03, 2020
                 MPI API: 3.1.0
            Ident string: 4.0.3
                  Prefix: /usr
 Configured architecture: x86_64-pc-linux-gnu
          Configure host: lcy01-amd64-020
           Configured by: buildd
           Configured on: Wed Apr 15 13:14:35 UTC 2020
          Configure host: lcy01-amd64-020
[...]

With HDF5.jl, ompi_info reports the system OMPI version located under /usr after setting the preference for the custom hdf5 library. In contrast, setting the preference for the p4est library doesn't change the OMPI version that is used. I guess that is at least part of the problem for the failure on Ubuntu. open-mpi/ompi#4557 indicates that the error message mpiexec: Error: unknown option "-n" is caused by a conflict between OMPI installations.
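
To spot such a conflict, one can list every mpiexec visible on the PATH (an illustrative snippet, not from the PR):

```julia
# More than one hit here suggests conflicting OpenMPI installations
# (cf. open-mpi/ompi#4557 for the `unknown option "-n"` symptom).
sep = Sys.iswindows() ? ';' : ':'
for dir in split(get(ENV, "PATH", ""), sep)
    candidate = joinpath(dir, "mpiexec")
    isfile(candidate) && println(candidate)
end
```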

@ranocha
Member

ranocha commented Oct 9, 2023

The kind of failure

  ✗ P4est
  19 dependencies successfully precompiled in 13 seconds

ERROR: The following 1 direct dependency failed to precompile:

P4est [7d669430-f675-4ae7-b43e-fab78ec5a902]

Failed to precompile P4est [7d669430-f675-4ae7-b43e-fab78ec5a902] to "C:\\Users\\runneradmin\\.julia\\compiled\\v1.9\\P4est\\jl_16C0.tmp".

[1108] signal (22): SIGABRT
in expression starting at D:\a\P4est.jl\P4est.jl\src\LibP4est.jl:10

for windows-latest - P4EST_JLL_MPI_DEFAULT - Julia 1

also happens in our CI tests of Trixi.jl. Do you have an idea what's causing this? If not, I would suggest rolling back the PR to P4est_jll to hotfix CI for now.

@sloede
Member

sloede commented Oct 9, 2023

We're seeing other, likely p4est-related issues also coming up for libtrixi (e.g., here). I wonder whether something is severely broken with the latest JLL, though I can't tell what or why.

However, if this persists, we might need to consider reverting the JLL and stripping OpenMPI from its build options, then fixing it locally first before recreating the JLL with OpenMPI enabled.

@ranocha
Member

ranocha commented Oct 9, 2023

Yes, please. Can we proceed as follows to fix this?

  1. Revert the P4est_jll build to make everything work again for now
  2. Add explicit compat bounds for P4est_jll.jl to P4est.jl, allowing only versions that we have tested (CompatHelper can help us update if necessary)
  3. Try to debug the P4est_jll build
  4. Update P4est.jl to a new P4est_jll if possible
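
Step 2 could look like the following [compat] entry in P4est.jl's Project.toml (the exact bound shown is illustrative; it would pin the last known-good build):

```toml
[compat]
P4est_jll = "=2.8.1"
```

With such a bound, Pkg refuses to resolve the broken newer build, and CompatHelper opens a PR once the bound should be relaxed.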

@JoshuaLampert
Member Author

Ok, I'll revert the P4est_jll build in JuliaPackaging/Yggdrasil#7510.

@codecov

codecov bot commented Oct 12, 2023

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (1811e9b) 16.31% compared to head (60d1137) 16.18%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #88      +/-   ##
==========================================
- Coverage   16.31%   16.18%   -0.13%     
==========================================
  Files           3        3              
  Lines        1521     1533      +12     
==========================================
  Hits          248      248              
- Misses       1273     1285      +12     
Flag Coverage Δ
unittests 16.18% <0.00%> (-0.13%) ⬇️


Files Coverage Δ
src/P4est.jl 50.00% <0.00%> (-42.86%) ⬇️


@ranocha
Member

@ranocha ranocha left a comment

Thanks!

@ranocha ranocha merged commit a6371a3 into trixi-framework:main Oct 12, 2023
@JoshuaLampert JoshuaLampert deleted the preferences-convenience-functions branch October 12, 2023 11:35