[mpiexecjl] Return exit code of the mpiexec process #834

Merged
giordano merged 1 commit into JuliaParallel:master from mg/mpiexecjl-exitcode on Jun 14, 2024

Conversation

giordano
Member

@giordano giordano commented Jun 4, 2024

Should fix #833. I don't have the time to write tests now though, so opening as draft. @cohensbw could you please test this? With this PR I get

% mpiexecjl -np 2 --project julia --color=yes -e 'exit(2)'; echo $?
2

@giordano giordano changed the title [mpiexecjl] Return exit code of the root rank [mpiexecjl] Return exit code of the mpiexec process Jun 4, 2024
@cohensbw

cohensbw commented Jun 4, 2024

A quick test does seem to indicate that this fixes the issue. I am now able to recover exit codes other than 1 when they are passed to exit().
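The difference described above can be sketched in plain shell. This is only an illustration of the general pattern, not the code in bin/mpiexecjl: the helper names `run_collapsing` and `run_propagating` are hypothetical, and `sh -c 'exit 2'` stands in for the MPI launcher.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; neither is the actual bin/mpiexecjl code.

# Collapses every failure into exit code 1, losing the child's own status.
run_collapsing() {
    "$@" || return 1
}

# Returns whatever exit code the child process produced.
run_propagating() {
    rc=0
    "$@" || rc=$?
    return "$rc"
}

collapsed=0
run_collapsing sh -c 'exit 2' || collapsed=$?

propagated=0
run_propagating sh -c 'exit 2' || propagated=$?

echo "collapsed: $collapsed, propagated: $propagated"
# prints: collapsed: 1, propagated: 2
```

Capturing `$?` immediately after the child command is the key step; any command that runs in between overwrites it.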

@giordano giordano force-pushed the mg/mpiexecjl-exitcode branch from 3dc1925 to f5ade2b Compare June 13, 2024 01:35
@giordano
Member Author

giordano commented Jun 13, 2024

@eschnett could you please have a look at the mpitrampoline errors?

Failed to precompile MPI [da04e1cc-30fd-572f-bb4f-1f8673147195] to "/home/runner/.julia/compiled/v1.10/MPI/jl_FfNYtV".
MPItrampoline: MPI ABI version mismatch:
This version of MPItrampoline requires MPI ABI version 2.10.0, but the loaded MPIwrapper only provides MPI ABI version 2.9.0.
This is MPItrampoline version 5.4.0.
You loaded MPIwrapper version 2.10.3 from file "/usr/local/lib/libmpiwrapper.so"

I presume we need to update something in the CI setup (unrelated to this PR), but the error message looks a bit contradictory: first it says we have MPIwrapper 2.9, and then it says it's 2.10.

@eschnett
Contributor

The error is

MPItrampoline: MPI ABI version mismatch:
This version of MPItrampoline requires MPI ABI version 2.10.0, but the loaded MPIwrapper only provides MPI ABI version 2.9.0.
This is MPItrampoline version 5.4.0.
You loaded MPIwrapper version 2.10.3 from file "/usr/local/lib/libmpiwrapper.so".

We need to use MPIwrapper 2.11 instead.

I think I got the semver semantics wrong. The recent change to MPItrampoline (supporting oneAPI) was supposed to be backward compatible, hence the minor version change only. Sorry about this!
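The error message reports two different numbers: the MPI ABI version (2.9.0 provided, 2.10.0 required) and the MPIwrapper package version (2.10.3). Under the usual semver reading, a provider satisfies a requirement when the major versions match and the provided minor version is at least the required one. A minimal sketch of that generic rule follows; `abi_compatible` is a hypothetical name and this is not MPItrampoline's actual check.

```shell
#!/bin/sh
# Hypothetical check illustrating a generic semver rule; not MPItrampoline's actual code.
# Usage: abi_compatible REQUIRED PROVIDED, with versions given as MAJOR.MINOR.PATCH.
abi_compatible() {
    req_major=${1%%.*}
    req_rest=${1#*.}
    req_minor=${req_rest%%.*}

    prov_major=${2%%.*}
    prov_rest=${2#*.}
    prov_minor=${prov_rest%%.*}

    # Compare numerically: as strings, "10" would sort before "9".
    [ "$req_major" -eq "$prov_major" ] && [ "$prov_minor" -ge "$req_minor" ]
}

if abi_compatible 2.10.0 2.9.0; then
    echo "compatible"
else
    echo "ABI version mismatch"
fi
# prints: ABI version mismatch
```

With these rules, a wrapper providing ABI 2.9.0 cannot satisfy a requirement of 2.10.0, regardless of the wrapper's own package version.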

@giordano giordano force-pushed the mg/mpiexecjl-exitcode branch from f5ade2b to 8629957 Compare June 13, 2024 15:11
@giordano giordano marked this pull request as ready for review June 13, 2024 15:11
@giordano
Member Author

giordano commented Jun 13, 2024

Now we get:

% mpiexecjl -np 2 --project=/tmp julia --color=yes -e 'exit(2)'; echo $?
┌ Error: The MPI process failed
│   proc = Process(setenv(`/home/mose/.julia/artifacts/b7a943fb6a811908b073b8af69d955f16703ca2b/bin/mpiexec -np 2 julia --color=yes -e 'exit(2)'`,[...]), ProcessExited(2))
└ @ Main none:7
2

which, similarly to what we were doing previously, prints the failed process to the screen.

@giordano giordano requested a review from vchuravy June 13, 2024 15:13
@eschnett
Copy link
Contributor

@giordano I assume your comment above isn't meant for me any more, and that MPItrampoline is now working correctly.

@giordano
Member Author

Yes, I was back to the topic of this PR 🙂

@giordano giordano merged commit 2f88c97 into JuliaParallel:master Jun 14, 2024
50 of 51 checks passed
@giordano giordano deleted the mg/mpiexecjl-exitcode branch June 14, 2024 13:26
Development

Successfully merging this pull request may close these issues.

Cannot return user specified exit status with exit() when using mpiexecjl
4 participants