MPI workarounds for aarch64 #999
Conversation
Thanks. Two remarks:
I moved the fixes to the dedicated src/workarounds/aarch64_mpi.jl file. I did not explicitly report these fixes upstream, but my mention of the original issue in my initial message ensures that this PR appears there for reference.
src/workarounds/aarch64_mpi.jl (outdated)

# Vec3{T} must be cast to Vector{T} before MPI reduction
function mpi_sum!(arr::Vector{Vec3{T}}, comm::MPI.Comm) where{T}
    n = length(arr)
Can this not be solved by a reinterpret(reshape, ...)? Then there is no copy.
I think this one is tricky to get without copies, because SVectors are immutable. So even if I use something like new_arr = reshape(reinterpret(T, arr), 3, :), I still could not call mpi_sum!(). And assuming I call mpi_sum() instead, I also end up with a copy.
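For reference, a minimal self-contained sketch of the reinterpret(reshape, ...) idea being discussed (the array contents are made up for illustration; this is not code from the PR):

using StaticArrays

arr  = [SVector(1.0, 2.0, 3.0), SVector(4.0, 5.0, 6.0)]
flat = reinterpret(reshape, Float64, arr)   # copy-free 3×2 view of the same memory
flat[1, 2] = 40.0                           # writes through to the parent vector
@assert arr[2] == SVector(40.0, 5.0, 6.0)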
src/workarounds/aarch64_mpi.jl (outdated)

# utility function to cast a Dual type to an array containing a value and the partial diffs
function dual_array(dual::ForwardDiff.Dual{T, U, V}) where{T, U, V}
    dual_array = [ForwardDiff.value(dual)]
    append!(dual_array, collect(ForwardDiff.partials(dual)))
I vaguely recall there are again ways to do this without a copy. Check the ForwardDiff rules for array operations (multiplication needs to be applied equally to all partials).
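One possible copy-free route (an assumption on my part, not necessarily the ForwardDiff rule meant above): Dual is a bits type whose storage is the value followed by its partials, so a vector of Duals can be viewed as plain numbers without allocating:

using ForwardDiff: Dual

duals = [Dual(1.0, 2.0, 3.0), Dual(4.0, 5.0, 6.0)]   # Dual{Nothing,Float64,2}
flat  = reinterpret(Float64, duals)   # length-6 view: value, partials, value, partials
@assert flat[1] == 1.0 && flat[4] == 4.0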
src/workarounds/aarch64_mpi.jl (outdated)

end

# utility function that casts back an array to a Dual type, based on a template Dual
function new_dual(dual_array, template)
Again, ForwardDiff may have something here.
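If it helps, a hedged sketch of rebuilding a Dual directly from plain numbers, without an intermediate array (the names and values are illustrative, not from the PR):

using ForwardDiff
using ForwardDiff: Dual, Partials

template = Dual(1.0, 2.0, 3.0)    # Dual{Nothing,Float64,2}
vals     = (10.0, 20.0, 30.0)     # value followed by its partials
rebuilt  = Dual{ForwardDiff.tagtype(template)}(vals[1], Partials(vals[2:end]))
@assert ForwardDiff.value(rebuilt) == 10.0
@assert ForwardDiff.partials(rebuilt) == Partials((20.0, 30.0))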
src/workarounds/aarch64_mpi.jl (outdated)

    mpi_sum!(array, comm)
    offset = 0
    for i in 1:length(dual)
        dual[i] = new_dual(array[offset+1:offset+lengths[i]], dual[i])
Indexing operations make a copy by default. Use views (@views).
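A tiny self-contained illustration of the difference (not from the PR):

a = collect(1.0:6.0)
slice_copy = a[2:4]           # allocates a new Vector
slice_view = @views a[2:4]    # zero-copy SubArray aliasing a
slice_view[1] = 99.0
@assert a[2] == 99.0
@assert slice_copy[1] == 2.0  # the copy is unaffected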
In the last commit, the number of copies and/or temporary arrays in the MPI workarounds for aarch64 has been reduced.
@abussy There seems to be some update in the issue on the MPI.jl side as well. Maybe that helps.
Indeed, since MPI.jl v0.20.22, it is possible to manually register custom operations for MPI on aarch64.
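To illustrate what such a registration might look like (a hedged sketch, not the final DFTK code; the operator and buffer below are made up):

using MPI
using StaticArrays

MPI.Init()
comm = MPI.COMM_WORLD

op  = MPI.Op(+, SVector{3,Float64}; iscommutative=true)  # manually registered custom reduction
buf = [SVector(1.0, 2.0, 3.0)]
MPI.Allreduce!(buf, op, comm)   # element-wise sum across all ranks, in place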
Awesome! Thanks @abussy, let's merge this.
This PR contains MPI workarounds for aarch64. As a result, DFTK can now be run on ARM architecture with full MPI support. These workarounds are meant to be temporary, until there is an upstream fix (whenever that happens; issue JuliaParallel/MPI.jl#404 was already opened 4 years ago).

The procedure was the following: the :mpi tagged tests were run on an ARM machine until a crash occurred, and a set of specific MPI reduction methods was then written for the custom type involved. Some other custom types might pop up in the future; adding a new method for them should be trivial (a sketch of what such a method can look like follows below).

I know this is rather clunky, and it inflates the code size and complexity. However, I think it is crucial to fully support DFTK on ARM architecture, especially now that many modern HPC clusters run on it (e.g. Alps at CSCS).
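For readers of this thread, a hedged sketch of the kind of type-specific reduction method described above (the names follow the snippets quoted in the review; this is not the exact PR code):

using MPI
using StaticArrays
const Vec3{T} = SVector{3, T}

# Flatten to a plain Vector{T}, reduce with the built-in sum, then write back in place.
function mpi_sum!(arr::Vector{Vec3{T}}, comm::MPI.Comm) where {T}
    flat = Vector{T}(undef, 3 * length(arr))
    for (i, v) in enumerate(arr)
        flat[3i-2:3i] .= v
    end
    MPI.Allreduce!(flat, +, comm)
    for i in eachindex(arr)
        arr[i] = Vec3{T}(flat[3i-2], flat[3i-1], flat[3i])
    end
    arr
end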