Run CI on aarch64 with Drone #472

Merged: 1 commit into JuliaParallel:master from mg/drone-aarch64 on Jun 5, 2021

Member

@giordano giordano commented Jun 3, 2021

With CPUs like A64FX, aarch64 is an interesting platform for HPC. I think
it'd be good to test that MPI.jl works here. I just ran the tests on Isambard 2
and they are successful.

However, I have a small application which I tried to run on a MacBook M1
(aarch64-apple-darwin) and it crashes at
https://github.com/JuliaLang/julia/blob/58ffe7e3ed3a93a9d816097548e785284f57fbd4/src/codegen.cpp#L5531-L5536
in a call to MPI.Reduce!. The tests pass there too, so perhaps they
don't capture this error (or the error lies somewhere else), but I'll need to
produce a minimal working example to get a useful test out of it.

If you want to go ahead with this PR, someone will need to enable this repository
at https://cloud.drone.io/.


Edit: regarding the error I got, I'm using a custom reduction. Valentin pointed out that this is known not to work at the moment on non-Intel platforms, and the tests for it are skipped there.

@simonbyrne
Member

What was the code doing? It sounds like you were using a custom reduction operator: these aren't supported on ARM or PPC (we currently disable tests on those platforms, see #353).

#404 had some ideas to work around this.

@giordano
Member Author

giordano commented Jun 5, 2021

Yes, as I wrote in the "Edit", I was using a custom reduction.

@simonbyrne simonbyrne closed this Jun 5, 2021
@simonbyrne simonbyrne reopened this Jun 5, 2021
@giordano giordano force-pushed the mg/drone-aarch64 branch from 98326cc to 2e389bc Compare June 5, 2021 20:48
@giordano
Member Author

giordano commented Jun 5, 2021

Julia v1.6 on aarch64 is failing with

Test Failed at /drone/src/test/test_io_shared.jl:22
  Expression: MPI.File.get_position_shared(fh) == 0
   Evaluated: 9 == 0
ERROR: LoadError: There was an error during testing
in expression starting at /drone/src/test/test_io_shared.jl:22

Interestingly enough, Julia v1.3 is successful 😕

@simonbyrne
Member

Yeah, that happens intermittently. It's unclear what the consistency semantics are for shared file pointers.

@simonbyrne simonbyrne merged commit bf2a58e into JuliaParallel:master Jun 5, 2021
@simonbyrne
Member

Thanks!

@giordano giordano deleted the mg/drone-aarch64 branch June 5, 2021 21:48
@giordano
Member Author

giordano commented Jun 5, 2021

BTW, I forgot to mention that tests are successful on aarch64-apple-darwin with ENV["JULIA_MPI_BINARY"]="OpenMPI_jll" (Edit: I had actually already mentioned that the tests are successful 😅, just not that I used OpenMPI_jll)
