
Distributed Datawrangling macros #208

Merged: 32 commits into main on Nov 12, 2024

Conversation

@simone-silvestri (Collaborator) commented Oct 31, 2024

This PR is a proposal to partially solve #206 (in particular, the redundant download part).

It introduces a few distributed macros:

@root exp... # only rank 0 executes the expression

@distribute for iter in iterable # the for loop is distributed among the ranks
    body
end

@onrank rank, exp... # executes the expression on rank `rank`; if MPI is not initialized, the expression runs anyway (the rank info is discarded)

@handshake exp... # executes the expression sequentially on each rank following a handshake procedure (rank `r2 > r1` waits for rank `r1` to finish)
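
For concreteness, a minimal sketch of how @root can work, assuming MPI.jl (illustrative only; the implementation in this PR may differ):

using MPI

macro root(exp)
    return quote
        # execute only on rank 0, or everywhere if MPI is not initialized
        if !MPI.Initialized() || MPI.Comm_rank(MPI.COMM_WORLD) == 0
            $(esc(exp))
        end
    end
end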

We would have to find a way to test this, since distributed tests are not in place yet.
This is a cleaner PR that supersedes #207.

@simone-silvestri changed the title from "cleaner change" to "Distributed Datawrangling macros" on Oct 31, 2024
@simone-silvestri (Collaborator, Author) commented Oct 31, 2024

This code

using MPI
MPI.Init()
using ClimaOcean

@root begin
    println("Hello from rank 0")
end
@onrank 1, println("Another Hello from rank 1")
@onrank 2, println("Yet another one from rank 2")
@root println()
@root println()

rank = ClimaOcean.mpi_rank()
a1 = []
a2 = []

# serial loop: every rank pushes all ten elements
for i in 1:10
    push!(a1, i)
end

# distributed loop: the iterations are split among the ranks
@distribute for i in 1:10
    push!(a2, i)
end

@show rank a1 a2   # unguarded output from all ranks can interleave
@root println()
@root println()

@handshake @show rank a1 a2   # the handshake serializes the output rank by rank

run with mpiexecjl -np 3 julia --project test.jl, returns:

(base) simonesilvestri@Simones-MBP ClimaOcean.jl % ~/.julia/bin/mpiexecjl -np 3 julia --project test.jl
Hello from rank 0
Another Hello from rank 1
Yet another one from rank 2


rank = rank = rank = 120


a1 = Any[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a2 = Any[1, 4, 7, 10]
a1 = Any[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]a1 = Any[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a2 =
Any[3, 6, 9]a2 =
Any[2, 5, 8]


rank = 0
a1 = Any[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a2 = Any[1, 4, 7, 10]
rank = 1
a1 = Any[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a2 = Any[2, 5, 8]
rank = 2
a1 = Any[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a2 = Any[3, 6, 9]
(base) simonesilvestri@Simones-MBP ClimaOcean.jl %
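
The interleaved rank = rank = rank = 120 lines above are genuine: without a handshake, the three ranks' output interleaves. The a2 values also show that @distribute splits the iterations round-robin among the ranks. Illustrative sketches of both macros, assuming MPI.jl (not necessarily the code in this PR):

# round-robin distribution of loop iterations, consistent with the output above
macro distribute(expr)
    # expr is expected to be `for iter in iterable; body; end`
    iter     = expr.args[1].args[1]
    iterable = expr.args[1].args[2]
    body     = expr.args[2]
    return quote
        rank   = MPI.Initialized() ? MPI.Comm_rank(MPI.COMM_WORLD) : 0
        nranks = MPI.Initialized() ? MPI.Comm_size(MPI.COMM_WORLD) : 1
        for (counter, $(esc(iter))) in enumerate($(esc(iterable)))
            (counter - 1) % nranks == rank && $(esc(body))
        end
    end
end

# barrier-based handshake: ranks execute the expression one after the other
macro handshake(exp)
    return quote
        if MPI.Initialized()
            comm = MPI.COMM_WORLD
            for r in 0:MPI.Comm_size(comm) - 1
                MPI.Comm_rank(comm) == r && $(esc(exp))
                MPI.Barrier(comm)
            end
        else
            $(esc(exp))
        end
    end
end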

I am wondering if these functions could be more of an Oceananigans utility rather than a ClimaOcean one.

Member

This is epic.

A few comments:

  • Should this be in Oceananigans?
  • Can the macros take comm as an argument, which defaults to COMM_WORLD?
  • What happens if MPI is initialized but we are not distributed? I could see people leaving MPI.Init() in their scripts, lazily...

Member

I mean, you could even contribute this to MPI.jl, and then we use it in either Oceananigans or ClimaOcean.

Member

I guess in the latter case, we should just merge it here and get used to using it, and then move it in the future

@simone-silvestri (Collaborator, Author) commented Nov 1, 2024

Thanks, I also think this might go in Oceananigans; we probably have some use for it there.
I am thinking mostly of progress functions, where we don't want all the ranks to spit out redundant information.

Regarding MPI.jl, I don't know if it would be needed there. Maybe @simonbyrne can weigh in: would you need macros like these in MPI.jl?

If MPI is initialized but we are not distributed, then we are running on rank 0, so everything will work as if we were running a serial computation without MPI initialized. The only difference is a hypothetical @onrank rank, exps... with rank > 0, which executes if MPI is not initialized but executes nothing if MPI is initialized with only one rank.
There are probably two ways to reconcile this difference:

  • make @onrank rank, exp... always execute when MPI.Comm_size() == 1, which allows serial computation, but then the macro is not really true to its name;
  • enforce that @onrank rank, exp... executes only on rank rank, so that only @onrank 0, exps... (which is equivalent to @root) executes anything in a serial computation.

I'll try to figure out if I can pass communicators easily.
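
For example, one hypothetical way to accept an optional communicator (a sketch only, using space-separated macro arguments instead of the tuple syntax above; COMM_WORLD is the default):

using MPI

macro onrank(args...)
    # two arguments: (rank, exp), so the communicator defaults to COMM_WORLD;
    # three arguments: (comm, rank, exp)
    comm, rank, exp = length(args) == 3 ? args : (:(MPI.COMM_WORLD), args...)
    return quote
        if !MPI.Initialized() || MPI.Comm_rank($(esc(comm))) == $(esc(rank))
            $(esc(exp))
        end
    end
end

# usage:
# @onrank 1 println("hello")          # uses COMM_WORLD
# @onrank comm 1 println("hello")     # uses an explicit communicator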

Member

> Regarding MPI.jl, I don't know if it would be needed there.

What do you mean by "needed"? These macros are completely independent of ocean modeling, right?

Member

@simone-silvestri I suggest opening an issue on the MPI.jl GitHub to understand whether it is in scope for MPI.jl.

Member

I'm not sure about MPI.jl, but you could also put them in ClimaComms.jl?

@simone-silvestri (Collaborator, Author) commented

Thanks, we could migrate them there later on when we do the switch to ClimaComms.jl

@glwagner (Member) commented Nov 5, 2024

I support merging this as is, and we can work out the kinks as we go.

@simone-silvestri (Collaborator, Author) commented

Sounds good!

codecov bot commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 0% with 25 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (ce5b559) to head (b09348a).
Report is 1 commit behind head on main.

Files with missing lines                   Patch %   Lines
src/Bathymetry.jl                          0.00%     7 missing ⚠️
src/DataWrangling/ECCO/ECCO.jl             0.00%     6 missing ⚠️
src/DataWrangling/ECCO/ECCO_metadata.jl    0.00%     4 missing ⚠️
src/distributed_utils.jl                   0.00%     4 missing ⚠️
src/DataWrangling/JRA55.jl                 0.00%     3 missing ⚠️
src/DataWrangling/ECCO/ECCO_restoring.jl   0.00%     1 missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main    #208   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files         32      33    +1     
  Lines       1841    1828   -13     
=====================================
+ Misses      1841    1828   -13     


@simone-silvestri (Collaborator, Author) commented

Ready to go.

@glwagner (Member) commented Nov 5, 2024

Is there any way to add a few tests?

@simone-silvestri (Collaborator, Author) commented Nov 5, 2024

Hmmm, we should add some MPI tests. Do we want to do that here? I can add some; otherwise we can move this to Oceananigans, where we probably only need @root (just to keep @info from printing on all cores) and where we already have distributed tests.

@glwagner (Member) commented Nov 6, 2024

> Hmmm, we should add some MPI tests. Do we want to do that here? I can add some; otherwise we can move this to Oceananigans, where we probably only need @root (just to keep @info from printing on all cores) and where we already have distributed tests.

I think we should really try to contribute this to MPI.jl. If MPI.jl doesn't want it, we may want to contribute it to some other small utility package. So I wouldn't put any effort into moving it to Oceananigans if we are just going to move it again.

The tests should be portable, so I think it's OK to put the tests in here.
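
As a rough illustration of what a portable test could look like, a sketch assuming MPI.jl's mpiexec helper (the worker script name is hypothetical):

using MPI, Test

@testset "distributed macros" begin
    nprocs = 3
    # hypothetical worker script that exercises @root, @distribute,
    # @onrank, and @handshake, and errors if any of them misbehaves
    script = joinpath(@__DIR__, "distributed_macros_script.jl")
    mpiexec() do exe
        cmd = `$exe -n $nprocs $(Base.julia_cmd()) --project $script`
        @test success(run(ignorestatus(cmd)))
    end
end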

@simone-silvestri (Collaborator, Author) commented

This should be ready to merge.

@glwagner (Member) left a review comment

Awesome!

@simone-silvestri merged commit 80e70e1 into main on Nov 12, 2024 (7 checks passed).
@simone-silvestri deleted the ss/distributed-datawrangling branch on November 12, 2024 at 13:21.