# Parallel Computing
Right now MudPy supports parallel computing for the Green's function and synthetics calculations only. This is done through mpi4py and it significantly reduces compute times.
The solve step (the actual inversion) is not parallelized yet. I'm currently working on integrating PETSc support to make that step faster, but that's a medium-term project. Stay tuned.
This step was a bit cumbersome for me. First try the easy way:

```shell
pip install mpi4py
```

If this works you are good to go. For me it didn't; should this be the case for you too, then you will need to build from source. First you need to install Open MPI. Once this is done you can build mpi4py, the instructions are here.
On Mac OS, type the following at the terminal to determine how many CPUs you have available:

```shell
sysctl -n hw.ncpu
```

On Linux, type:

```shell
nproc --all
```
This number will be the maximum number of processors you can request computation on. For my laptop this is 8.
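If you'd rather check from Python (this works on both platforms), the standard library reports the same number:

```python
# Cross-platform CPU count using only the standard library
import os

ncpus = os.cpu_count()
print(ncpus)  # the maximum value you can sensibly set ncpus to
```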
If you already have a .inv parameter file you have been using for computation, insert the line ncpus=8 (with however many CPUs your system has) close to the top, so that your file looks something like:
```python
G_from_file=0 # =0 read GFs and create a new G, =1 load G from file
invert=0 # =1 runs inversion, =0 does nothing
###############################################################################
############### Green function parameters #############
ncpus=8 #Number of available cores (set to ncpus=1 for serial)
hot_start=0 #Start at a certain subfault number
```
In newer versions of MudPy this argument is required and an error will be thrown if it is missing, so set it to at least ncpus=1 for serial computation.
You will also need to modify the function call at the end to include the ncpus argument, like so:
```python
# Run green functions
if make_green==1 or make_synthetics==1:
    runslip.inversionGFs(home,project_name,GF_list,tgf_file,fault_name,model_name,
                         dt,tsun_dt,NFFT,tsunNFFT,make_green,make_synthetics,dk,pmin,
                         pmax,kmax,beta,time_epi,hot_start,ncpus)
```
This makes things much, much faster, especially for large GF/synthetics runs like InSAR and seafloor deformation points.
Parallelization happens over the number of source points: the large source is broken up into smaller ones and each smaller source is assigned to one CPU. You can see how the source is broken up by looking in /data/model_info/, where you should see files labeled `mpi_source.0.fault`, `mpi_source.1.fault`, etc.
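The splitting itself is conceptually simple. Here is a minimal sketch of the idea, assuming the fault model is just an array of subfault rows; the function name and the way files map to MPI ranks are illustrative, not MudPy's actual implementation:

```python
# Illustrative sketch: divide subfaults into one nearly-equal chunk per CPU,
# mirroring what the mpi_source.N.fault files suggest. Not MudPy's real code.
import numpy as np

def split_source(subfaults, ncpus):
    """Return one nearly-equal chunk of subfaults per CPU."""
    return np.array_split(subfaults, ncpus)

subfaults = np.arange(100)           # stand-in for 100 subfault rows
chunks = split_source(subfaults, 8)  # 8 chunks of 13 or 12 subfaults
for i, chunk in enumerate(chunks):
    # chunk i would be written to mpi_source.<i>.fault and handled by MPI rank i
    print(f"mpi_source.{i}.fault: {len(chunk)} subfaults")
```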
mpi4py is very robust: if you request GFs/synthetics for many different kinds of data and an error occurs, the code will output some error messages and continue on to the next task. This is good, but always check the console output to see whether everything ran successfully.
The output has changed from what you might be used to. It's untidy to have a bunch of CPUs all printing to the screen, so everything has been streamlined.