This package is designed to make it possible to analyze very large MITgcm grids in parallel.
These are the main ideas guiding the development:
- As in the GCM execution itself, the domain should be partitioned into manageable tiles; at no point should the whole domain be loaded into the memory of a single process.
- But unlike the GCM execution, most analysis tasks are embarrassingly parallel: communication between tiles is not required.
- This means that our analysis tasks can be implemented using a MapReduce programming model.
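To make the MapReduce idea concrete, here is a minimal sketch that computes a domain mean tile by tile. It is not this package's API: the helper names (`tile_slices`, `map_tile`, `combine`), the tile shape, and the small in-memory array standing in for a model field are all assumptions made for illustration.

```python
from functools import reduce

import numpy as np


def tile_slices(shape, tile_shape):
    """Yield (row_slice, col_slice) pairs that together cover a 2D domain."""
    ny, nx = shape
    ty, tx = tile_shape
    for j in range(0, ny, ty):
        for i in range(0, nx, tx):
            yield slice(j, min(j + ty, ny)), slice(i, min(i + tx, nx))


def map_tile(tile):
    """Map step: summarize one tile as a partial (sum, count) pair."""
    return float(tile.sum()), tile.size


def combine(a, b):
    """Reduce step: merge two partial (sum, count) pairs."""
    return a[0] + b[0], a[1] + b[1]


# Hypothetical stand-in for a model field; a real grid would live on disk.
field = np.arange(12 * 10, dtype="f8").reshape(12, 10)

# Each tile is processed independently, so no tile needs its neighbours.
partials = map(map_tile, (field[s] for s in tile_slices(field.shape, (5, 4))))
total, count = reduce(combine, partials)
tiled_mean = total / count
```

Because the map step touches one tile at a time and the reduce step only sees small partial results, the built-in `map` used here could in principle be swapped for a parallel map without changing the rest of the logic.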
The basic framework is the powerful NumPy/SciPy stack. In particular, NumPy's memmap class allows us to access small segments of large files on disk without reading the entire file into memory. This is exactly what we need on each tile.
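As a rough illustration of the memmap approach, the sketch below writes a small raw binary file and then reads back a single tile from it. The file name, the grid shape, the tile bounds, and the big-endian 32-bit float layout (`">f4"`) are assumptions chosen for the example; the layout of a real output file depends on how the model was configured.

```python
import os
import tempfile

import numpy as np

# Hypothetical grid size and a small test field written as raw binary.
ny, nx = 8, 6
data = np.arange(ny * nx, dtype=">f4").reshape(ny, nx)
path = os.path.join(tempfile.mkdtemp(), "field.data")
data.tofile(path)

# Opening the memmap does not read the file; data is only fetched
# from disk for the region that is actually indexed.
mm = np.memmap(path, dtype=">f4", mode="r", shape=(ny, nx))

# Copy one tile (rows 2-3, columns 3-5) into an ordinary in-memory array.
tile = np.array(mm[2:4, 3:6])

# Drop the reference so the underlying file mapping can be released.
del mm
```

Copying the slice with `np.array` gives each tile its own small array, so later computations do not keep the mapping to the full file alive.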
The parallelization is handled through the IPython Parallel framework. This extremely flexible architecture makes it trivial to distribute execution in a wide range of environments, including MPI.
One of the biggest barriers to adopting Python for scientific computing (over Matlab) is the expectation that it will be difficult to install. Fortunately, this barrier has been essentially eliminated by the recent emergence of completely pre-cooked NumPy/SciPy environments that are free for academic use. The two I have tested are: