diff --git a/FAQ.md b/FAQ.md
index f2c4a842140..9ed60189de4 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -23,7 +23,7 @@
 - [Can I subclass basic types such as TimeFunction](#can-i-subclass-basic-types-such-as-timefunction)
 - [How can I change the compilation flags (for example, I want to change the optimization level from -O3 to -O0)](#how-can-i-change-the-compilation-flags-for-example-i-want-to-change-the-optimization-level-from--o3-to--o0)
 - [Is the jitted code IEEE-compliant](#is-the-jitted-code-ieee-compliant)
-- [Can I control the MPI domain decomposition](#can-i-control-the-mpi-domain-decomposition)
+- [Can I control the MPI domain decomposition?](#can-i-control-the-mpi-domain-decomposition)
 - [How should I use MPI on multi-socket machines](#how-should-I-use-MPI-on-multi-socket-machines)
 - [How do I make sure my code is "MPI safe"](#how-do-i-make-sure-my-code-is-MPI-safe)
 - [Why does my Operator kernel die suddenly](#why-does-my-operator-kernel-die-suddenly)
@@ -596,7 +596,7 @@ By default, Devito compiles the generated code using flags that maximize the run
 
 [top](#Frequently-Asked-Questions)
 
-## Can I control the MPI domain decomposition
+## Can I control the MPI domain decomposition?
 
 Until Devito v3.5 included, domain decomposition occurs along the fastest axis. As of later versions, domain decomposition occurs along the slowest axis, for performance reasons. And yes, it is possible to control the domain decomposition in user code, but this is not neatly documented. Take a look at `class CustomTopology` in [distributed.py](https://github.com/devitocodes/devito/blob/master/devito/mpi/distributed.py) and `test_custom_topology` in [this file](https://github.com/devitocodes/devito/blob/master/tests/test_mpi.py). In essence, `Grid` accepts the optional argument `topology`, which allows the user to pass a custom topology as an n-tuple, where `n` is the number of distributed dimensions.
 For example, for a two-dimensional grid, the topology `(4, 1)` will decompose the slowest axis into four partitions, one partition per MPI rank, while the fastest axis will be replicated over all MPI ranks.
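
To make the `(4, 1)` example concrete, here is a minimal pure-Python sketch of what such a topology implies for an 8x6 grid. It does not call Devito: `split_axis` and `decompose` are hypothetical helpers written for illustration only, and the rule for handing out leftover points (extra points go to the leading blocks) is an assumption that may not match what Devito does internally.

```python
from itertools import product


def split_axis(size, parts):
    """Split range(size) into `parts` contiguous (start, stop) blocks.

    Assumption: leftover points go to the leading blocks.
    """
    base, extra = divmod(size, parts)
    bounds = []
    start = 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def decompose(shape, topology):
    """Map each process-grid coordinate to the index ranges it would own."""
    if len(shape) != len(topology):
        raise ValueError("topology needs one entry per distributed dimension")
    per_axis = [split_axis(n, p) for n, p in zip(shape, topology)]
    coords = product(*(range(p) for p in topology))
    return {c: tuple(per_axis[d][i] for d, i in enumerate(c)) for c in coords}


# Four partitions along axis 0 (the slowest axis), none along axis 1.
layout = decompose(shape=(8, 6), topology=(4, 1))
for coords, ranges in layout.items():
    print(coords, ranges)
```

Each of the four process-grid coordinates owns two rows of axis 0 and the full extent `(0, 6)` of axis 1. In Devito itself, the corresponding step would be passing `topology=(4, 1)` to `Grid` in a run with four MPI ranks, as described above.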