Skip to content

Commit

Permalink
FAQ: fix block rendering
Browse files Browse the repository at this point in the history
  • Loading branch information
mloubout committed Oct 4, 2023
1 parent 798d2ff commit 478e343
Showing 1 changed file with 63 additions and 33 deletions.
96 changes: 63 additions & 33 deletions FAQ.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@
## How can I see the code generated by Devito?
After you build an ```op=Operator(...)``` implementing one or more equations, you can use ```print(op)``` to see the generated low level code. The example below builds an operator that takes a 1/2 cell forward shifted derivative of the ```Function``` **f** and puts the result in the ```Function``` **g**.

```
```python
import numpy as np
import devito
from devito import Grid, Function, Eq, Operator
Expand All @@ -54,7 +54,7 @@ print(op)

And the output:

```
```C
#define _POSIX_C_SOURCE 200809L
#include "stdlib.h"
#include "math.h"
Expand Down Expand Up @@ -107,7 +107,8 @@ int Kernel(struct dataobj *restrict f_vec, struct dataobj *restrict g_vec, const
Set the environment variable `DEVITO_LOGGING=DEBUG`. When an Operator gets compiled, the used compilation command will be emitted to stdout.
If nothing seems to change, it is possible that no compilation is happening under-the-hood as all kernels have already been compiled in a previous run. You will then have to clear up the Devito kernel cache. From the Devito root directory, run:
```
```bash
python scripts/clear_devito_cache.py
```

Expand Down Expand Up @@ -151,7 +152,8 @@ Take a look [here](https://github.com/devitocodes/devito/tree/master/examples/pe
Devito applies several performance optimizations to improve the number of operations ("operation count") in complex expressions. These optimizations are designed to do a really good job but also be reasonably fast. One such pass attempts to factorize as many common terms as possible in expressions in order to reduce the operation count. We will construct a demonstrative example below that has a common term that is _not_ factored out by the Devito optimization. The difference in floating-point operations per output point for the factoring of that term is about 10 percent, and the generated C is different, but numerical outputs of running the two different operators are indistinguishable to machine precision. In terms of actual performance, the (few) missed factorization opportunities may not necessarily be a relevant issue: as long as the code is not heavily compute-bound, the runtimes may only be slightly higher than in the optimally-factorized version.

#### Operator 1:
```

```python
ux_update = t.spacing**2 * b * \
((c33 * u_x.dx(x0=x+x.spacing/2)).dx(x0=x-x.spacing/2) +
(c55 * u_x.dz(x0=z+z.spacing/2)).dz(x0=z-z.spacing/2) +
Expand All @@ -162,8 +164,10 @@ stencil_x = Eq(u_x.forward, ux_update)
print("\n", stencil_x)
op = Operator([stencil_x])
```

#### Operator 2:
```

```python
ux_update = \
t.spacing**2 * b * (c33 * u_x.dx(x0=x+x.spacing/2)).dx(x0=x-x.spacing/2) + \
t.spacing**2 * b * (c55 * u_x.dz(x0=z+z.spacing/2)).dz(x0=z-z.spacing/2) + \
Expand All @@ -176,7 +180,8 @@ op = Operator([stencil_x])
```

#### Output 1:
```

```bash
Eq(u_x(t + dt, x, z), dt**2*(Derivative(c13(x, z)*Derivative(u_z(t, x, z), z), x) + Derivative(c33(x, z)*Derivative(u_x(t, x, z), x), x) + Derivative(c55(x, z)*Derivative(u_x(t, x, z), z), z) + Derivative(c55(x, z)*Derivative(u_z(t, x, z), x), z))*b(x, z) + (-dt*wOverQ(x, z) + 2)*u_x(t, x, z) + (dt*wOverQ(x, z) - 1)*u_x(t - dt, x, z))
Operator `Kernel` generated in 1.26 s
* lowering.Expressions: 0.61 s (48.7 %)
Expand All @@ -186,7 +191,8 @@ Flops reduction after symbolic optimization: [1160 --> 136]
```

#### Output 2:
```

```bash
Eq(u_x(t + dt, x, z), dt**2*b(x, z)*Derivative(c13(x, z)*Derivative(u_z(t, x, z), z), x) + dt**2*b(x, z)*Derivative(c33(x, z)*Derivative(u_x(t, x, z), x), x) + dt**2*b(x, z)*Derivative(c55(x, z)*Derivative(u_x(t, x, z), z), z) + dt**2*b(x, z)*Derivative(c55(x, z)*Derivative(u_z(t, x, z), x), z) + (-dt*wOverQ(x, z) + 2)*u_x(t, x, z) + (dt*wOverQ(x, z) - 1)*u_x(t - dt, x, z))
Operator `Kernel` generated in 1.12 s
* lowering.Expressions: 0.59 s (53.0 %)
Expand Down Expand Up @@ -221,7 +227,8 @@ You will note that this method uses placeholders for the material parameter arra

### How to get the list of Devito environment variables
You can get the list of environment variables with the following python code:
```

```python
from devito import print_defaults
print_defaults()
```
Expand Down Expand Up @@ -304,7 +311,8 @@ Set `DEVITO_IGNORE_UNKNOWN_PARAMS=1` to avoid Devito raising an exception if one

## How do you run the unit tests from the command line
In addition to the [tutorials]( https://www.devitoproject.org/devito/tutorials.html), the unit tests provide an excellent way to see how the Devito API works with small self-contained examples. You can exercise individual unit tests with the following python code:
```

```bash
pytest <test.py>
pytest -vs <test.py> [more detailed log]
```
Expand All @@ -315,7 +323,7 @@ pytest -vs <test.py> [more detailed log]
## What is the difference between f() and f[] notation
Devito offers a functional language to express finite difference operators. This is introduced [here](https://github.com/devitocodes/devito/blob/master/examples/userapi/01_dsl.ipynb) and systematically used throughout our examples and tutorials. The language relies on what in jargon we call the "f() notation".

```
```python
>>> from devito import Grid, Function
>>> grid = Grid(shape=(5, 6))
>>> f = Function(name='f', grid=grid, space_order=2)
Expand All @@ -327,15 +335,15 @@ Derivative(f(x, y), x)

Sometimes, one wishes to escape the constraints of the language. Instead of taking derivatives, other special operations are required. Or perhaps, a specific grid point needs to be accessed. In such a case, one could use the "f[] notation" or "indexed notation". Following on from the example above:

```
```python
>>> x, y = grid.dimensions
>>> f[x + 1000, y]
f[x + 1000, y]
```

The indexed object can be used at will to construct `Eq`s, and they can be mixed up with objects stemming from the "f() notation".

```
```python
>>> f.dx + f[x + 1000, y]
Derivative(f(x, y), x) + f[x + 1000, y]
```
Expand Down Expand Up @@ -378,19 +386,23 @@ The indexed notation, or "f[] notation", is discussed [here](#What-is-the-differ

## What's up with object\.data
The `.data` property which is associated with objects such as `Constant`, `Function` and `SparseFunction` (along with their derivatives) represents the 'numerical' value of the 'data' associated with that particular object. For example, a `Constant` will have a single numerical value associated with it as shown in the following snippet
```

```python
from devito import Constant

c = Constant(name='c')
c.data = 2.7

print(c.data)
```
```

```default
2.7
```

Then, a `Function` defined on a `Grid` will have a data value associated with each of the grid points (as shown in the snippet below) and so forth.
```

```python
import numpy as np
from devito import Grid, Function

Expand All @@ -400,7 +412,8 @@ f.data[:] = np.arange(16).reshape(grid.shape)

print(f.data)
```
```

```default
[[ 0. 1. 2. 3.]
[ 4. 5. 6. 7.]
[ 8. 9. 10. 11.]
Expand All @@ -412,27 +425,36 @@ print(f.data)

## How do I create and N-dimensional grid
Grids are often created via, e.g.,
```

```python
grid = Grid(shape=(5, 5))
```

where printing the `grid` object then returns:
```

```default
Grid[extent=(1.0, 1.0), shape=(5, 5), dimensions=(x, y)]
```
Here we see the `grid` has been created with the 'default' dimensions `x` and `y`. If a grid is created and passed a shape of `(5, 5, 5)` we'll see that in addition it has a `z` dimension. However, what if we want to create a grid with, say, a shape of `(5, 5, 5, 5)`? For this case, we've now run out of the dimensions defined by default and hence need to create our own dimensions to achieve this. This can be done via, e.g.,
```

Here we see the `grid` has been created with the 'default' dimensions `x` and `y`. If a grid is created and passed a shape of `(5, 5, 5)` we'll see that in addition it has a `z` dimension. However, what if we want to create a grid with, say, a shape of `(5, 5, 5, 5)`? For this case, we've now run out of the dimensions defined by default and hence need to create our own dimensions to achieve this. This can be done via, e.g.,

```python
a = SpaceDimension('a')
b = SpaceDimension('b')
c = SpaceDimension('c')
d = SpaceDimension('d')
grid = Grid(shape=(5, 5, 5, 5), dimensions=(a, b, c, d))
```

where now, printng `grid` we get
```

```default
Grid[extent=(1.0, 1.0, 1.0, 1.0), shape=(5, 5, 5, 5), dimensions=(a, b, c, d)]
```

and `grid.shape` returns
```

```default
(5, 5, 5, 5)
```

Expand Down Expand Up @@ -463,7 +485,8 @@ Loop fission (to maximize parallelism)
## As time increases in the finite difference evolution, are wavefield arrays "swapped" as you might see in c/c++ code

In c/c++ code using two wavefield arrays for second order acoustics, you might see code like the following to “swap” the wavefield arrays at each time step:
```

```C
float *p_tmp = p_old;
p_old = p_cur;
p_cur = p_tmp;
Expand Down Expand Up @@ -491,7 +514,7 @@ First, classes such as `Function` or `SparseTimeFunction` are inherently complex

Second, you must know that these objects are subjected to so-called reconstruction during compilation. Objects are immutable inside Devito; therefore, even a straightforward symbolic transformation such as `f[x] -> f[y]` boils down to performing a reconstruction, that is, creating a whole new object. Since `f` carries around several attributes (e.g., shape, grid, dimensions), each time Devito performs a reconstruction, we only want to specify which attributes are changing -- not all of them, as it would make the code ugly and incredibly complicated. The solution to this problem is that all the base symbolic types inherit from a common base class called `Reconstructable`; a `Reconstructable` object has two special class attributes, called `__rargs__` and `__rkwargs__`. If a subclass adds a new positional or keyword argument to its `__init_finalize__`, it must also be added to `__rargs__` or `__rkwargs__`, respectively. This will provide Devito with enough information to perform a reconstruction when it's needed during compilation. The following example should clarify:

```
```python
class Foo(Reconstructable):
__rargs__ = ('a', 'b')
__rkwargs__ = ('c',)
Expand All @@ -515,7 +538,7 @@ class Bar(Foo):

You are unlikely to care about how reconstruction works in practice, but here are a few examples for `a = Foo(3, 5)` to give you more context.

```
```python
a._rebuild() -> "x(3, 5, 4)" (i.e., copy of `a`).
a._rebuild(4) -> "x(4, 5, 4)"
a._rebuild(4, 7) -> "x(4, 7, 4)"
Expand All @@ -534,7 +557,7 @@ There is currently no API to achieve this straightforwardly. However, there are
* via env vars: use a [CustomCompiler](https://github.com/opesci/devito/blob/v4.0/devito/compiler.py#L446) -- just leave the `DEVITO_ARCH` environment variable unset or set it to `'custom'`. Then, `export CFLAGS="..."` to tell Devito to use the exported flags in place of the default ones.
* programmatically: subclass one of the compiler classes and set `self.cflags` to whatever you need. Do not forget to add the subclass to the [compiler registry](https://github.com/opesci/devito/blob/v4.0/devito/compiler.py#L472). For example, you could do

```
```python
from devito import configuration, compiler_registry
from devito.compiler import GNUCompiler

Expand Down Expand Up @@ -576,7 +599,8 @@ Until Devito v3.5 included, domain decomposition occurs along the fastest axis.
## How should I use MPI on multi-socket machines

In general you should use one MPI rank per NUMA node on a multi-socket machine. You can find the number of numa nodes with the `lscpu` command. For example, here is the relevant part of the output from the `lscpu` command on an AMD 7502 2 socket machine with 2 NUMA nodes:
```

```default
Architecture: x86_64
CPU(s): 64
On-line CPU(s) list: 0-63
Expand All @@ -597,7 +621,7 @@ NUMA node1 CPU(s): 32-63
There are a few things you may want to check

* To refer to the actual ("global") shape of the domain, you should always use `grid.shape` (or analogously through a `Function`, `f.grid.shape`). And unless you know well what you're doing, you should never use the function shape, namely `f.shape` or `f.data.shape`, as that will return the "local" domain shape, that is the data shape after domain decomposition, which might differ across the various MPI ranks.
* <... to be completed ...>


[top](#Frequently-Asked-Questions)

Expand All @@ -613,17 +637,23 @@ This is likely due to an out-of-bounds (OOB) array access while running the gene
## Can I manually modify the C code generated by Devito and test these modifications

Yes, as of Devito v3.5 it is possible to modify the generated C code and run it inside Devito. First you need to get the C file generated for a given `Operator`. Run your code in `DEBUG` mode:
```

```bash
DEVITO_LOGGING=DEBUG python your_code.py
```

The generated code path will be shown as in the excerpt below:
```

```default
CustomCompiler: compiled `/tmp/devito-jitcache-uid1000/ed41e9373af1bc129471b7ae45e1c3740b60a856.c` [0.29 s]
```

You can now open the C file, do the modifications you like, and save them. Finally, rerun the same program but this time with the _Devito JIT backdoor_ enabled:
```

```bash
DEVITO_JIT_BACKDOOR=1 python your_code.py
```

This will force Devito to recompile and link the modified C code.

If you have a large codebase with many `Operator`s, here's a [trick](https://github.com/devitocodes/devito/wiki/Efficient-use-of-DEVITO_JIT_BACKDOOR-in-large-codes-with-many-Operators) to speed up your hacking with the JIT backdoor.
Expand Down Expand Up @@ -675,7 +705,7 @@ About the GPts/s metric, that is number of gigapoints per seconds. The "points"

An excerpt of the performance profile emitted by Devito upon running an Operator is provided below. In this case, the Operator has two sections, ``section0`` and ``section1``, and ``section1`` consists of two consecutive 6D iteration spaces whose size is given between angle brackets.

```
```default
Global performance: [OI=0.16, 8.00 GFlops/s, 0.04 GPts/s]
Local performance:
* section0<136,136,136> run in 0.10 s [OI=0.16, 0.14 GFlops/s]
Expand All @@ -695,7 +725,7 @@ The floating-point operations are counted once all of the symbolic flop-reducing

To calculate the GFlops/s performance, Devito multiplies the floating-point operations calculated at compile time by the size of the iteration space, and it does that at the granularity of individual expressions. For example, consider the following snippet:

```
```default
<section0 start>
for x = x_m to x_M
for y = y_m to y_M
Expand Down

0 comments on commit 478e343

Please sign in to comment.