
First touch policy for NUMA aware memory placement

Bei Wang edited this page Dec 6, 2019 · 19 revisions

Non-uniform memory access (NUMA) is a memory architecture in which a processor can access its local memory much faster than non-local memory. To check the system topology on a target machine, use one of the following tools:

  • Intel MPI's cpuinfo tool (e.g., module load intelmpi, then run cpuinfo)
  • hwloc's hwloc-ls tool
  • numactl -H

For memory-bound multithreaded applications, optimal performance requires that memory be placed locally to the processor that accesses the data. Memory placement can be controlled explicitly in several ways, including the first touch policy and the numactl tool. Here we rely on the first touch policy, which is the default on Linux and other operating systems. Because this policy places each data page in the memory closest to the thread that accesses that page for the first time, optimal placement can be ensured by initializing memory with an OpenMP loop that distributes the data across threads the same way as the hot spot kernels that later access it. In this code, this initialization is enabled by adding -DNUMA_FT to the compiler flags. Refer to Nested parallelism in OpenMP for a performance comparison with and without NUMA-aware memory placement.
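A minimal C sketch of this idea, assuming a simple STREAM-like kernel y[i] = a*x[i] + y[i]. The function names alloc_array, init_arrays and axpy are hypothetical and are not taken from this code. The point is that the initialization loop uses the same OpenMP loop structure and schedule(static) as the compute kernel, so each thread first touches the pages it later works on. When NUMA_FT is not defined, initialization runs serially and every page lands on the NUMA node of the initial thread.

```c
#include <stdlib.h>

/* Hypothetical allocator: for large arrays, malloc typically reserves
   virtual memory only; physical pages are assigned on first write. */
double *alloc_array(size_t n) {
    return malloc(n * sizeof(double));
}

/* Hypothetical initializer. With -DNUMA_FT the loop is parallelized with the
   same static schedule as axpy(), so each thread first-touches (and thereby
   places) the pages it will later use. Without NUMA_FT, one thread touches
   every page and all data ends up on that thread's NUMA node. */
void init_arrays(double *x, double *y, long n) {
#ifdef NUMA_FT
    #pragma omp parallel for schedule(static)
#endif
    for (long i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = 2.0;
    }
}

/* Hypothetical hot spot kernel: same iteration count and static schedule
   as the initialization loop. */
void axpy(double a, const double *x, double *y, long n) {
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Compiled with, e.g., gcc -O2 -fopenmp -DNUMA_FT, the initialization is parallel; without -DNUMA_FT it is serial. Using schedule(static) with the same iteration count and thread count in both loops is what, in practice, gives each thread the same iterations (and therefore the same pages) in both loops; a dynamic schedule would break this correspondence.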

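It is worth mentioning that the right thread binding strategy (also described in the section Nested parallelism in OpenMP) is required for NUMA-aware memory placement: if threads migrate to another NUMA domain after initialization, the pages they first touched are no longer local to them.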
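As a minimal sketch of checking this at run time, the C functions below query the binding policy with the standard OpenMP routine omp_get_proc_bind(). The names threads_are_bound and warn_if_unbound are hypothetical. They only report the policy; the binding itself is typically set before launch through the standard environment variables OMP_PROC_BIND (e.g., close or spread) and OMP_PLACES (e.g., cores).

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the OpenMP runtime will bind threads
   to places in the next parallel region (OMP_PROC_BIND is true, close,
   spread, ...), 0 otherwise. */
int threads_are_bound(void) {
#ifdef _OPENMP
    return omp_get_proc_bind() != omp_proc_bind_false;
#else
    return 0; /* built without OpenMP: no thread binding is in effect */
#endif
}

/* Hypothetical helper: warn when threads may migrate away from the pages
   they first touched during initialization. */
void warn_if_unbound(void) {
    if (!threads_are_bound())
        fprintf(stderr, "warning: OpenMP threads are not bound; "
                        "first-touch placement may be ineffective\n");
}
```

Calling warn_if_unbound() once at startup, before the first-touch initialization, would flag runs launched without a binding policy.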