Pool Allocator
      ,( ,( ,( ,( ,( ,( ,( ,(
     `-' `-' `-' `-' `-' `-' `-' `-'
      _________________________
     / "Don't be a malloc-hater \
     | Use the pool alligator!" |
     \   _____________________ /
      | /
      |/      .-._   _ _ _ _ _ _ _ _
   .-''-.__.-'00  '-' ' ' ' ' ' ' ' '-.
  '.___ '    .   .--_'-' '-' '-' _'-' '._
   V: V 'vv-'   '_   '.       .'  _..' '.'.
     '=.____.=_.--'   :_.__.__:_   '.   : :
             (((____.-'         '-.  /   : :
                               (((-'\ .' /
                             _____..'  .'
                            '-._____.-'
      ,( ,( ,( ,( ,( ,( ,( ,(
     `-' `-' `-' `-' `-' `-' `-' `-'
YAKL has a pool allocator, "Gator", that is automatically turned on and used for all YAKL allocations and deallocations. The reason for the pool is that allocation and free calls on accelerator devices are typically very expensive, and scientific codes often allocate and free memory very frequently. To make this efficient, a large pool of memory is allocated at YAKL's initialization, and YAKL hands out chunks of the pool very cheaply during runtime.
The thing about a pool allocator is that once you run out of memory in a given pool, you cannot resize the pool; doing so would invalidate the pointers already handed out from it. Instead, you can only add new pools. Therefore, if the arrays you're allocating are "large" and the size of individual pools is "small", you may find yourself in situations where no additional pool is large enough to hold an array of the size you need. In those cases, YAKL will inform you that your initial pool size is too small.
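To make this concrete, here is a minimal sketch of what pool-backed allocation looks like from the C++ side. The array names and sizes are made up for illustration; the sketch assumes YAKL's usual Array template and init/finalize calls:

```cpp
#include "YAKL.h"

int main() {
  yakl::init();   // The initial pool is created here (the pool is on by default)
  {
    // Illustrative sizes; each allocation below is handed a chunk of Gator's
    // pool rather than triggering a raw device malloc.
    int constexpr n = 1024*1024;
    yakl::Array<float,1,yakl::memDevice,yakl::styleC> a("a",n);
    yakl::Array<float,1,yakl::memDevice,yakl::styleC> b("b",n);
    // ... launch kernels that use a and b ...
  } // a and b go out of scope; their chunks are returned to the pool cheaply
  yakl::finalize();
  return 0;
}
```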
If you call yakl::init() with the optional InitConfig() parameter, you can specify the parameters of YAKL's memory pool during runtime. An example is below:
yakl::init( yakl::InitConfig().set_pool_enabled(true).set_pool_initial_mb(4096) );
Parameters set during yakl::init() will always override environment variables.
You control the behavior of Gator's pool management through the following environment variables (a combined InitConfig example follows the list):
- export GATOR_DISABLE=1 : Inform YAKL to disable the memory pool and call device allocation and deallocation every time.
  - If the user passes InitConfig().set_pool_enabled(true) to yakl::init(), then this environment variable is overridden, and the pool is still enabled.
- export GATOR_INITIAL_MB=[SIZE_IN_MB] : The initial pool size in MB.
  - This will be overridden if the user passes InitConfig().set_pool_initial_mb(...) to yakl::init().
- export GATOR_GROW_MB=[SIZE_IN_MB] : The size of each new pool in MB once the initial pool is out of memory.
  - This will be overridden if the user passes InitConfig().set_pool_grow_mb(...) to yakl::init().
- export GATOR_BLOCK_BYTES=[SIZE_IN_BYTES] : The increment by which to allocate memory in the pool.
  - This will be overridden if the user passes InitConfig().set_pool_block_bytes(...) to yakl::init().
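As a point of reference, here is a sketch that sets all four of these knobs through InitConfig rather than the environment. The setter names come from the list above; the specific sizes are illustrative only and should be tuned to your device:

```cpp
#include "YAKL.h"

int main() {
  // Illustrative configuration; each setter takes precedence over the
  // corresponding environment variable noted in its comment.
  yakl::init( yakl::InitConfig()
                .set_pool_enabled(true)       // overrides GATOR_DISABLE
                .set_pool_initial_mb(8192)    // overrides GATOR_INITIAL_MB
                .set_pool_grow_mb(2048)       // overrides GATOR_GROW_MB
                .set_pool_block_bytes(2048)   // overrides GATOR_BLOCK_BYTES
            );
  // ... allocate YAKL Arrays and run kernels ...
  yakl::finalize();
  return 0;
}
```

If both are present, these runtime settings win, matching the override behavior described above.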
YAKL's pool allocator is pretty informative and will try to let you know what to do if an issue occurs. Some features of Gator:
- Fortran bindings for integer, integer(8), real, real(8), and logical
- Fortran bindings for arrays of one to seven dimensions
- Able to call cudaMallocManaged under the hood with prefetching and memset
- Able to support arbitrary lower bounds in the Fortran interface for Fortran pointers
- Simple pool allocator implementation that automatically grows as needed
- The pool allocator responds to environment variables to control the initial allocation size, and the size of each additional pool as it grows
- Minimal internal fragmentation for any pattern of allocations and frees
- Warns the user if allocations are left allocated after the pool is destroyed
- Thread safe, so feel free to use the pool inside CPU-threaded regions (a sketch follows this list). Gator uses std::mutex to lock and unlock, so it is thread safe for pthreads, std::thread, and OpenMP CPU threads.
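For instance, here is a minimal sketch of concurrent pool allocations from std::thread workers. The worker function, array name, sizes, and thread count are made up for illustration; the point is simply that each thread's allocations and frees go through Gator's mutex-protected bookkeeping:

```cpp
#include <thread>
#include <vector>
#include "YAKL.h"

// Each CPU thread allocates and frees its own device scratch array through
// the pool. Gator's internal std::mutex serializes the pool bookkeeping, so
// these concurrent allocations and frees are safe.
void worker() {
  yakl::Array<float,1,yakl::memDevice,yakl::styleC> scratch("scratch",1024);
  // ... launch kernels that use scratch ...
} // scratch is freed here, returning its chunk to the pool

int main() {
  yakl::init();
  std::vector<std::thread> threads;
  for (int i=0; i < 4; i++) { threads.emplace_back(worker); }
  for (auto &t : threads) { t.join(); }
  yakl::finalize();
  return 0;
}
```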
The pool search and allocation algorithm is not the fastest, but it is as close to optimal in terms of memory usage and fragmentation as you can get. This cost is typically acceptable because allocation is overlapped with GPU kernel execution in most contexts. Regardless, the cost is still significantly less than most accelerator device calls to malloc and free.
The heuristics below give you a simple way to manage pool-related errors by changing only GATOR_INITIAL_MB:
1. Try export GATOR_DISABLE=1 to disable the pool allocator, and see if the simulation succeeds. If it succeeds, then we have confidence that there is, at least, room on the device for the memory you are trying to allocate. If it fails, then you are using too much memory, and you should follow one or more of the mitigation strategies in the sub-list below.
   - If there are other modules in your code using GPU memory, consider deallocating that memory before running the current module.
   - Consider using more nodes to decrease the amount of memory required per GPU.
   - Consider executing one of your dimensions in "chunks" if that dimension is trivially parallel in order to reduce the overall memory requirements.
2. If the error message says you cannot fit the variable in the current pools or in future pools, then you should increase GATOR_INITIAL_MB.
   - GATOR_INITIAL_MB must be larger than the size of the allocation given to you in the error message.
3. If you've run out of device memory and only one pool has been created, then you should decrease GATOR_INITIAL_MB. The initial pool is requesting more memory than the device has available.
   - In particular, ensure GATOR_INITIAL_MB is not larger than the total amount of memory available on the device.
4. If you've run out of GPU memory and more than one pool has been created, then you should increase GATOR_INITIAL_MB. This creates more room for the variables you could not fit.
5. If none of this has helped, it means that step 1 only barely succeeded, and you are simply trying to use too much memory. Please follow the mitigation strategies in step 1.
Bisection search: If step 1 succeeded, and you have one GATOR_INITIAL_MB value for which you need to increase GATOR_INITIAL_MB and a higher value for which you need to decrease GATOR_INITIAL_MB, then try a bisection search strategy to determine a value that works. For example, if 4096 MB is too small and 16384 MB is too large, try roughly the midpoint (10240 MB) next, and keep halving the interval until you find a workable size.
How do I know how many pools have been created?: You can tell when pools are created by specifying -DYAKL_VERBOSE_FILE, which will dump out a verbose file per process recording each internal event inside YAKL as it occurs, including pool creation. grep for "Creating pool of" to determine how many pools have been created for a given MPI task.