Utility library to handle small, reusable pools of both device memory buffers (via allocators) and device executors (with multiple scheduling policies).

SC-SGS/CPPuddle

CPPuddle

Purpose

This repository was initially created to explore how to best use HPX and Kokkos together! For fine-grained GPU tasks, we needed a way to avoid excessive allocations of single-use GPU buffers (as allocations block the device for all streams) and the repeated creation and deletion of GPU executors (as those are usually tied to a stream, which is expensive to create as well).

We currently test and use CPPuddle in Octo-Tiger, together with HPX-Kokkos. In this use case, allocating GPU buffers for all sub-grids in advance would have wasted a lot of memory. On the other hand, unified memory would have caused unnecessary GPU-to-CPU page migrations (as the old input data gets overwritten anyway). Allocating buffers on-the-fly would have blocked the device. Hence, we currently test this buffer management solution!
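The idea behind buffer recycling can be illustrated with a small, self-contained sketch. It is not CPPuddle's interface: the names `buffer_pool`, `acquire`, `release` and `run_tasks` are hypothetical, and `std::malloc` merely stands in for an expensive device allocation. Released buffers are kept in per-size free lists and handed out again on the next request of the same size.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>
#include <vector>

// Hypothetical illustration only; this is not CPPuddle's interface.
class buffer_pool {
public:
  buffer_pool() = default;
  buffer_pool(const buffer_pool&) = delete;
  buffer_pool& operator=(const buffer_pool&) = delete;

  // Hand out a buffer of `bytes` bytes, reusing a released one if available.
  void* acquire(std::size_t bytes) {
    auto& free_list = free_buffers_[bytes];
    if (!free_list.empty()) {
      void* p = free_list.back();
      free_list.pop_back();
      ++reused_;
      return p;
    }
    // Stand-in for an expensive (device-blocking) allocation.
    void* p = std::malloc(bytes);
    if (p == nullptr) throw std::bad_alloc();
    ++allocated_;
    return p;
  }

  // Keep the buffer for later reuse instead of freeing it.
  void release(void* p, std::size_t bytes) {
    free_buffers_[bytes].push_back(p);
  }

  // Only buffers that were released back to the pool are freed here.
  ~buffer_pool() {
    for (auto& entry : free_buffers_)
      for (void* p : entry.second) std::free(p);
  }

  std::size_t allocated() const { return allocated_; }
  std::size_t reused() const { return reused_; }

private:
  std::map<std::size_t, std::vector<void*>> free_buffers_;
  std::size_t allocated_ = 0;
  std::size_t reused_ = 0;
};

// Simulates `tasks` short-lived tasks that each need one scratch buffer.
// Returns the total number of real allocations the pool has performed.
std::size_t run_tasks(buffer_pool& pool, int tasks, std::size_t bytes) {
  for (int i = 0; i < tasks; ++i) {
    void* scratch = pool.acquire(bytes);
    // ... a kernel would read and write the scratch buffer here ...
    pool.release(scratch, bytes);
  }
  return pool.allocated();
}
```

In this sketch, running many tasks one after another on the same pool triggers a real allocation only for the first task; every later task reuses the released buffer.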

Tools provided by this repository

  • Allocators that reuse previously allocated buffers if available (works with normal heap memory, pinned memory, aligned memory, CUDA/HIP device memory, and Kokkos Views). Note that separate buffers do not coexist on a single chunk of contiguous memory, but use different allocations.
  • Executor pools and various scheduling policies (round robin, priority queue, multi-GPU), which rely on reference counting to gauge the current load of an executor instead of querying the device itself. Tested with CUDA, HIP and Kokkos executors provided by HPX / HPX-Kokkos.
  • Special executors/allocators for on-the-fly GPU work aggregation (using HPX).
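To illustrate how reference counting can stand in for querying the device, here is a minimal, self-contained sketch of an executor pool. It is not CPPuddle's interface: `executor_pool`, `executor_handle` and their member functions are hypothetical names, executors are represented by plain indices, and a linear scan replaces a real priority queue for brevity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only; this is not CPPuddle's interface.
// Executors are represented by their index in the pool.
class executor_pool {
public:
  // `size` must be greater than zero.
  explicit executor_pool(std::size_t size) : in_use_(size, 0) {}

  // Round robin: cycle through the executors regardless of their load.
  std::size_t acquire_round_robin() {
    const std::size_t id = next_;
    next_ = (next_ + 1) % in_use_.size();
    ++in_use_[id];
    return id;
  }

  // Load-based: pick the executor with the fewest outstanding references.
  // Ties are resolved in favor of the lowest index.
  std::size_t acquire_least_loaded() {
    std::size_t best = 0;
    for (std::size_t i = 1; i < in_use_.size(); ++i)
      if (in_use_[i] < in_use_[best]) best = i;
    ++in_use_[best];
    return best;
  }

  // Drop one reference once the work on this executor is done.
  void release(std::size_t id) { --in_use_[id]; }

  // Current reference count, used as an estimate of the executor's load.
  std::size_t load(std::size_t id) const { return in_use_[id]; }

private:
  std::vector<std::size_t> in_use_;
  std::size_t next_ = 0;
};

// RAII handle: takes a reference on construction, drops it on destruction.
class executor_handle {
public:
  explicit executor_handle(executor_pool& pool)
      : pool_(pool), id_(pool.acquire_least_loaded()) {}
  ~executor_handle() { pool_.release(id_); }
  executor_handle(const executor_handle&) = delete;
  executor_handle& operator=(const executor_handle&) = delete;

  std::size_t id() const { return id_; }

private:
  executor_pool& pool_;
  std::size_t id_;
};
```

The load estimate in this sketch is only as good as the bookkeeping: a handle has to stay alive for as long as work is in flight on its executor, which is why the reference is tied to an RAII object.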

The documentation of the current master branch is available here. In particular, the public functionality for memory recycling is available in the namespace memory_recycling, the executor pools are available in the namespace executor_recycling, and the work aggregation (kernel fusion) functionality is available in the namespace work_aggregation.

Requirements

  • C++17
  • CMake (>= 3.16)
  • Optional (for the header-only utilities / test): CUDA, Boost, HPX, Kokkos, HPX-Kokkos

The submodules can be used to obtain the optional dependencies, which are required for testing the header-only utilities. If these tests are not required, the submodules (and the respective build scripts in /scripts) can be safely ignored.

Build / Install

  cmake -S /path/to/source -B /path/to/build \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/path/to/install/cppuddle \
    -DCPPUDDLE_WITH_TESTS=OFF \
    -DCPPUDDLE_WITH_COUNTERS=OFF
  cmake --build /path/to/build --target install

If installed correctly, CPPuddle can be used in other CMake-based projects via

find_package(CPPuddle REQUIRED)
  • Recommended CMake build:
  cmake -S /path/to/source -B /path/to/build \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/path/to/install/cppuddle \
    -DCPPUDDLE_WITH_HPX=ON \
    -DCPPUDDLE_WITH_HPX_AWARE_ALLOCATORS=ON \
    -DCPPUDDLE_WITH_TESTS=OFF \
    -DCPPUDDLE_WITH_COUNTERS=OFF
