Releases · SC-SGS/CPPuddle

08 Oct 03:21

G-071

v0.3.1

c1ae269

Release 0.3.1 Latest


Description

This is mostly a bugfix release:

Fixed executor reference counting in work aggregation areas. This enables CPU/GPU load balancing again (mostly useful on consumer-grade hardware).
Fixed the mutex type used for work aggregation (it should be hpx::mutex). hpx::mutex is now also the default everywhere else, though std::mutex remains a valid option there.
Added an option to turn off executor pools whilst still providing the same interface (useful for performance comparisons).
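
The option to turn off executor pools while keeping the same interface can be pictured as a compile-time switch. The following is a minimal sketch under stated assumptions: `executor_source` and `dummy_executor` are hypothetical names invented for illustration, not CPPuddle's actual API, and the real toggle may well be a build option rather than a template parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU executor (for example a stream wrapper).
struct dummy_executor {
  int id;
};

template <bool UsePool>
class executor_source;

// Pool enabled: executors are created once and handed out round-robin.
template <>
class executor_source<true> {
 public:
  explicit executor_source(std::size_t n) {
    assert(n > 0);
    for (std::size_t i = 0; i < n; ++i)
      pool_.push_back(dummy_executor{static_cast<int>(i)});
  }
  dummy_executor& get() {
    dummy_executor& e = pool_[next_];
    next_ = (next_ + 1) % pool_.size();
    return e;
  }

 private:
  std::vector<dummy_executor> pool_;
  std::size_t next_ = 0;
};

// Pool disabled: same constructor and get() call, but every call
// creates a fresh executor instead of reusing a pooled one.
template <>
class executor_source<false> {
 public:
  explicit executor_source(std::size_t /*unused*/) {}
  dummy_executor get() { return dummy_executor{created_++}; }

 private:
  int created_ = 0;
};
```

Calling code can use `auto&& exec = source.get();` with either variant, so a performance comparison only needs to flip the boolean.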

What's Changed

Update README.md by @G-071 in #23
Fix combined CPU GPU execution by @G-071 in #24
Add option to disable using the executor pool by @G-071 in #25
Change mutex defaults by @G-071 in #26

Full Changelog: v0.3.0...v0.3.1

Contributors

G-071


25 Aug 18:20

G-071

v0.3.0

377ee35

Release 0.3.0

Description

This release contains a refactored/overhauled buffer management core and adds proper MultiGPU support.

Feature list / Changelog:

CPPuddle is now usable as a header-only library.
Reworked buffer manager by adding an HPX-aware mode and variable internal buckets. This mode uses the OS thread ID as a hint to reduce locking and get buffers for the correct NUMA node.
Added a CMake variable to steer the number of internally used buckets (tradeoff between speed and memory usage).
Repaired and added MultiGPU functionality (also works for the work aggregation executors / allocators).
Removed central reference counting for recycled Kokkos buffers (reference counting is now done per View).
Added a proper finalize method that prevents further usage after it has been called.
Added CMake toggles to enable/disable content recycling and buffer recycling as required (useful for benchmarking).
Made the internal CPPuddle allocation/recycling counters compatible with HPX performance counters.
Contains various bug fixes and a cleaned up codebase.
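
The idea of internal buckets selected by a thread hint can be illustrated with a small sketch. This is not CPPuddle's buffer manager: `bucketed_recycler`, its methods, and the `hint % number_buckets` mapping are assumptions made for illustration only. Each bucket has its own mutex and free list, so threads that use different hints rarely contend for the same lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical bucketed buffer recycler (illustration only).
class bucketed_recycler {
 public:
  explicit bucketed_recycler(std::size_t number_buckets)
      : buckets_(number_buckets) {
    assert(number_buckets > 0);
  }

  // Returns a buffer holding `size` floats. `hint` would typically be
  // the index of the calling OS thread; it selects the bucket.
  std::vector<float> get(std::size_t size, std::size_t hint) {
    bucket& b = buckets_[hint % buckets_.size()];
    std::lock_guard<std::mutex> guard(b.mut);
    std::vector<std::vector<float>>& free_list = b.unused[size];
    if (!free_list.empty()) {
      std::vector<float> buf = std::move(free_list.back());
      free_list.pop_back();
      ++recycled_;
      return buf;
    }
    return std::vector<float>(size);  // nothing to reuse: allocate
  }

  // Hands a buffer back so a later get() with the same bucket and
  // size can reuse it instead of allocating.
  void put_back(std::vector<float>&& buf, std::size_t hint) {
    bucket& b = buckets_[hint % buckets_.size()];
    const std::size_t size = buf.size();
    std::lock_guard<std::mutex> guard(b.mut);
    b.unused[size].push_back(std::move(buf));
  }

  std::size_t recycled() const { return recycled_.load(); }

 private:
  struct bucket {
    std::mutex mut;
    std::map<std::size_t, std::vector<std::vector<float>>> unused;
  };
  std::vector<bucket> buckets_;
  std::atomic<std::size_t> recycled_{0};
};
```

More buckets mean less lock contention but also more buffers sitting unused in buckets that other threads never look at, which is the speed versus memory tradeoff mentioned in the list above.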

Note: The MultiGPU addition required some slight adjustments to the interface, requiring additional device_id parameters for various functions. Additionally, some gpu_id parameters from the defunct previous MultiGPU code have been removed. Other than that, the interface largely stayed the same.

What's Changed

Refactor buffer manager by @G-071 in #21
Add MultiGPU Support by @G-071 in #22

Full Changelog: v0.2.0...v0.3.0

Contributors

G-071


24 Aug 21:12

G-071

v0.2.1

fa8a358

Release 0.2.1

Description

This release backports the interface changes from v0.3.0 to the older v0.2.0 release.

Feature list / Changelog:

Backports the interface changes from Release v0.3.0 to v0.2.0, effectively allowing applications such as Octo-Tiger to still use the old CPPuddle core (from 0.2.0) despite having been ported to the new CPPuddle interface (from 0.3.0).
Notably, the interface was backported for v0.2.1 in a way that keeps this release compatible with the interface of previous CPPuddle releases (which was not feasible for 0.3.0 due to the removal of the old MultiGPU code).
The release further fixes some small test issues.

Full Changelog: v0.2.0...v0.2.1


18 Aug 20:46

G-071

v0.2.0

c922bcc

Release 0.2.0

Description

This release adds work aggregation/kernel fusion features, SYCL support and A64FX support:

Added explicit work aggregation executors and allocators. These allow multithreaded work aggregation / kernel fusion of GPU kernels when using HPX. They are intended to combine, on the fly, GPU kernels that do the same work but on different HPX components (on the same HPX locality, though). See here for a more detailed description and benchmarks with a real-world HPX application using both an NVIDIA A100 and an AMD MI100 (using CUDA, HIP and Kokkos).
Added basic tests for CPU work aggregation executor/allocators
Added more detailed CPU/GPU STREAM tests for work aggregation executor/allocators
Added SYCL allocators (used for the benchmarks here)
Fixed various CI bugs and compilation on A64FX (see here for a usage example on A64FX machines).
Note: Including the work aggregation executor/allocators requires C++17; other features still work with C++14.
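
To make the work aggregation idea more concrete, here is a strongly simplified sketch. It is an assumption-laden illustration, not the CPPuddle implementation: the `aggregator` class and its methods are hypothetical, the "kernel" is an ordinary CPU callable, and there are no HPX futures involved. Several tasks stage their slice of data in one shared buffer, and the fused kernel runs once over all staged slices as soon as enough of them have arrived.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical aggregator (illustration only): collects slices from
// several callers and runs one fused kernel over all of them.
class aggregator {
 public:
  using kernel = std::function<void(std::vector<double>&)>;

  aggregator(std::size_t max_slices, kernel fused_kernel)
      : max_slices_(max_slices), fused_kernel_(std::move(fused_kernel)) {
    assert(max_slices > 0);
  }

  // Stage one task's data; launch once max_slices tasks have arrived.
  void submit(const std::vector<double>& slice) {
    std::lock_guard<std::mutex> guard(mut_);
    staged_.insert(staged_.end(), slice.begin(), slice.end());
    if (++slices_ == max_slices_) launch_locked();
  }

  // Launch with however many slices are currently staged.
  void flush() {
    std::lock_guard<std::mutex> guard(mut_);
    if (slices_ > 0) launch_locked();
  }

  std::size_t launches() const {
    std::lock_guard<std::mutex> guard(mut_);
    return launches_;
  }

  std::vector<double> last_output() const {
    std::lock_guard<std::mutex> guard(mut_);
    return last_output_;
  }

 private:
  // Caller must hold mut_.
  void launch_locked() {
    fused_kernel_(staged_);  // one launch covers all staged slices
    ++launches_;
    last_output_.swap(staged_);
    staged_.clear();
    slices_ = 0;
  }

  std::size_t max_slices_;
  kernel fused_kernel_;
  mutable std::mutex mut_;
  std::vector<double> staged_;
  std::vector<double> last_output_;
  std::size_t slices_ = 0;
  std::size_t launches_ = 0;
};
```

With `max_slices` set to 2 and a kernel that doubles every element, submitting `{1, 2}` and then `{3, 4}` results in a single launch over four elements rather than two launches over two elements each.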

Pull requests

Work aggregation experimental by @G-071 in #12
Remove superfluous cuda header by @G-071 in #13
Allow arbitrary de-allocation ordering in aggregation areas by @G-071 in #14
Add sycl allocators by @G-071 in #15
Remove flag requirement by @G-071 in #18
Fix jenkins by @G-071 in #19
Added view type by @G-071 in #20
Fix compilation error on Ookami by @JiakunYan in #17

New Contributors

@JiakunYan made their first contribution in #17

Full Changelog: v0.1.0...v0.2.0

Contributors

G-071 and JiakunYan


23 Feb 19:10

G-071

v0.1.0

1b04b54

Release 0.1.0

The version for this release has been in use for multiple months now and seems to work well, hence this initial release with the basic functionality before more experimental features are added!

The release contains the basic (multithreaded) recycling / reuse functionality for buffers and executors:

It provides allocators that enable

Reuse of buffers allocated by std::allocator
Reuse of aligned buffers
Reuse of CUDA device memory buffers
Reuse of CUDA pinned host memory buffers
Reuse of HIP device memory buffers
Reuse of HIP pinned host memory buffers
Reuse of Kokkos Views (via wrapper class)
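
The basic mechanism behind such allocators can be sketched with plain host memory. The example below is a minimal illustration under stated assumptions: `buffer_pool` and `recycling_allocator` are hypothetical names, not the allocators shipped with CPPuddle, and the pool recycles raw buffers by exact byte size only. Deallocation puts the buffer into the pool instead of freeing it, and a later allocation of the same size takes it back out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <new>
#include <vector>

// Hypothetical process-wide pool of unused buffers, keyed by byte size.
class buffer_pool {
 public:
  static buffer_pool& instance() {
    static buffer_pool pool;
    return pool;
  }
  void* get(std::size_t bytes) {
    std::lock_guard<std::mutex> guard(mut_);
    auto it = unused_.find(bytes);
    if (it != unused_.end() && !it->second.empty()) {
      void* p = it->second.back();
      it->second.pop_back();
      ++recycled_;
      return p;  // reuse an old buffer of the same size
    }
    return ::operator new(bytes);
  }
  void put_back(void* p, std::size_t bytes) {
    assert(p != nullptr);
    std::lock_guard<std::mutex> guard(mut_);
    unused_[bytes].push_back(p);
  }
  std::size_t recycled() const {
    std::lock_guard<std::mutex> guard(mut_);
    return recycled_;
  }
  ~buffer_pool() {
    // Actually release everything that is still parked in the pool.
    for (auto& entry : unused_)
      for (void* p : entry.second) ::operator delete(p);
  }

 private:
  buffer_pool() = default;
  mutable std::mutex mut_;
  std::map<std::size_t, std::vector<void*>> unused_;
  std::size_t recycled_ = 0;
};

// Allocator usable with standard containers; deallocate() recycles.
template <typename T>
struct recycling_allocator {
  using value_type = T;
  recycling_allocator() = default;
  template <typename U>
  recycling_allocator(const recycling_allocator<U>&) noexcept {}
  T* allocate(std::size_t n) {
    return static_cast<T*>(buffer_pool::instance().get(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t n) {
    buffer_pool::instance().put_back(p, n * sizeof(T));
  }
};
template <typename T, typename U>
bool operator==(const recycling_allocator<T>&, const recycling_allocator<U>&) {
  return true;
}
template <typename T, typename U>
bool operator!=(const recycling_allocator<T>&, const recycling_allocator<U>&) {
  return false;
}
```

A `std::vector<float, recycling_allocator<float>>` that is destroyed and then recreated with the same size gets its old memory back without a new allocation. For device or pinned memory, where allocation is far more expensive than on the host, the same pattern pays off even more.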

It further provides executor pools for arbitrary executors with various scheduling policies (tested with HPX CUDA/HIP and Kokkos executors)

Round robin scheduling policy
Priority scheduling policy
MultiGPU with Round Robin scheduling policy
MultiGPU with Priority scheduling policy
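
The difference between the two single-device policies can be sketched as follows. This is an illustration under assumptions, not CPPuddle's pool code: all class names are hypothetical, executors are represented by their index only, and "priority" is interpreted here as preferring the executor with the fewest current users, which is an assumption about the policy's meaning. The sketch is also not thread-safe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hands out executor indices in a fixed cycle, ignoring load.
class round_robin_policy {
 public:
  explicit round_robin_policy(std::size_t n) : n_(n) { assert(n > 0); }
  std::size_t next(const std::vector<std::size_t>& /*load*/) {
    const std::size_t chosen = current_;
    current_ = (current_ + 1) % n_;
    return chosen;
  }

 private:
  std::size_t n_;
  std::size_t current_ = 0;
};

// Picks the executor with the fewest current users (assumed meaning
// of "priority" in this sketch); ties go to the lowest index.
class least_loaded_policy {
 public:
  explicit least_loaded_policy(std::size_t n) { assert(n > 0); }
  std::size_t next(const std::vector<std::size_t>& load) {
    return static_cast<std::size_t>(
        std::min_element(load.begin(), load.end()) - load.begin());
  }
};

// Hypothetical pool that tracks how many users each executor has.
template <typename Policy>
class executor_pool {
 public:
  explicit executor_pool(std::size_t n) : load_(n, 0), policy_(n) {}
  std::size_t acquire() {
    const std::size_t i = policy_.next(load_);
    ++load_[i];
    return i;
  }
  void release(std::size_t i) {
    assert(load_[i] > 0);
    --load_[i];
  }

 private:
  std::vector<std::size_t> load_;
  Policy policy_;
};
```

Round robin is cheaper per call, while the load-aware policy reacts when one executor frees up earlier than the others. A MultiGPU variant could keep one such pool per device and select the pool by device ID first.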

The release also includes CI functionality on GitHub Actions and Jenkins (for GPU and concurrency tests).
