
Asynchronicity in YAKL

Matt Norman edited this page Dec 11, 2021 · 1 revision

Most parallel dispatch and data copying routines in YAKL are asynchronous with respect to host code: the call returns to the host before the device work has necessarily finished. They are all launched on the same default device "stream" (however each backend maps that concept). Dependencies between kernel launches and data movement are therefore respected on the device, even though the host does not wait for them.

Calls that are asynchronous with respect to the host:

  • parallel_for
  • Array::deep_copy_to
  • Array::slice

Calls that synchronize with the host before they return:

  • Array::createHostCopy
  • Array::createDeviceCopy
  • All yakl::intrinsics:: reductions (e.g., sum, minval, maxval, count)

To synchronize the host with work on the device explicitly, call yakl::fence(), which stalls the host until all device work launched so far has completed.
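
The behavior described on this page (launches return immediately, work runs in launch order on one stream, and a fence stalls the host) can be illustrated with a small host-only model. This is a sketch, not YAKL code and not how YAKL is implemented: `ToyStream`, `launch`, `fence`, and `run_demo` are hypothetical names invented for the illustration, and a single worker thread stands in for the device.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical toy model of one in-order device "stream" (NOT part of YAKL).
class ToyStream {
public:
  ToyStream() : worker_([this] { run(); }) {}

  ~ToyStream() {
    { std::lock_guard<std::mutex> lock(m_); done_ = true; }
    cv_.notify_all();
    worker_.join();  // the worker drains any remaining tasks before exiting
  }

  // Returns immediately; the task runs later on the worker thread,
  // after every task launched before it.
  void launch(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(m_); tasks_.push(std::move(task)); }
    cv_.notify_all();
  }

  // Blocks the calling (host) thread until all launched tasks have finished.
  void fence() {
    std::unique_lock<std::mutex> lock(m_);
    idle_cv_.wait(lock, [this] { return tasks_.empty() && !busy_; });
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(m_);
    for (;;) {
      cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
      if (tasks_.empty()) return;  // shutdown requested and nothing left to do
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      busy_ = true;
      lock.unlock();
      task();          // run outside the lock so the host can keep launching
      lock.lock();
      busy_ = false;
      idle_cv_.notify_all();
    }
  }

  std::mutex m_;
  std::condition_variable cv_, idle_cv_;
  std::queue<std::function<void()>> tasks_;
  bool busy_ = false;
  bool done_ = false;
  std::thread worker_;  // declared last so the members above exist before it starts
};

// Two dependent "kernels" followed by a fence, mirroring the pattern of
// asynchronous launches and an explicit host synchronization.
std::vector<int> run_demo() {
  std::vector<int> data(4, 0);
  ToyStream stream;
  stream.launch([&data] { for (int &x : data) x = 1; });           // "kernel" 1
  stream.launch([&data] { for (int &x : data) x = x * 10 + 2; });  // depends on 1
  // The host must not read data here: the work may still be in flight.
  stream.fence();
  assert(data.front() == 12);  // both tasks ran, in launch order
  return data;
}
```

Calling `run_demo()` returns the vector only after the fence, so reading it on the host is safe. Without the `fence()` call, the host read would race with the queued work, which is the situation the fence is meant to prevent.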
