Lesson_Materials/Lecture_08_Using_USM/index.html

﻿<!DOCTYPE html>

<html>
	<head>
	    <meta charset="utf-8">
		<link rel="stylesheet" href="../common-revealjs/css/reveal.css">
		<link rel="stylesheet" href="../common-revealjs/css/theme/white.css">
		<link rel="stylesheet" href="../common-revealjs/css/custom.css">
		<script>
			// This is needed when printing the slides to pdf
			var link = document.createElement( 'link' );
			link.rel = 'stylesheet';
			link.type = 'text/css';
			link.href = window.location.search.match( /print-pdf/gi ) ? '../common-revealjs/css/print/pdf.css' : '../common-revealjs/css/print/paper.css';
			document.getElementsByTagName( 'head' )[0].appendChild( link );
		</script>
		<script>
		    // This is used to display the static images on each slide,
			// See global-images in this html file and custom.css
			(function() {
				if(window.addEventListener) {
					window.addEventListener('load', () => {
						let slides = document.getElementsByClassName("slide-background");

						if (slides.length === 0) {
							slides = document.getElementsByClassName("pdf-page")
						}

						// Insert global images on each slide
						for(let i = 0, max = slides.length; i < max; i++) {
							let cln = document.getElementById("global-images").cloneNode(true);
							cln.removeAttribute("id");
							slides[i].appendChild(cln);
						}

						// Remove top level global images
						let elem = document.getElementById("global-images");
						elem.parentElement.removeChild(elem);
					}, false);
				}
			})();
		</script>
		
	</head>
	<body>
		<div class="reveal">
			<div class="slides">
				<div id="global-images" class="global-images">
					<img src="../common-revealjs/images/sycl_academy.png" />
					<img src="../common-revealjs/images/sycl_logo.png" />
					<img src="../common-revealjs/images/trademarks.png" />
				</div>
				<!--Slide 1-->
				<section class="hbox">
					<div class="hbox" data-markdown>
						## Using USM
					</div>
				</section>
				<!--Slide 2-->
				<section class="hbox" data-markdown>
					## Learning Objectives
					* Learn how to allocate memory using USM
					* Learn how to copy data to and from USM allocated memory
					* Learn how to access data from USM allocated memory in a kernel function
					* Learn how to free USM memory allocations
                </section>
				<!--Slide 3-->
				<section>
					<div class="hbox" data-markdown>
						#### Focus on Explicit USM
					</div>
					<div class="container" data-markdown>
						* Remember that there are different variants of USM; explicit, restricted, concurrent and system.
						* Remember also that there are different ways USM memory can be allocated; host, device and shared.
						* We're going to focus explicit USM and device allocations - this is the minimum required variant.
					</div>
				</section>
				<!--Slide 3A-->
				<section>
					<div class="hbox" data-markdown>
						#### USM Allocation Types
					</div>
					<div class="container"data-markdown>
					  ![SYCL](./Figure6-1bookUSMtypes.png "SYCL")
					  (from book)
					</div>
				</section>
				<!--Slide 4-->
				<section>
					<div class="hbox" data-markdown>
						#### Malloc_device
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
void* malloc_device(size_t numBytes, const queue& syclQueue, const property_list &propList = {});

template &lt;typename T&gt;
T* malloc_device(size_t count, const queue& syclQueue,  const property_list &propList = {});
						</code></pre>
					</div>
					<div class="container" data-markdown>
						* A USM device allocation is performed by calling one of the `malloc_device` functions.
						* Both of these functions allocate the specified region of memory on the `device` associated with the specified `queue`.
						* The pointer returned is only accessible in a kernel function running on that `device`.
						* Synchronous exception if the device does not have aspect::usm_device_allocations
						* This is a blocking operation.
					</div>
				</section>
				<!--Slide 5-->
				<section>
					<div class="hbox" data-markdown>
						#### Free
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
void free(void* ptr, queue& syclQueue);
						</code></pre>
					</div>
					<div class="container" data-markdown>
						* In order to prevent memory leaks USM device allocations must be free by calling the `free` function.
						* The `queue` must be the same as was used to allocate the memory.
						* This is a blocking operation.
					</div>
				</section>
				<!--Slide 6-->
				<section>
					<div class="hbox" data-markdown>
						#### Memcpy
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
event queue::memcpy(void* dest, const void* src, size_t numBytes, const std::vector<event> &depEvents);
						</code></pre>
					</div>
					<div class="container" data-markdown>
						* Data can be copied to and from a USM device allocation by calling the `queue`'s `memcpy` member function.
						* The source and destination can be either a host application pointer or a USM device allocation.
						* This is an asynchronous operation enqueued to the `queue`.
						* An `event` is returned which can be used to synchronize with the completion of copy operation.
						* May depend on other events via `depEvents`
					</div>
				</section>
				<!--Slide 7-->
				<section>
					<div class="hbox" data-markdown>
						#### Memset & fill
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
event queue::memset(void* ptr, int value, size_t numBytes, const std::vector<event> &depEvents);

event queue::fill(void* ptr, const T& pattern, size_t count, const std::vector<event> &depEvents);
						</code></pre>
					</div>
					<div class="container" data-markdown>
						* The additional `queue` member functions `memset` and `fill` provide operations for initializing the data of a USM device allocation.
						* The member function `memset` initializes each byte of the data with the value interpreted as an unsigned char.
						* The member function `fill` initializes the data with a recurring pattern.
						* These are also asynchronous operations.
					</div>
				</section>
				<!--Slide 8-->
				<section>
					<div class="hbox" data-markdown>
						#### Putting it all together
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
int square_number(int x){
	
  auto myQueue = sycl::queue{};

  myQueue.submit([&](handler &cgh){
    cgh.single_task&lt;square_number&gt;([=](){
      /* square some number */
    });
  }).wait();

  return x;
}
						</code></pre>
					</div>
					<div class="container" data-markdown>
						We start with a basic SYCL application which invokes a kernel function with `single_task`.
					</div>
				</section>
				<!--Slide 9-->
				<section>
					<div class="hbox" data-markdown>
						#### Putting it all together
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
int square_number(int x){
	
  auto myQueue = sycl::queue{<mark>usm_selector{}</mark>};

  myQueue.submit([&](handler &cgh){
    cgh.single_task&lt;square_number&gt;([=](){
      /* square some number */
    });
  }).wait();

  return x;
}
						</code></pre>
					</div>
					<div class="container" data-markdown>
						We initialize the `queue` with the `usm_selector` we wrote in the last exercise, which will choose a device which supports USM device allocations.
					</div>
				</section>
				<!--Slide 10-->
				<section>
					<div class="hbox" data-markdown>
						#### Putting it all together
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
int square_number(int x){
	
  auto myQueue = sycl::queue{usm_selector{}};

  <mark>auto devicePtr = malloc_device&lt;int&gt;(1, myQueue);</mark>

  myQueue.submit([&](handler &cgh){
    cgh.single_task&lt;square_number&gt;([=](){
      /* square some number */
    });
  }).wait();

  return x;
}
						</code></pre>
					</div>
					<div class="container" data-markdown>
						We allocate USM device memory by calling `malloc_device`.
						Here we use the template variant and specify type `int`.
					</div>
				</section>
				<!--Slide 11-->
				<section>
					<div class="hbox" data-markdown>
						#### Putting it all together
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
int square_number(int x){

  auto myQueue = sycl::queue{usm_selector{}};

  auto devicePtr = malloc_device&lt;int&gt;(1, myQueue);

  <mark>myQueue.memcpy(devicePtr, &x, sizeof(int)).wait();</mark>

  myQueue.submit([&](handler &cgh){
    cgh.single_task&lt;square&gt;([=](){
      /* square some number */
    });
  }).wait();

  return x;
}
						</code></pre>
					</div>
						<div class="container" data-markdown>
							We copy the value of `x` in the host application to the USM device memory by calling `memcpy` on `myQueue`.
							We immediately call `wait` on the returned `event` to synchronize with the completion of the copy operation.
						</div>
				</section>
				<!--Slide 12-->
				<section>
					<div class="hbox" data-markdown>
						#### Putting it all together
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
int square_number(int x){

  auto myQueue = sycl::queue{usm_selector{}};

  auto devicePtr = malloc_device&lt;int&gt;(1, myQueue);

  myQueue.memcpy(devicePtr, &x, sizeof(int)).wait();

  myQueue.submit([&](handler &cgh){
    cgh.single_task&lt;square&gt;([=](){
      <mark>*devicePtr = (*devicePtr) * (*devicePtr);</mark>
    });
  }).wait();

  return x;
}
						</code></pre>
					</div>
						<div class="container" data-markdown>
							We then pass the `devicePtr` directly to the kernel function and access it can then be deferenced and the data written to.
						</div>
				</section>
				<!--Slide 13-->
				<section>
					<div class="hbox" data-markdown>
						#### Putting it all together
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
int square_number(int x){

  auto myQueue = sycl::queue{usm_selector{}};

  auto devicePtr = malloc_device&lt;int&gt;(1, myQueue);

  myQueue.memcpy(devicePtr, &x, sizeof(int)).wait();

  myQueue.submit([&](handler &cgh){
    cgh.single_task&lt;square&gt;([=](){
      *devicePtr = (*devicePtr) * (*devicePtr);
    });
  }).wait();

  <mark>myQueue.memcpy(&x, devicePtr, sizeof(int)).wait();</mark>

  return x;
}
						</code></pre>
					</div>
						<div class="container" data-markdown>
							Finally we copy the result from USM device memory back to the variable x in the host application by calling `memcpy` on `myQueue`.
						</div>
				</section>
				<!--Slide 14-->
				<section>
					<div class="hbox" data-markdown>
						#### Queue shortcuts
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
template &lt;typename KernelName, typename KernelType&gt;
event queue::single_task(const KernelType &KernelFunc);

template &lt;typename KernelName, typename KernelType, int Dims&gt;
event queue::parallel_for(range<Dims> GlobalRange, const KernelType &KernelFunc);
						</code></pre>
					</div>
					<div class="container" data-markdown>
						* The `queue` provides shortcut member functions which allow you to invoke a `single_task` or a `parallel_for` without defining a command group.
						* These can only be used when using the USM data management model.
					</div>
				</section>
				<!--Slide 15-->
				<section>
					<div class="hbox" data-markdown>
						#### With the queue shortcut
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
int square_number(int x){

  auto myQueue = sycl::queue{usm_selector{}};

  auto devicePtr = malloc_device&lt;int&gt;(1, myQueue);

  myQueue.memcpy(devicePtr, &x, sizeof(int)).wait();

  <mark>myQueue.single_task&lt;square&gt;([=](){</mark>
    <mark>*devicePtr = (*devicePtr) * (*devicePtr);</mark>
  <mark>}).wait();</mark>

  myQueue.memcpy(&x, devicePtr, sizeof(int)).wait();

  return x;
}
						</code></pre>
					</div>
						<div class="container" data-markdown>
							If we use the queue shortcut here it reduces the complexity of the code.
						</div>
				</section>
				<!--Slide 16-->
				<section>
					<div class="hbox" data-markdown>
						#### Usm_wrapper **ComputeCpp only**
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
using namespace experimental {

template &lt;typename T&gt;
class usm_wrapper;

}
						</code></pre>
					</div>
					<div class="container" data-markdown>
						* USM support in ComputeCpp is still experimental.
						* You currently need to wrap USM pointers in a `usm_wrapper` to pass them to a kernel function.
						* The `usm_wrapper` will behave like and convert to the raw pointer type.
						* This will be removed when ComputeCpp fully supports SYCL 2020.
					</div>
				</section>
				<!--Slide 17-->
				<section>
					<div class="hbox" data-markdown>
						#### With the usm_wrapper **ComputeCpp only**
					</div>
					<div class="container">
						<code class="code-100pc"><pre>
int square_number(int x){

  auto myQueue = sycl::queue{usm_selector{}};

  auto devicePtr = <mark>experimental::usm_wrapper&lt;int&gt;(</mark>malloc_device&lt;int&gt;(1, myQueue)<mark>)</mark>;

  myQueue.memcpy(devicePtr, &x, sizeof(int)).wait();

  myQueue.single_task&lt;square&gt;([=](){
    *devicePtr = (*devicePtr) * (*devicePtr);
  }).wait();

  myQueue.memcpy(&x, devicePtr, sizeof(int)).wait();

  return x;
}
						</code></pre>
					</div>
						<div class="container" data-markdown>
							In ComputeCpp we wrap the result of `malloc_device` with a `usm_wrapper<int>` so it can be passed to the kernel function.
						</div>
				</section>
				<!--Slide 18-->
				<section>
					<div class="hbox" data-markdown>
						## Questions
					</div>
				</section>
				<!--Slide 19-->
				<section>
					<div class="hbox" data-markdown>
						#### Exercise
					</div>
					<div class="container" data-markdown>
						Code_Exercises/Exercise_8_USM_Vector_Add/source
					</div>
					<div class="container" data-markdown>
						Implement the vector add from lesson 3 using the USM data management model.
					</div>
				</section>
			</div>
		</div>
		<script src="../common-revealjs/js/reveal.js"></script>
		<script src="../common-revealjs/plugin/markdown/marked.js"></script>
		<script src="../common-revealjs/plugin/markdown/markdown.js"></script>
		<script src="../common-revealjs/plugin/notes/notes.js"></script>
		<script>
			Reveal.initialize({mouseWheel: true, defaultNotes: true});
			Reveal.configure({ slideNumber: true });
		</script>
	</body>
</html>