From 7c3fc31a167c1c1029894b90d9fe0c2d7bb40387 Mon Sep 17 00:00:00 2001 From: utkinis Date: Tue, 15 Oct 2024 17:13:43 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20=20@=2092d58?= =?UTF-8?q?c83f979683c8fd271eef04b2db976496eb7=20=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- 404.html | 2 +- Manifest.toml | 4 +- assets/literate/l5_1-cpu-parallel_web.md | 85 +++++++++++-------- .../literate/l5_1-cpu-parallel_web_script.jl | 83 ++++++++++-------- extras/index.html | 2 +- final_proj/index.html | 2 +- homework/index.html | 2 +- index.html | 2 +- lecture1/index.html | 2 +- lecture10/index.html | 2 +- lecture2/index.html | 2 +- lecture3/index.html | 2 +- lecture4/index.html | 2 +- lecture5/index.html | 85 +++++++++++-------- lecture6/index.html | 2 +- lecture7/index.html | 2 +- lecture8/index.html | 2 +- lecture9/index.html | 2 +- logistics/index.html | 2 +- package-lock.json | 22 ++--- search/index.html | 2 +- sitemap.xml | 34 ++++---- software_install/index.html | 2 +- 23 files changed, 196 insertions(+), 151 deletions(-) diff --git a/404.html b/404.html index 6fa030e3..bc20d3d5 100644 --- a/404.html +++ b/404.html @@ -1 +1 @@ - 404: File not found

404: File not found

The requested file was not found.

Please click here to go to the home page.

\ No newline at end of file + 404: File not found

404: File not found

The requested file was not found.

Please click here to go to the home page.

\ No newline at end of file diff --git a/Manifest.toml b/Manifest.toml index 7cd9e6ab..95b0790f 100644 --- a/Manifest.toml +++ b/Manifest.toml @@ -101,9 +101,9 @@ version = "1.11.0" [[deps.JLLWrappers]] deps = ["Artifacts", "Preferences"] -git-tree-sha1 = "f389674c99bfcde17dc57454011aa44d5a260a40" +git-tree-sha1 = "be3dc50a92e5a386872a493a10050136d4703f9b" uuid = "692b3bcd-3c85-4b1f-b108-f13ce0eb3210" -version = "1.6.0" +version = "1.6.1" [[deps.JSON]] deps = ["Dates", "Mmap", "Parsers", "Unicode"] diff --git a/assets/literate/l5_1-cpu-parallel_web.md b/assets/literate/l5_1-cpu-parallel_web.md index a15c12cb..f19561ed 100644 --- a/assets/literate/l5_1-cpu-parallel_web.md +++ b/assets/literate/l5_1-cpu-parallel_web.md @@ -145,17 +145,17 @@ As first task, we'll compute the $T_\mathrm{eff}$ for the 2D fluid pressure (dif - Compute the elapsed time `t_toc` at the end of the time loop and report: ````julia:ex1 -t_toc = ... -A_eff = ... # Effective main memory access per iteration [GB] -t_it = ... # Execution time per iteration [s] -T_eff = A_eff/t_it # Effective memory throughput [GB/s] +t_toc = Base.time() - t_tic +A_eff = (3*2)/1e9*nx*ny*sizeof(Float64) # Effective main memory access per iteration [GB] +t_it = t_toc/niter # Execution time per iteration [s] +T_eff = A_eff/t_it # Effective memory throughput [GB/s] ```` - Report `t_toc`, `T_eff` and `niter` at the end of the code, formatting output using `@printf()` macro. - Round `T_eff` to the 3rd significant digit. ```julia -@printf("Time = %1.3f sec, ... \n", t_toc, ...) +@printf("Time = %1.3f sec, T_eff = %1.2f GB/s (niter = %d)\n", t_toc, round(T_eff, sigdigits=3), niter) ``` ### Deactivate visualisation (and error checking) @@ -163,8 +163,10 @@ T_eff = A_eff/t_it # Effective memory throughput [GB/s] - Define a `do_check` flag set to `false` ````julia:ex2 -function Pf_diffusion_2D(;??) +function Pf_diffusion_2D(;do_check=false) + if do_check && (iter%ncheck == 0) ... 
+ end return end ```` @@ -217,19 +219,19 @@ The goal is now to write out the diffusion physics in a loop fashion over $x$ an Implement a nested loop, taking car of bounds and staggering. ````julia:ex6 -for iy=?? - for ix=?? - qDx[??] -= (qDx[??] + k_ηf_dx* ?? )*_1_θ_dτ +for iy=1:ny + for ix=1:nx-1 + qDx[ix+1,iy] -= (qDx[ix+1,iy] + k_ηf_dx*(Pf[ix+1,iy]-Pf[ix,iy]))*_1_θ_dτ end end -for iy=?? - for ix=?? - qDy[??] -= (qDy[??] + k_ηf_dy* ?? )*_1_θ_dτ +for iy=1:ny-1 + for ix=1:nx + qDy[ix,iy+1] -= (qDy[ix,iy+1] + k_ηf_dy*(Pf[ix,iy+1]-Pf[ix,iy]))*_1_θ_dτ end end -for iy=?? - for ix=?? - Pf[??] -= ?? +for iy=1:ny + for ix=1:nx + Pf[ix,iy] -= ((qDx[ix+1,iy]-qDx[ix,iy])*_dx + (qDy[ix,iy+1]-qDy[ix,iy])*_dy)*_β_dτ end end ```` @@ -239,26 +241,26 @@ We could now use macros to make the code nicer and clearer. Macro expression wil Let's use macros to replace the derivative implementations ````julia:ex7 -macro d_xa(A) esc(:( $A[??]-$A[??] )) end -macro d_ya(A) esc(:( $A[??]-$A[??] )) end +macro d_xa(A) esc(:( $A[ix+1,iy]-$A[ix,iy] )) end +macro d_ya(A) esc(:( $A[ix,iy+1]-$A[ix,iy] )) end ```` And update the code within the iteration loop: ````julia:ex8 -for iy=?? - for ix=?? - qDx[??] -= (qDx[??] + k_ηf_dx* ?? )*_1_θ_dτ +for iy=1:ny + for ix=1:nx-1 + qDx[ix+1,iy] -= (qDx[ix+1,iy] + k_ηf_dx*@d_xa(Pf))*_1_θ_dτ end end -for iy=?? - for ix=?? - qDy[??] -= (qDy[??] + k_ηf_dy* ?? )*_1_θ_dτ +for iy=1:ny-1 + for ix=1:nx + qDy[ix,iy+1] -= (qDy[ix,iy+1] + k_ηf_dy*@d_ya(Pf))*_1_θ_dτ end end -for iy=?? - for ix=?? - Pf[??] -= ?? +for iy=1:ny + for ix=1:nx + Pf[ix,iy] -= (@d_xa(qDx)*_dx + @d_ya(qDy)*_dy)*_β_dτ end end ```` @@ -278,15 +280,28 @@ In this last step, the goal is to define `compute` functions to hold the physics Create a `compute_flux!()` and `compute_Pf!()` functions that take input and output arrays and needed scalars as argument and return nothing. ````julia:ex9 -function compute_flux!(...) +function compute_flux!(qDx,qDy,Pf,k_ηf_dx,k_ηf_dy,_1_θ_dτ) nx,ny=size(Pf) - ... 
+ for iy=1:ny, + for ix=1:nx-1 + qDx[ix+1,iy] -= (qDx[ix+1,iy] + k_ηf_dx*@d_xa(Pf))*_1_θ_dτ + end + end + for iy=1:ny-1 + for ix=1:nx + qDy[ix,iy+1] -= (qDy[ix,iy+1] + k_ηf_dy*@d_ya(Pf))*_1_θ_dτ + end + end return nothing end -function update_Pf!(Pf,...) +function update_Pf!(Pf,qDx,qDy,_dx,_dy,_β_dτ) nx,ny=size(Pf) - ... + for iy=1:ny + for ix=1:nx + Pf[ix,iy] -= (@d_xa(qDx)*_dx + @d_ya(qDy)*_dy)*_β_dτ + end + end return nothing end ```` @@ -306,9 +321,9 @@ Let's evaluate the performance of our code using `BenchmarkTools`. We will need The `compute!()` function: ````julia:ex10 -function compute!(Pf,qDx,qDy, ???) - compute_flux!(...) - update_Pf!(...) +function compute!(Pf,qDx,qDy,k_ηf_dx,k_ηf_dy,_1_θ_dτ,_dx,_dy,_β_dτ) + compute_flux!(qDx,qDy,Pf,k_ηf_dx,k_ηf_dy,_1_θ_dτ) + update_Pf!(Pf,qDx,qDy,_dx,_dy,_β_dτ) return nothing end ```` @@ -316,8 +331,8 @@ end can then be called using `@belapsed` to return elapsed time for a single iteration, letting `BenchmarkTools` taking car about sampling ````julia:ex11 -t_toc = @belapsed compute!($Pf,$qDx,$qDy,???) -niter = ??? +t_toc = @belapsed compute!($Pf,$qDx,$qDy,$k_ηf_dx,$k_ηf_dy,$_1_θ_dτ,$_dx,$_dy,$_β_dτ) +niter = 1 ```` \note{Note that variables need to be interpolated into the function call, thus taking a `$` in front.} diff --git a/assets/literate/l5_1-cpu-parallel_web_script.jl b/assets/literate/l5_1-cpu-parallel_web_script.jl index b3190eeb..9895b32b 100644 --- a/assets/literate/l5_1-cpu-parallel_web_script.jl +++ b/assets/literate/l5_1-cpu-parallel_web_script.jl @@ -1,12 +1,14 @@ # This file was generated, do not modify it. -t_toc = ... -A_eff = ... # Effective main memory access per iteration [GB] -t_it = ... 
# Execution time per iteration [s] -T_eff = A_eff/t_it # Effective memory throughput [GB/s] +t_toc = Base.time() - t_tic +A_eff = (3*2)/1e9*nx*ny*sizeof(Float64) # Effective main memory access per iteration [GB] +t_it = t_toc/niter # Execution time per iteration [s] +T_eff = A_eff/t_it # Effective memory throughput [GB/s] -function Pf_diffusion_2D(;??) +function Pf_diffusion_2D(;do_check=false) + if do_check && (iter%ncheck == 0) ... + end return end @@ -16,58 +18,71 @@ _1_θ_dτ = 1.0./(1.0 + θ_dτ) _dx, _dy = 1.0/dx, 1.0/dy -for iy=?? - for ix=?? - qDx[??] -= (qDx[??] + k_ηf_dx* ?? )*_1_θ_dτ +for iy=1:ny + for ix=1:nx-1 + qDx[ix+1,iy] -= (qDx[ix+1,iy] + k_ηf_dx*(Pf[ix+1,iy]-Pf[ix,iy]))*_1_θ_dτ end end -for iy=?? - for ix=?? - qDy[??] -= (qDy[??] + k_ηf_dy* ?? )*_1_θ_dτ +for iy=1:ny-1 + for ix=1:nx + qDy[ix,iy+1] -= (qDy[ix,iy+1] + k_ηf_dy*(Pf[ix,iy+1]-Pf[ix,iy]))*_1_θ_dτ end end -for iy=?? - for ix=?? - Pf[??] -= ?? +for iy=1:ny + for ix=1:nx + Pf[ix,iy] -= ((qDx[ix+1,iy]-qDx[ix,iy])*_dx + (qDy[ix,iy+1]-qDy[ix,iy])*_dy)*_β_dτ end end -macro d_xa(A) esc(:( $A[??]-$A[??] )) end -macro d_ya(A) esc(:( $A[??]-$A[??] )) end +macro d_xa(A) esc(:( $A[ix+1,iy]-$A[ix,iy] )) end +macro d_ya(A) esc(:( $A[ix,iy+1]-$A[ix,iy] )) end -for iy=?? - for ix=?? - qDx[??] -= (qDx[??] + k_ηf_dx* ?? )*_1_θ_dτ +for iy=1:ny + for ix=1:nx-1 + qDx[ix+1,iy] -= (qDx[ix+1,iy] + k_ηf_dx*@d_xa(Pf))*_1_θ_dτ end end -for iy=?? - for ix=?? - qDy[??] -= (qDy[??] + k_ηf_dy* ?? )*_1_θ_dτ +for iy=1:ny-1 + for ix=1:nx + qDy[ix,iy+1] -= (qDy[ix,iy+1] + k_ηf_dy*@d_ya(Pf))*_1_θ_dτ end end -for iy=?? - for ix=?? - Pf[??] -= ?? +for iy=1:ny + for ix=1:nx + Pf[ix,iy] -= (@d_xa(qDx)*_dx + @d_ya(qDy)*_dy)*_β_dτ end end -function compute_flux!(...) +function compute_flux!(qDx,qDy,Pf,k_ηf_dx,k_ηf_dy,_1_θ_dτ) nx,ny=size(Pf) - ... 
+ for iy=1:ny, + for ix=1:nx-1 + qDx[ix+1,iy] -= (qDx[ix+1,iy] + k_ηf_dx*@d_xa(Pf))*_1_θ_dτ + end + end + for iy=1:ny-1 + for ix=1:nx + qDy[ix,iy+1] -= (qDy[ix,iy+1] + k_ηf_dy*@d_ya(Pf))*_1_θ_dτ + end + end return nothing end -function update_Pf!(Pf,...) +function update_Pf!(Pf,qDx,qDy,_dx,_dy,_β_dτ) nx,ny=size(Pf) - ... + for iy=1:ny + for ix=1:nx + Pf[ix,iy] -= (@d_xa(qDx)*_dx + @d_ya(qDy)*_dy)*_β_dτ + end + end return nothing end -function compute!(Pf,qDx,qDy, ???) - compute_flux!(...) - update_Pf!(...) +function compute!(Pf,qDx,qDy,k_ηf_dx,k_ηf_dy,_1_θ_dτ,_dx,_dy,_β_dτ) + compute_flux!(qDx,qDy,Pf,k_ηf_dx,k_ηf_dy,_1_θ_dτ) + update_Pf!(Pf,qDx,qDy,_dx,_dy,_β_dτ) return nothing end -t_toc = @belapsed compute!($Pf,$qDx,$qDy,???) -niter = ??? +t_toc = @belapsed compute!($Pf,$qDx,$qDy,$k_ηf_dx,$k_ηf_dy,$_1_θ_dτ,$_dx,$_dy,$_β_dτ) +niter = 1 diff --git a/extras/index.html b/extras/index.html index 4658da23..542121d1 100644 --- a/extras/index.html +++ b/extras/index.html @@ -1 +1 @@ - Extras

Extras

Cheatsheets

References

any further relevant suggestions are welcome - open a PR

Extra material

Julia GPU HPC tutorials and workshops

Other courses and workshops

Repositories

Other resources

\ No newline at end of file + Extras

Extras

Cheatsheets

References

any further relevant suggestions are welcome - open a PR

Extra material

Julia GPU HPC tutorials and workshops

Other courses and workshops

Repositories

Other resources

\ No newline at end of file diff --git a/final_proj/index.html b/final_proj/index.html index e08f6ea9..cb5f060f 100644 --- a/final_proj/index.html +++ b/final_proj/index.html @@ -1 +1 @@ - Information about final projects

Information about final projects

Final projects will provide 35% of the course grade. We recommend you work in teams of two, but being your own teammate is fine too.

🚧 More infos to come in due time.

\ No newline at end of file + Information about final projects

Information about final projects

Final projects will provide 35% of the course grade. We recommend you work in teams of two, but being your own teammate is fine too.

🚧 More infos to come in due time.

\ No newline at end of file diff --git a/homework/index.html b/homework/index.html index de2b7c90..7faa391c 100644 --- a/homework/index.html +++ b/homework/index.html @@ -1 +1 @@ - Homework

Homework

AssignmentDue dateSubmissionNotes
Lect. 125.09.2024 - 23h59 CETMoodleSubmit a folder containing all exercise notebooks from JupyterHub.
Lect. 202.10.2024 - 23h59 CETMoodle (notebooks), Moodle (commit hash + PR)For the notebooks submission, submit a folder containing all exercise notebooks from JupyterHub. For the commit hash + PR submission, copy the git commit hash (SHA) of the final push on the branch homework-2 and open a pull request on the main branch. Paste both the commit hash and the PR link on Moodle (check Logistics for more details on how to set up the GitHub repository).
Lect. 311.10.2024 - 23h59 CETMoodle (commit hash + PR)For the submission, copy the git commit hash (SHA) of the final push on the branch homework-3 and open a pull request on the main branch. Paste both the SHA and the PR link on Moodle.
Lect. 418.10.2024 - 23h59 CETMoodle (commit hash + PR)For the submission, copy the git commit hash (SHA) of the final push on the branch homework-4 and open a pull request on the main branch. Paste both the SHA and the PR link on Moodle.
\ No newline at end of file + Homework

Homework

AssignmentDue dateSubmissionNotes
Lect. 125.09.2024 - 23h59 CETMoodleSubmit a folder containing all exercise notebooks from JupyterHub.
Lect. 202.10.2024 - 23h59 CETMoodle (notebooks), Moodle (commit hash + PR)For the notebooks submission, submit a folder containing all exercise notebooks from JupyterHub. For the commit hash + PR submission, copy the git commit hash (SHA) of the final push on the branch homework-2 and open a pull request on the main branch. Paste both the commit hash and the PR link on Moodle (check Logistics for more details on how to set up the GitHub repository).
Lect. 311.10.2024 - 23h59 CETMoodle (commit hash + PR)For the submission, copy the git commit hash (SHA) of the final push on the branch homework-3 and open a pull request on the main branch. Paste both the SHA and the PR link on Moodle.
Lect. 418.10.2024 - 23h59 CETMoodle (commit hash + PR)For the submission, copy the git commit hash (SHA) of the final push on the branch homework-4 and open a pull request on the main branch. Paste both the SHA and the PR link on Moodle.
Lect. 525.10.2024 - 23h59 CETMoodle (commit hash + PR)For the submission, copy the git commit hash (SHA) of the final push on the branch homework-5 and open a pull request on the main branch. Paste both the SHA and the PR link on Moodle.
\ No newline at end of file diff --git a/index.html b/index.html index 7de7e7f6..95798a84 100644 --- a/index.html +++ b/index.html @@ -1 +1 @@ - Solving PDEs in parallel on GPUs with Julia

Solving PDEs in parallel on GPUs with Julia

🎉 Welcome to ETH's course 101-0250-00L on solving partial differential equations (PDEs) in parallel on graphical processing units (GPUs) with the Julia programming language.

💡 Note
2024 edition starts Tuesday Sept. 17, 12h45. Welcome!

Course information

This course aims to cover state-of-the-art methods in modern parallel GPU computing, supercomputing and scientific software development with applications to natural sciences and engineering. The course is open source and is available on GitHub.

Objective

The goal of this course is to offer a practical approach to solve systems of partial differential equations in parallel on GPUs using the Julia programming language. Julia combines high-level language expressiveness and low-level language performance which enables efficient code development. The Julia GPU applications will be hosted on GitHub and implement modern software development practices.

Outline

  • Part 1 Introducing Julia & PDEs

    • The Julia language: hands-on

    • Solving physical processes: advection, reaction, diffusion & wave propagation

    • Spatial and temporal discretisation: finite differences and explicit time-stepping

    • Software development tools: Git, Continuous Integration

  • Part 2 Solving PDEs on GPUs

    • Steady-state, implicit & nonlinear solutions

    • Efficient iterative algorithms

    • Parallel and GPU computing

    • Simulation performance limiters

  • Part 3 Projects

    Multi-GPU computing and optimisations

    • xPU computing

    • Distributed computing

    • Advanced optimisations

  • Final projects

    Solve a solid mechanics or fluid dynamics problem of your interest, such as:

    • dynamic elasticity — seismic wave propagation

    • Maxwell's equations — electromagnetic fields propagation

    • shallow-water equations — rivers, lakes, or oceans

    • shallow ice approximation — ice sheet evolution

    • Navier–Stokes equations — fluid or smoke

    • thermo-mechanically coupled Stokes flow — mantle convection

    • hydro-mechanically coupled Stokes flow — subsurface CO2 flow

    • your own idea

Teaching staff

\ No newline at end of file + Solving PDEs in parallel on GPUs with Julia

Solving PDEs in parallel on GPUs with Julia

🎉 Welcome to ETH's course 101-0250-00L on solving partial differential equations (PDEs) in parallel on graphical processing units (GPUs) with the Julia programming language.

💡 Note
2024 edition starts Tuesday Sept. 17, 12h45. Welcome!

Course information

This course aims to cover state-of-the-art methods in modern parallel GPU computing, supercomputing and scientific software development with applications to natural sciences and engineering. The course is open source and is available on GitHub.

Objective

The goal of this course is to offer a practical approach to solve systems of partial differential equations in parallel on GPUs using the Julia programming language. Julia combines high-level language expressiveness and low-level language performance which enables efficient code development. The Julia GPU applications will be hosted on GitHub and implement modern software development practices.

Outline

  • Part 1 Introducing Julia & PDEs

    • The Julia language: hands-on

    • Solving physical processes: advection, reaction, diffusion & wave propagation

    • Spatial and temporal discretisation: finite differences and explicit time-stepping

    • Software development tools: Git, Continuous Integration

  • Part 2 Solving PDEs on GPUs

    • Steady-state, implicit & nonlinear solutions

    • Efficient iterative algorithms

    • Parallel and GPU computing

    • Simulation performance limiters

  • Part 3 Projects

    Multi-GPU computing and optimisations

    • xPU computing

    • Distributed computing

    • Advanced optimisations

  • Final projects

    Solve a solid mechanics or fluid dynamics problem of your interest, such as:

    • dynamic elasticity — seismic wave propagation

    • Maxwell's equations — electromagnetic fields propagation

    • shallow-water equations — rivers, lakes, or oceans

    • shallow ice approximation — ice sheet evolution

    • Navier–Stokes equations — fluid or smoke

    • thermo-mechanically coupled Stokes flow — mantle convection

    • hydro-mechanically coupled Stokes flow — subsurface CO2 flow

    • your own idea

Teaching staff

\ No newline at end of file diff --git a/lecture1/index.html b/lecture1/index.html index bdd54d2c..ab580487 100644 --- a/lecture1/index.html +++ b/lecture1/index.html @@ -730,7 +730,7 @@

Question 2 diff --git a/lecture10/index.html b/lecture10/index.html index f0c5695c..7a66178d 100644 --- a/lecture10/index.html +++ b/lecture10/index.html @@ -417,7 +417,7 @@

Edit this page on
- Last modified: October 08, 2024. Website built with Franklin.jl and the Julia programming language. + Last modified: October 15, 2024. Website built with Franklin.jl and the Julia programming language. diff --git a/lecture2/index.html b/lecture2/index.html index eab79d01..8dabbb07 100644 --- a/lecture2/index.html +++ b/lecture2/index.html @@ -565,7 +565,7 @@

GitHub task<
diff --git a/lecture3/index.html b/lecture3/index.html index 4f0eecd6..e3138e82 100644 --- a/lecture3/index.html +++ b/lecture3/index.html @@ -445,7 +445,7 @@

Task 1

diff --git a/lecture4/index.html b/lecture4/index.html index 7ac8f6e2..2bdf246e 100644 --- a/lecture4/index.html +++ b/lecture4/index.html @@ -231,7 +231,7 @@

Task 3

diff --git a/lecture5/index.html b/lecture5/index.html index 9a5e0739..34e47443 100644 --- a/lecture5/index.html +++ b/lecture5/index.html @@ -1,7 +1,7 @@ - Lecture 5

Lecture 5

Agenda
📚 Parallel computing on CPUs & performance assessment, the TeffT_\mathrm{eff} metric
💻 Unit testing in Julia
🚧 Exercises:

  • CPU perf. codes for 2D diffusion and memcopy

  • Unit tests and testset implementation


Content

👉 get started with exercises


Parallel computing (on CPUs) and performance assessment

Performance

❓ some questions for you:

  • How to assess the performance of numerical application?

  • Are you familiar the concept of wall-time?

  • What are the key ingredients to understand performance?

The goal of this lecture 5 is to introduce:

  • Performance limiters

  • Effective memory throughput metric TeffT_\mathrm{eff}

  • Parallel computing on CPUs

  • Shared memory parallelisation

Performance limiters

Hardware

  • Recent processors (CPUs and GPUs) have multiple (or many) cores

  • Recent processors use their parallelism to hide latency (i.e. overlapping execution times (latencies) of individual operations with execution times of other operations)

  • Multi-core CPUs and GPUs share similar challenges

Recall from lecture 1 (why we do it) ...

Use parallel computing (to address this):

  • The "memory wall" in ~ 2004

  • Single-core to multi-core devices

mem_wall

GPUs are massively parallel devices

  • SIMD machine (programmed using threads - SPMD) (more)

  • Further increases the FLOPS vs Bytes gap

cpu_gpu_evo

Taking a look at a recent GPU and CPU:

  • Nvidia Tesla A100 GPU

  • AMD EPYC "Rome" 7282 (16 cores) CPU

DeviceTFLOP/s (FP64)Memory BW TB/s
Tesla A1009.71.55
AMD EPYC 72820.70.085

Current GPUs (and CPUs) can do many more computations in a given amount of time than they can access numbers from main memory.

Quantify the imbalance:

computation peak performance [TFLOP/s]memory access peak performance [TB/s]×size of a number [Bytes] \frac{\mathrm{computation\;peak\;performance\;[TFLOP/s]}}{\mathrm{memory\;access\;peak\;performance\;[TB/s]}} × \mathrm{size\;of\;a\;number\;[Bytes]}

(Theoretical peak performance values as specified by the vendors can be used).

Back to our hardware:

DeviceTFLOP/s (FP64)Memory BW TB/sImbalance (FP64)
Tesla A1009.71.559.7 / 1.55 × 8 = 50
AMD EPYC 72820.70.0850.7 / 0.085 × 8 = 66

(here computed with double precision values)

Meaning: we can do 50 (GPU) and 66 (CPU) floating point operations per number accessed from main memory. Floating point operations are "for free" when we work in memory-bounded regimes

➡ Requires to re-think the numerical implementation and solution strategies

On the scientific application side

  • Most algorithms require only a few operations or FLOPS ...

  • ... compared to the amount of numbers or bytes accessed from main memory.

First derivative example A/x∂A / ∂x:

If we "naively" compare the "cost" of an isolated evaluation of a finite-difference first derivative, e.g., computing a flux qq:

q=D Ax ,q = -D~\frac{∂A}{∂x}~,

which in the discrete form reads q[ix] = -D*(A[ix+1]-A[ix])/dx.

The cost of evaluating q[ix] = -D*(A[ix+1]-A[ix])/dx:

1 reads + 1 write => 2×82 × 8 = 16 Bytes transferred

1 (fused) addition and division => 1 floating point operation

assuming:

  • DD, x∂x are scalars

  • qq and AA are arrays of Float64 (read from main memory)

GPUs and CPUs perform 50 - 60 floating-point operations per number accessed from main memory

First derivative evaluation requires to transfer 2 numbers per floating-point operations

The FLOPS metric is no longer the most adequate for reporting the application performance of many modern applications on modern hardware.

Effective memory throughput metric TeffT_\mathrm{eff}

Need for a memory throughput-based performance evaluation metric: TeffT_\mathrm{eff} [GB/s]

➡ Evaluate the performance of iterative stencil-based solvers.

The effective memory access AeffA_\mathrm{eff} [GB]

Sum of:

  • twice the memory footprint of the unknown fields, DuD_\mathrm{u}, (fields that depend on their own history and that need to be updated every iteration)

  • known fields, DkD_\mathrm{k}, that do not change every iteration.

The effective memory access divided by the execution time per iteration, titt_\mathrm{it} [sec], defines the effective memory throughput, TeffT_\mathrm{eff} [GB/s]:

Aeff=2 Du+Dk A_\mathrm{eff} = 2~D_\mathrm{u} + D_\mathrm{k} Teff=Aefftit T_\mathrm{eff} = \frac{A_\mathrm{eff}}{t_\mathrm{it}}

The upper bound of TeffT_\mathrm{eff} is TpeakT_\mathrm{peak} as measured, e.g., by McCalpin, 1995 for CPUs or a GPU analogue.

Defining the TeffT_\mathrm{eff} metric, we assume that:

  1. we evaluate an iterative stencil-based solver,

  2. the problem size is much larger than the cache sizes and

  3. the usage of time blocking is not feasible or advantageous (reasonable for real-world applications).

💡 Note
Fields within the effective memory access that do not depend on their own history; such fields can be re-computed on the fly or stored on-chip.

As first task, we'll compute the TeffT_\mathrm{eff} for the 2D fluid pressure (diffusion) solver at the core of the porous convection algorithm from previous lecture.

👉 Download the script l5_Pf_diffusion_2D.jl to get started.

To-do list:

  • copy l5_Pf_diffusion_2D.jl, rename it to Pf_diffusion_2D_Teff.jl

  • add a timer

  • include the performance metric formulas

  • deactivate visualisation

💻 Let's get started

Timer and performance

  • Use Base.time() to return the current timestamp

  • Define t_tic, the starting time, after 11 iterations steps to allow for "warm-up"

  • Record the exact number of iterations (introduce e.g. niter)

  • Compute the elapsed time t_toc at the end of the time loop and report:

t_toc = ...
-A_eff = ...          # Effective main memory access per iteration [GB]
-t_it  = ...          # Execution time per iteration [s]
-T_eff = A_eff/t_it   # Effective memory throughput [GB/s]
  • Report t_toc, T_eff and niter at the end of the code, formatting output using @printf() macro.

  • Round T_eff to the 3rd significant digit.

@printf("Time = %1.3f sec, ... \n", t_toc, ...)
+ Lecture 5

Lecture 5

Agenda
📚 Parallel computing on CPUs & performance assessment, the TeffT_\mathrm{eff} metric
💻 Unit testing in Julia
🚧 Exercises:

  • CPU perf. codes for 2D diffusion and memcopy

  • Unit tests and testset implementation


Content

👉 get started with exercises


Parallel computing (on CPUs) and performance assessment

Performance

❓ some questions for you:

  • How to assess the performance of numerical application?

  • Are you familiar the concept of wall-time?

  • What are the key ingredients to understand performance?

The goal of this lecture 5 is to introduce:

  • Performance limiters

  • Effective memory throughput metric TeffT_\mathrm{eff}

  • Parallel computing on CPUs

  • Shared memory parallelisation

Performance limiters

Hardware

  • Recent processors (CPUs and GPUs) have multiple (or many) cores

  • Recent processors use their parallelism to hide latency (i.e. overlapping execution times (latencies) of individual operations with execution times of other operations)

  • Multi-core CPUs and GPUs share similar challenges

Recall from lecture 1 (why we do it) ...

Use parallel computing (to address this):

  • The "memory wall" in ~ 2004

  • Single-core to multi-core devices

mem_wall

GPUs are massively parallel devices

  • SIMD machine (programmed using threads - SPMD) (more)

  • Further increases the FLOPS vs Bytes gap

cpu_gpu_evo

Taking a look at a recent GPU and CPU:

  • Nvidia Tesla A100 GPU

  • AMD EPYC "Rome" 7282 (16 cores) CPU

DeviceTFLOP/s (FP64)Memory BW TB/s
Tesla A1009.71.55
AMD EPYC 72820.70.085

Current GPUs (and CPUs) can do many more computations in a given amount of time than they can access numbers from main memory.

Quantify the imbalance:

computation peak performance [TFLOP/s]memory access peak performance [TB/s]×size of a number [Bytes] \frac{\mathrm{computation\;peak\;performance\;[TFLOP/s]}}{\mathrm{memory\;access\;peak\;performance\;[TB/s]}} × \mathrm{size\;of\;a\;number\;[Bytes]}

(Theoretical peak performance values as specified by the vendors can be used).

Back to our hardware:

DeviceTFLOP/s (FP64)Memory BW TB/sImbalance (FP64)
Tesla A1009.71.559.7 / 1.55 × 8 = 50
AMD EPYC 72820.70.0850.7 / 0.085 × 8 = 66

(here computed with double precision values)

Meaning: we can do 50 (GPU) and 66 (CPU) floating point operations per number accessed from main memory. Floating point operations are "for free" when we work in memory-bounded regimes

➡ Requires to re-think the numerical implementation and solution strategies

On the scientific application side

  • Most algorithms require only a few operations or FLOPS ...

  • ... compared to the amount of numbers or bytes accessed from main memory.

First derivative example A/x∂A / ∂x:

If we "naively" compare the "cost" of an isolated evaluation of a finite-difference first derivative, e.g., computing a flux qq:

q=D Ax ,q = -D~\frac{∂A}{∂x}~,

which in the discrete form reads q[ix] = -D*(A[ix+1]-A[ix])/dx.

The cost of evaluating q[ix] = -D*(A[ix+1]-A[ix])/dx:

1 reads + 1 write => 2×82 × 8 = 16 Bytes transferred

1 (fused) addition and division => 1 floating point operation

assuming:

  • DD, x∂x are scalars

  • qq and AA are arrays of Float64 (read from main memory)

GPUs and CPUs perform 50 - 60 floating-point operations per number accessed from main memory

First derivative evaluation requires to transfer 2 numbers per floating-point operations

The FLOPS metric is no longer the most adequate for reporting the application performance of many modern applications on modern hardware.

Effective memory throughput metric TeffT_\mathrm{eff}

Need for a memory throughput-based performance evaluation metric: TeffT_\mathrm{eff} [GB/s]

➡ Evaluate the performance of iterative stencil-based solvers.

The effective memory access AeffA_\mathrm{eff} [GB]

Sum of:

  • twice the memory footprint of the unknown fields, DuD_\mathrm{u}, (fields that depend on their own history and that need to be updated every iteration)

  • known fields, DkD_\mathrm{k}, that do not change every iteration.

The effective memory access divided by the execution time per iteration, titt_\mathrm{it} [sec], defines the effective memory throughput, TeffT_\mathrm{eff} [GB/s]:

Aeff=2 Du+Dk A_\mathrm{eff} = 2~D_\mathrm{u} + D_\mathrm{k} Teff=Aefftit T_\mathrm{eff} = \frac{A_\mathrm{eff}}{t_\mathrm{it}}

The upper bound of TeffT_\mathrm{eff} is TpeakT_\mathrm{peak} as measured, e.g., by McCalpin, 1995 for CPUs or a GPU analogue.

Defining the TeffT_\mathrm{eff} metric, we assume that:

  1. we evaluate an iterative stencil-based solver,

  2. the problem size is much larger than the cache sizes and

  3. the usage of time blocking is not feasible or advantageous (reasonable for real-world applications).

💡 Note
Fields within the effective memory access that do not depend on their own history; such fields can be re-computed on the fly or stored on-chip.

As first task, we'll compute the TeffT_\mathrm{eff} for the 2D fluid pressure (diffusion) solver at the core of the porous convection algorithm from previous lecture.

👉 Download the script l5_Pf_diffusion_2D.jl to get started.

To-do list:

  • copy l5_Pf_diffusion_2D.jl, rename it to Pf_diffusion_2D_Teff.jl

  • add a timer

  • include the performance metric formulas

  • deactivate visualisation

💻 Let's get started

Timer and performance

  • Use Base.time() to return the current timestamp

  • Define t_tic, the starting time, after 11 iterations steps to allow for "warm-up"

  • Record the exact number of iterations (introduce e.g. niter)

  • Compute the elapsed time t_toc at the end of the time loop and report:

t_toc = Base.time() - t_tic
+A_eff = (3*2)/1e9*nx*ny*sizeof(Float64)  # Effective main memory access per iteration [GB]
+t_it  = t_toc/niter                      # Execution time per iteration [s]
+T_eff = A_eff/t_it                       # Effective memory throughput [GB/s]
  • Report t_toc, T_eff and niter at the end of the code, formatting output using @printf() macro.

  • Round T_eff to the 3rd significant digit.

@printf("Time = %1.3f sec, T_eff = %1.2f GB/s (niter = %d)\n", t_toc, round(T_eff, sigdigits=3), niter)

Deactivate visualisation (and error checking)

diff --git a/lecture7/index.html b/lecture7/index.html index cc3eafba..c7d33359 100644 --- a/lecture7/index.html +++ b/lecture7/index.html @@ -491,7 +491,7 @@

Tasks

diff --git a/lecture8/index.html b/lecture8/index.html index d8469580..ccd4617f 100644 --- a/lecture8/index.html +++ b/lecture8/index.html @@ -322,7 +322,7 @@

Task 6

diff --git a/lecture9/index.html b/lecture9/index.html index 7e75a156..b620c518 100644 --- a/lecture9/index.html +++ b/lecture9/index.html @@ -421,7 +421,7 @@

Edit this page on
- Last modified: October 08, 2024. Website built with Franklin.jl and the Julia programming language. + Last modified: October 15, 2024. Website built with Franklin.jl and the Julia programming language. diff --git a/logistics/index.html b/logistics/index.html index 498a87bd..a9a8f329 100644 --- a/logistics/index.html +++ b/logistics/index.html @@ -1 +1 @@ - Logistics

Logistics

Element chat Zoom Meeting ETHZ Moodle

Suggestion: Bookmark this page for easy access to all infos you need for the course.

Course structure

Each lecture contains material on physics, numerics, technical concepts, as well as exercises. The lecture content is outlined in its introduction using the following items for each type of content:

  • 📚 Physics: equations, discretisation, implementation, solver, visualisation

  • 💻 Code: technical, Julia, GitHub

  • 🚧 Exercises

The course will be taught in a hands-on fashion, putting emphasis on you writing code and completing exercises; lecturing will be kept at a minimum.

Lectures

Live lectures | Tuesdays 12h45-15h30

  • Lectures will take place in HCI E8

  • Online attendance will be possible on Zoom for ETH students only

  • No online support will be provided during the exercise session, please follow the lectures

Office hours

Schedule will be defined and communicated on the last lecture.

Discussion

We use Element as the main channel for communication between the teachers and the students, and hopefully also between students. We encourage ETH students to ask and answer questions related to the course, exercises and projects there.

Head to the Element chat link on Moodle to get started with Element:

  1. Select Start Student-Chat

  2. Login using your NETHZ credentials to start using the browser-based client

  3. Join the General and Helpdesk rooms

  4. Download the desktop or mobile client for more convenient access or in case of encryption-related issues

Homework

Homework tasks will be announced after each week's lecture. The exercise session following the lecture will get you started.

Homework due date will be Wednesday 23h59 CET every following week (8 days) to allow for Q&A during the following in-class exercise session.

All homework assignments can be carried out by groups of two. However, note that every student has to hand in a personal version of the homework.

➡ Check out the Homework page for an overview on expected hand-in and deadlines.

Submission

  • Submission of JupyterHub notebooks after weeks 1 and 2, then GitHub commit hash (SHA) after week 3 and onwards, or other documents happens on the course's Moodle.

  • Actions and tasks related to GitHub will happen on your private course-related GitHub repository.

Starting from lecture 3 and onwards, the development of homework scripts happens on GitHub and you will have to submit the git commit hash (SHA) on Moodle in the related git commit hash (SHA) submission activity.

Submission for Jupyter Hub to Moodle

  • on the Hub place all notebooks of an assignment into one folder called assignments/lectureX_homework

    • note: maybe this folder magically already exists on your Hub with the notebooks added. If not, create it and download the notebooks yourself.

  • in Moodle during submission, select that folder as JupyterHub submission

Private GitHub repository setup

Once you have your GitHub account ready (see lecture 2 how-to), create a private repository you will share with the teaching staff only to upload your weekly assignments:

  1. Create a private GitHub repository named pde-on-gpu-<moodleprofilename>, where <moodleprofilename> has to be replaced by your name as displayed on Moodle, lowercase, diacritics removed, spacing replaced with hyphens (-). For example, if your Moodle profile name is "Joël Désirée van der Linde" your repository should be named pde-on-gpu-joel-desiree-van-der-linde.

  2. Select an MIT License and add a README.md file.

  3. Share this private repository on GitHub with the teaching bot.

  4. For each homework submission, you will:

    • create a git branch named homework-X ($X \in \{2, 3, \dots\}$) and switch to that branch (git switch -c homework-X);

    • create a new folder named homework-X to put the exercise codes into;

    • (don't forget to git add the code-files and git commit them);

    • push to GitHub and open a pull request (PR) on the main branch on GitHub;

    • copy the single git commit hash (SHA) after the final push and the link to the PR and submit both on Moodle as the assignment hand-in (it will serve to control the material was pushed on time);

    • (do not merge the PR yet).

⚠️ Warning!
Make sure to only include the homework-X folders and README.md in the GitHub repo you share with the exercise bot in order to keep the repository lightweight.
💡 Note
For homework 3 and later, the respective folders on GitHub should be Julia projects and thus must contain a Project.toml file. The Manifest.toml file should be excluded from version control. To do so, add it as entry to a .gitignore file in the root of your repo. Mac users may also add .DS_Store to their global .gitignore. Codes could be placed in a scripts/ folder. Output material to be displayed in the README.md could be placed in a docs/ folder.

Feedback

After the submission deadline, we will correct and grade your assignments. You will get personal feedback directly on the PR as well as on Moodle. Once you got feedback, please merge the PR.

We will try to correct your assignments before the lecture following the homework's deadline. This should allow you to get rapid feedback in order to clarify the points you may struggle on as soon as possible.

Project

Starting from lecture 7, and until lecture 9, homework assigments contribute to the course's first project. The goal of this project is to have a multi-xPU thermal porous convection solver in 3D.

🚧 More infos to come in due time.

Evaluation

All homework assigments can be done alone or in groups of two.

Enrolled ETHZ students will have to hand in on Moodle and GitHub:

  1. Six weekly assignments during the course's Part 1 and Part 2 constitute 30% of the final grade. The best five out of six homeworks will be counted.

  2. A project developed during Part 3 of the course consitutes 35% of the final grade

  3. A final project consitutes 35% of the final grade

Project submission includes code in a Github repository and an automatically generated documentation.

\ No newline at end of file + Logistics

Logistics

Element chat Zoom Meeting ETHZ Moodle

Suggestion: Bookmark this page for easy access to all infos you need for the course.

Course structure

Each lecture contains material on physics, numerics, technical concepts, as well as exercises. The lecture content is outlined in its introduction using the following items for each type of content:

  • 📚 Physics: equations, discretisation, implementation, solver, visualisation

  • 💻 Code: technical, Julia, GitHub

  • 🚧 Exercises

The course will be taught in a hands-on fashion, putting emphasis on you writing code and completing exercises; lecturing will be kept at a minimum.

Lectures

Live lectures | Tuesdays 12h45-15h30

  • Lectures will take place in HCI E8

  • Online attendance will be possible on Zoom for ETH students only

  • No online support will be provided during the exercise session, please follow the lectures

Office hours

Schedule will be defined and communicated on the last lecture.

Discussion

We use Element as the main channel for communication between the teachers and the students, and hopefully also between students. We encourage ETH students to ask and answer questions related to the course, exercises and projects there.

Head to the Element chat link on Moodle to get started with Element:

  1. Select Start Student-Chat

  2. Login using your NETHZ credentials to start using the browser-based client

  3. Join the General and Helpdesk rooms

  4. Download the desktop or mobile client for more convenient access or in case of encryption-related issues

Homework

Homework tasks will be announced after each week's lecture. The exercise session following the lecture will get you started.

Homework due date will be Wednesday 23h59 CET every following week (8 days) to allow for Q&A during the following in-class exercise session.

All homework assignments can be carried out by groups of two. However, note that every student has to hand in a personal version of the homework.

➡ Check out the Homework page for an overview on expected hand-in and deadlines.

Submission

  • Submission of JupyterHub notebooks after weeks 1 and 2, then GitHub commit hash (SHA) after week 3 and onwards, or other documents happens on the course's Moodle.

  • Actions and tasks related to GitHub will happen on your private course-related GitHub repository.

Starting from lecture 3 and onwards, the development of homework scripts happens on GitHub and you will have to submit the git commit hash (SHA) on Moodle in the related git commit hash (SHA) submission activity.

Submission for Jupyter Hub to Moodle

  • on the Hub place all notebooks of an assignment into one folder called assignments/lectureX_homework

    • note: maybe this folder magically already exists on your Hub with the notebooks added. If not, create it and download the notebooks yourself.

  • in Moodle during submission, select that folder as JupyterHub submission

Private GitHub repository setup

Once you have your GitHub account ready (see lecture 2 how-to), create a private repository you will share with the teaching staff only to upload your weekly assignments:

  1. Create a private GitHub repository named pde-on-gpu-<moodleprofilename>, where <moodleprofilename> has to be replaced by your name as displayed on Moodle, lowercase, diacritics removed, spacing replaced with hyphens (-). For example, if your Moodle profile name is "Joël Désirée van der Linde" your repository should be named pde-on-gpu-joel-desiree-van-der-linde.

  2. Select an MIT License and add a README.md file.

  3. Share this private repository on GitHub with the teaching bot.

  4. For each homework submission, you will:

    • create a git branch named homework-X ($X \in \{2, 3, \dots\}$) and switch to that branch (git switch -c homework-X);

    • create a new folder named homework-X to put the exercise codes into;

    • (don't forget to git add the code-files and git commit them);

    • push to GitHub and open a pull request (PR) on the main branch on GitHub;

    • copy the single git commit hash (SHA) after the final push and the link to the PR and submit both on Moodle as the assignment hand-in (it will serve to control the material was pushed on time);

    • (do not merge the PR yet).

⚠️ Warning!
Make sure to only include the homework-X folders and README.md in the GitHub repo you share with the exercise bot in order to keep the repository lightweight.
💡 Note
For homework 3 and later, the respective folders on GitHub should be Julia projects and thus must contain a Project.toml file. The Manifest.toml file should be excluded from version control. To do so, add it as entry to a .gitignore file in the root of your repo. Mac users may also add .DS_Store to their global .gitignore. Codes could be placed in a scripts/ folder. Output material to be displayed in the README.md could be placed in a docs/ folder.

Feedback

After the submission deadline, we will correct and grade your assignments. You will get personal feedback directly on the PR as well as on Moodle. Once you got feedback, please merge the PR.

We will try to correct your assignments before the lecture following the homework's deadline. This should allow you to get rapid feedback in order to clarify the points you may struggle on as soon as possible.

Project

Starting from lecture 7, and until lecture 9, homework assigments contribute to the course's first project. The goal of this project is to have a multi-xPU thermal porous convection solver in 3D.

🚧 More infos to come in due time.

Evaluation

All homework assigments can be done alone or in groups of two.

Enrolled ETHZ students will have to hand in on Moodle and GitHub:

  1. Six weekly assignments during the course's Part 1 and Part 2 constitute 30% of the final grade. The best five out of six homeworks will be counted.

  2. A project developed during Part 3 of the course consitutes 35% of the final grade

  3. A final project consitutes 35% of the final grade

Project submission includes code in a Github repository and an automatically generated documentation.

\ No newline at end of file diff --git a/package-lock.json b/package-lock.json index 6354e2de..803cadc6 100644 --- a/package-lock.json +++ b/package-lock.json @@ -209,22 +209,22 @@ } }, "node_modules/parse5": { - "version": "7.1.2", - "resolved": "https://registry.npmjs.org/parse5/-/parse5-7.1.2.tgz", - "integrity": "sha512-Czj1WaSVpaoj0wbhMzLmWD69anp2WH7FXMB9n1Sy8/ZFF9jolSQVMu1Ij5WIyGmcBmhk7EOndpO4mIpihVqAXw==", + "version": "7.2.0", + "resolved": "https://registry.npmjs.org/parse5/-/parse5-7.2.0.tgz", + "integrity": "sha512-ZkDsAOcxsUMZ4Lz5fVciOehNcJ+Gb8gTzcA4yl3wnc273BAybYWrQ+Ks/OjCjSEpjvQkDSeZbybK9qj2VHHdGA==", "dependencies": { - "entities": "^4.4.0" + "entities": "^4.5.0" }, "funding": { "url": "https://github.com/inikulin/parse5?sponsor=1" } }, "node_modules/parse5-htmlparser2-tree-adapter": { - "version": "7.0.0", - "resolved": "https://registry.npmjs.org/parse5-htmlparser2-tree-adapter/-/parse5-htmlparser2-tree-adapter-7.0.0.tgz", - "integrity": "sha512-B77tOZrqqfUfnVcOrUvfdLbz4pu4RopLD/4vmu3HUPswwTA8OH0EMW9BlWR2B0RCoiZRAHEUu7IxeP1Pd1UU+g==", + "version": "7.1.0", + "resolved": "https://registry.npmjs.org/parse5-htmlparser2-tree-adapter/-/parse5-htmlparser2-tree-adapter-7.1.0.tgz", + "integrity": "sha512-ruw5xyKs6lrpo9x9rCZqZZnIUntICjQAd0Wsmp396Ul9lN/h+ifgVV1x1gZHi8euej6wTfpqX8j+BFQxF0NS/g==", "dependencies": { - "domhandler": "^5.0.2", + "domhandler": "^5.0.3", "parse5": "^7.0.0" }, "funding": { @@ -248,9 +248,9 @@ "integrity": "sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg==" }, "node_modules/undici": { - "version": "6.19.8", - "resolved": "https://registry.npmjs.org/undici/-/undici-6.19.8.tgz", - "integrity": "sha512-U8uCCl2x9TK3WANvmBavymRzxbfFYG+tAu+fgx3zxQy3qdagQqBLwJVrdyO1TBfUXvfKveMKJZhpvUYoOjM+4g==", + "version": "6.20.1", + "resolved": "https://registry.npmjs.org/undici/-/undici-6.20.1.tgz", + "integrity": 
"sha512-AjQF1QsmqfJys+LXfGTNum+qw4S88CojRInG/6t31W/1fk6G59s92bnAvGz5Cmur+kQv2SURXEvvudLmbrE8QA==", "engines": { "node": ">=18.17" } diff --git a/search/index.html b/search/index.html index de7e2095..4d0e7c02 100644 --- a/search/index.html +++ b/search/index.html @@ -1 +1 @@ - Search ⋅ YourWebsite

Number of results found:

\ No newline at end of file + Search ⋅ YourWebsite

Number of results found:

\ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index 5c8e3aee..b732848e 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -3,103 +3,103 @@ https://pde-on-gpu.vaw.ethz.ch/lecture6/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/lecture4/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/lecture2/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/lecture8/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/software_install/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/logistics/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/lecture3/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/lecture10/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/homework/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/lecture1/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/search/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/final_proj/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/extras/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/lecture5/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/lecture7/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/lecture9/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 https://pde-on-gpu.vaw.ethz.ch/index.html - 2024-10-08 + 2024-10-15 monthly 0.5 diff --git a/software_install/index.html b/software_install/index.html index df52a33d..5b294df9 100644 --- a/software_install/index.html +++ b/software_install/index.html @@ -103,7 +103,7 @@

Julia on GPU