
Issue: EuroCC2 Bootcamp Technical Issues #2

Open
programmah opened this issue May 10, 2024 · 0 comments

  1. Lab: single-gpu overview
     - Output from the code differs from the example in the lab (it looks like it may be off by one iteration; does the lab example include iteration 0?). For iteration 900, the code outputs 900, 0.173963, while the example in the lab is 900, 0.173818.
  2. Lab: intra-node topology
     In the DGX A100 section, it has the following text:
     “If we remove the -p2p flag and and run the command again for GPUs 0 and 7, we will not get any difference in performance on DGX A100 system. As you may recall, P2P is not possible between GPUs 0 and 7, so the underlying communication path doesn't change, resulting in same performance with and without the -p2p flag. This can be confirmed by profiling the application and looking at the operations performed in the Nsight Systems timeline.”
     Two things here:
     - Doubled word: “and and” in the first sentence.
     - Secondly (and the important one): it says that P2P is not possible between GPUs 0 and 7. This is incorrect for the DGX A100 (it is true for a DGX V100); thanks to the NVSwitch, P2P is available between any pair of GPUs on the DGX A100.
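This is easy to verify on a given machine by querying the CUDA runtime directly. A minimal sketch (assuming at least 8 visible GPUs; this is illustrative code, not from the lab):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Query whether device 0 can access device 7's memory directly (P2P).
    // On a DGX A100 (NVSwitch) this reports 1 for every GPU pair;
    // on a DGX V100, some pairs report 0.
    int can_access = 0;
    cudaError_t err = cudaDeviceCanAccessPeer(&can_access,
                                              /*device=*/0, /*peerDevice=*/7);
    if (err != cudaSuccess) {
        std::printf("error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("P2P 0 -> 7: %s\n", can_access ? "available" : "not available");
    return 0;
}
```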

  3. Lab: CUDA streams
     - The diagram of the default stream is potentially misleading: it suggests the non-default stream can execute at the same time.
     - For the Optimization section: "Notice that the copy operations take place serially after the Jacobi iteration. The kernel computation must be complete before copying the updated halos from the GPU of interest (source) to its neighbours (destination). However, we can perform the copy operation from the neighbouring GPUs (source) to the GPU of interest (destination) concurrently with the kernel computation as it will only be required in the next iteration." A diagram for this might be helpful.
     - Also, for Implementation exercise part 4, a diagram tracking the sequence of events we are trying to create might be useful.
     - There appears to be an error in the provided code to be changed: the final TODO has a cudaMemcpyAsync as the code to be modified, but it should actually be a cudaEventRecord (the solution has the correct code):

           // TODO: Part 4- Record completion of bottom halo copy from "dev_id" to its neighbour
           // to be used in next iteration. Record the event for "push_bottom_done" stream of
           // "dev_id" for next iteration which is "(iter+1) % 2"
           CUDA_RT_CALL(cudaMemcpyAsync(/*Fill me*/, /*Fill me*/, nx * sizeof(float),
                                        /*Fill me*/, /*Fill me*/));

       It should be:

           CUDA_RT_CALL(cudaEventRecord(/*Fill me*/, /*Fill me*/));

       with the solution:

           CUDA_RT_CALL(cudaEventRecord(push_bottom_done[((iter + 1) % 2)][dev_id],
                                        push_bottom_stream[dev_id]));
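     For context, the intent of this TODO is the standard record-then-wait pattern: the copy stream records an event once the halo copy is enqueued, and whichever stream consumes the halo in the next iteration waits on it. A hedged sketch of the surrounding fragment (push_bottom_done and push_bottom_stream are taken from the lab code above; compute_stream is an assumed name for the stream running the Jacobi kernel, and the loop body is elided):

```cuda
// Inside the iteration loop, after enqueueing the bottom-halo copy on
// push_bottom_stream[dev_id]:

// Record completion of the copy. The event slot is indexed by the *next*
// iteration's parity, (iter + 1) % 2, because that is when it is consumed.
CUDA_RT_CALL(cudaEventRecord(push_bottom_done[(iter + 1) % 2][dev_id],
                             push_bottom_stream[dev_id]));

// ...then, at the top of the next iteration, before launching the Jacobi
// kernel, the compute stream waits on the event recorded last iteration:
CUDA_RT_CALL(cudaStreamWaitEvent(compute_stream[dev_id],
                                 push_bottom_done[iter % 2][dev_id], 0));
```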

  4. Lab: Multi-node Multi-GPU programming
     - The srun command seemed to take noticeably longer than in the other labs.

  5. Lab: MPI with cuda memcpy

This and subsequent labs with MPI show a warning message every time some MPI code is executed, often printed several times (I presume once per MPI task):

"Sorry! You were supposed to get help about …"
This can be fixed by setting the environment variables as follows:
export OPAL_PREFIX=$MPI_HOME
export PMIX_MCA_psec=^munge
 
Section "Point-to-point communication"
- Typo: "differenciate" should be "differentiate"

Code: jacobi_memcpy_mpi.cpp
- Typo: the first TODO in part 1 has "PI_CALL"; it should be "MPI_CALL"
- Typo: the first TODO in part 1 has "ot" where it should be "to" in a comment

Section "OpenMPI Process Mappings"
- Typo: "spcified" should be "specified" (same in the solution version of the code)
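For readers unfamiliar with this lab's approach, "MPI with cuda memcpy" stages device halos through host buffers before exchanging them with MPI. A minimal, hedged sketch of that pattern (illustrative names and sizes, not the lab's actual code):

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nx = 1024;            // halo row width (illustrative)
    float* d_halo = nullptr;        // device-resident halo row
    cudaMalloc(&d_halo, nx * sizeof(float));
    std::vector<float> h_send(nx), h_recv(nx);

    // Stage the device halo through the host, exchange it with the
    // neighbouring ranks, then copy the received halo back to the device.
    cudaMemcpy(h_send.data(), d_halo, nx * sizeof(float), cudaMemcpyDeviceToHost);
    int up   = (rank + 1) % size;
    int down = (rank - 1 + size) % size;
    MPI_Sendrecv(h_send.data(), nx, MPI_FLOAT, up,   0,
                 h_recv.data(), nx, MPI_FLOAT, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    cudaMemcpy(d_halo, h_recv.data(), nx * sizeof(float), cudaMemcpyHostToDevice);

    cudaFree(d_halo);
    MPI_Finalize();
    return 0;
}
```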

  6. Lab: NCCL
     - No “Lab objectives” section.
     - Section "Implementation Exercise":
       - Typo: "funciton" should be "function"
       - Typo: "Similarily" should be "Similarly"
       - The synchronize-device TODO is not mentioned in the list.
     - Towards the end of the notebook:
       - Typo: "number of rocesses" should be "number of processes"
     - A lot of info gets printed out as part of the execution - presumably from the NCCL_DEBUG=INFO environment variable? If so, maybe mention this in the text. E.g.:
    NCCL version 2.18.5+cuda12.2
    dgx01:3992189:3992189 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [1]mlx5_1:1/IB [2]mlx5_2:1/IB [3]mlx5_3:1/IB [4]mlx5_4:1/IB [5]mlx5_5:1/RoCE [6]mlx5_6:1/IB [7]mlx5_7:1/IB [8]mlx5_8:1/IB [9]mlx5_9:1/IB [10]mlx5_10:1/IB [RO]; OOB ibp12s0:100.126.5.1<0>
    dgx01:3992189:3992189 [0] NCCL INFO Using network IB
    dgx01:3992189:3992189 [0] NCCL INFO comm 0x2be48c0 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 7000 commId 0x25ba213461d9c320 - Init START
    dgx01:3992189:3992189 [0] NCCL INFO Setting affinity for GPU 0 to ffff0000,00000000,00000000,00000000,ffff0000,00000000
    dgx01:3992189:3992189 [0] NCCL INFO Channel 00/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 01/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 02/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 03/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 04/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 05/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 06/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 07/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 08/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 09/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 10/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 11/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 12/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 13/24 :    0   1
    dgx01:3992189:3992189 [0] NCCL INFO Channel 14/24 :    0   1
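     If the verbosity is indeed coming from NCCL's debug setting, it is controlled by the standard NCCL_DEBUG environment variable (an NCCL setting, not something specific to this lab), which the text could point out:

```shell
# Verbose initialisation/topology logging (likely what produced the output above):
export NCCL_DEBUG=INFO

# Quieter runs: only warnings...
export NCCL_DEBUG=WARN

# ...or disable NCCL logging entirely:
unset NCCL_DEBUG
```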

  7. Lab: NVSHMEM
     - No “Lab objectives” section.
     - In section "Communication Model", the paragraph is a word-for-word copy of the paragraph used in the previous section, "GPU-initiated communication".
     - Section "Memory model" mentions "NVSHMEMAPI" - should it be two words, i.e. "NVSHMEM API"?
     - Section "Thread-group level communication": the code example calls "get_block_offet" - "offet" should be "offset".
     - In section "Implementation exercise": “Alternatively, you can navigate to CFD/English/C/source_code/mpi/ directory in Jupyter's file browser in the left pane. Then, click to open the jacobi_nvshmem.cu file.” - the folder is wrong; it should be /nvshmem, not /mpi.
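     For reference, the overall shape of an NVSHMEM program, which might help make the "Communication Model" discussion concrete (a hedged sketch using the public NVSHMEM host API; not the lab's code):

```cuda
#include <cstdio>
#include <nvshmem.h>

int main() {
    nvshmem_init();                  // one PE (process) per GPU
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    // Symmetric allocation: every PE allocates a buffer of the same size,
    // which remote PEs can then target with put/get operations.
    int* sym = static_cast<int*>(nvshmem_malloc(sizeof(int)));

    std::printf("PE %d of %d\n", mype, npes);

    nvshmem_free(sym);
    nvshmem_finalize();
    return 0;
}
```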

@programmah programmah changed the title Issue: EuroCC Bootcamp Technical Issues Issue: EuroCC2 Bootcamp Technical Issues May 10, 2024