This repository has been archived by the owner on Dec 16, 2024. It is now read-only.
test_run_send_recv_mpi.log
/# mpirun --allow-run-as-root -np 2 -H <rocm_ip>,<cuda_ip> -mca pml ucx -mca coll_ucc_enable 1 -mca coll_ucc_priority 100 -mca coll_ucc_verbose 3 -mca pml_ucx_verbose 3 /test_send_recv
[ip-cuda:00302] common_ucx.c:183 using OPAL memory hooks as external events
[ip-cuda:00302] pml_ucx.c:207 mca_pml_ucx_open: UCX version 1.18.0
[ip-rocm:01363] common_ucx.c:183 using OPAL memory hooks as external events
[ip-rocm:01363] pml_ucx.c:207 mca_pml_ucx_open: UCX version 1.18.0
[ip-rocm:01363] common_ucx.c:342 self/memory: did not match transport list
[ip-rocm:01363] common_ucx.c:342 tcp/ens5: did not match transport list
[ip-rocm:01363] common_ucx.c:342 tcp/lo: did not match transport list
[ip-rocm:01363] common_ucx.c:342 sysv/memory: did not match transport list
[ip-rocm:01363] common_ucx.c:342 posix/memory: did not match transport list
[ip-rocm:01363] common_ucx.c:342 rocm_copy/rocm_cpy: did not match transport list
[ip-rocm:01363] common_ucx.c:227 readlink(/sys/class/infiniband/rocm_ipc/device/driver) failed: No such file or directory
[ip-rocm:01363] common_ucx.c:337 rocm_ipc/rocm_ipc: matched transport list but not device list
[ip-rocm:01363] common_ucx.c:342 cma/memory: did not match transport list
[ip-rocm:01363] common_ucx.c:347 support level is transports only
[ip-rocm:01363] pml_ucx.c:293 mca_pml_ucx_init
[ip-rocm:01363] pml_ucx.c:124 Pack remote worker address, size 82
[ip-rocm:01363] pml_ucx.c:124 Pack local worker address, size 302
[ip-rocm:01363] pml_ucx.c:358 created ucp context 0x902fe0, worker 0x93fe60
[ip-rocm:01363] pml_ucx_component.c:147 returning priority 19
[ip-cuda:00302] common_ucx.c:342 self/memory: did not match transport list
[ip-cuda:00302] common_ucx.c:342 tcp/ens5: did not match transport list
[ip-cuda:00302] common_ucx.c:342 tcp/lo: did not match transport list
[ip-cuda:00302] common_ucx.c:342 sysv/memory: did not match transport list
[ip-cuda:00302] common_ucx.c:342 posix/memory: did not match transport list
[ip-cuda:00302] common_ucx.c:342 cuda_copy/cuda: did not match transport list
[ip-cuda:00302] common_ucx.c:227 readlink(/sys/class/infiniband/cuda/device/driver) failed: No such file or directory
[ip-cuda:00302] common_ucx.c:337 cuda_ipc/cuda: matched transport list but not device list
[ip-cuda:00302] common_ucx.c:342 cma/memory: did not match transport list
[ip-cuda:00302] common_ucx.c:347 support level is transports only
[ip-cuda:00302] pml_ucx.c:293 mca_pml_ucx_init
[ip-cuda:00302] pml_ucx.c:124 Pack remote worker address, size 82
[ip-cuda:00302] pml_ucx.c:124 Pack local worker address, size 300
[ip-cuda:00302] pml_ucx.c:358 created ucp context 0x5f893ad21a00, worker 0x5f893ad99a90
[ip-cuda:00302] pml_ucx_component.c:147 returning priority 19
[ip-cuda:00302] pml_ucx.c:192 Got proc 1 address, size 300
[ip-cuda:00302] pml_ucx.c:423 connecting to proc. 1
[ip-rocm:01363] pml_ucx.c:192 Got proc 0 address, size 302
[ip-rocm:01363] pml_ucx.c:423 connecting to proc. 0
[ip-rocm:01363] pml_ucx.c:192 Got proc 1 address, size 82
[ip-rocm:01363] pml_ucx.c:423 connecting to proc. 1
[ip-cuda:00302] pml_ucx.c:192 Got proc 0 address, size 82
[ip-cuda:00302] pml_ucx.c:423 connecting to proc. 0
[ip-cuda:00302] coll_ucc_module.c:383 - mca_coll_ucc_init_ctx() initialized ucc context
[ip-cuda:00302] coll_ucc_module.c:469 - mca_coll_ucc_module_enable() creating ucc_team for comm 0x5f893a93b080, comm_id 0, comm_size 2
[ip-rocm:01363] coll_ucc_module.c:383 - mca_coll_ucc_init_ctx() initialized ucc context
[ip-rocm:01363] coll_ucc_module.c:469 - mca_coll_ucc_module_enable() creating ucc_team for comm 0x2048e0, comm_id 0, comm_size 2
Rank 0 sent data to Rank 1.
Rank 1 received data from Rank 0.
Received data: 0 1 2 3 4 5 6 7 8 9 ...
[ip-cuda:00302] common_ucx.c:479 disconnecting from rank 0
[ip-cuda:00302] common_ucx.c:444 waiting for 1 disconnect requests
[ip-rocm:01363] common_ucx.c:479 disconnecting from rank 0
[ip-rocm:01363] common_ucx.c:479 disconnecting from rank 1
[ip-rocm:01363] common_ucx.c:444 waiting for 1 disconnect requests
[ip-cuda:00302] common_ucx.c:479 disconnecting from rank 1
[ip-cuda:00302] common_ucx.c:444 waiting for 0 disconnect requests
[ip-rocm:01363] common_ucx.c:444 waiting for 0 disconnect requests
[ip-cuda:00302] pml_ucx.c:374 mca_pml_ucx_cleanup
[ip-rocm:01363] pml_ucx.c:374 mca_pml_ucx_cleanup
[ip-cuda:00302] pml_ucx.c:277 mca_pml_ucx_close
[ip-rocm:01363] pml_ucx.c:277 mca_pml_ucx_close