
Is setting IBGDA necessary for test_internode.py? #36

Open
zhangml opened this issue Mar 2, 2025 · 14 comments

@zhangml

zhangml commented Mar 2, 2025

I noticed the following steps in the guide:

Enable IBGDA by modifying /etc/modprobe.d/nvidia.conf:

  options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"

Update kernel configuration:

  sudo update-initramfs -u
  sudo reboot
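
For reference, whether these options took effect can be checked after the reboot; a minimal sketch, assuming the standard NVIDIA driver procfs interface:

  # inspect the loaded driver parameters for the two options above
  grep -E "EnableStreamMemOPs|RegistryDwords" /proc/driver/nvidia/params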

Due to some environment permission issues, I can't do this step for now. Is it possible to run test_internode.py without doing this step?

@LyricZhao
Collaborator

Yes, you can. The normal kernels use IBRC instead of IBGDA. But we plan to support AR later, which always requires IBGDA.

@zhangml
Author

zhangml commented Mar 3, 2025

Yes, you can. The normal kernels use IBRC instead of IBGDA. But we plan to support AR later, which always requires IBGDA.

I ran test_internode.py on 2 H20 nodes without IBGDA and encountered the following error; I'm not sure if it's related to IBGDA. @LyricZhao

nvshmem_src/src/modules/bootstrap/uid/bootstrap_uid.cpp:499: non-zero status: -3 nvshmem_src/src/modules/bootstrap/uid/bootstrap_uid.cpp:bootstrap_net_recv:99: Message truncated : received 40 bytes instead of 1

nvshmem_src/src/modules/bootstrap/uid/bootstrap_uid.cpp:499: non-zero status: -3 nvshmem_src/src/modules/bootstrap/uid/bootstrap_uid.cpp:bootstrap_net_recv:99: Message truncated : received 127 bytes instead of 8

nvshmem_src/src/modules/bootstrap/uid/bootstrap_uid.cpp:499: non-zero status: -3 nvshmem_src/src/host/topo/topo.cpp:477: non-zero status: -3 allgather of ipc handles failed 

nvshmem_src/src/host/init/init.cu:992: non-zero status: 7 building transport map failed 

@haswelliris
Collaborator

  1. Based on your logs, it appears that the system is unable to retrieve information from other ranks during bootstrap. We recommend checking your network connectivity settings (see the example exports after this list), including:
    • Proper IP and network interface configuration (NVSHMEM_HCA_LIST)
    • For RoCE, ensure correct settings for:
      • NVSHMEM_IB_GID_INDEX
      • NVSHMEM_IB_TRAFFIC_CLASS
  2. We strongly recommend properly enabling IBGDA to prevent potential unknown issues.
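
For example, a RoCE setup might export something like the following (the HCA name, GID index, and traffic class are illustrative assumptions; confirm the right values for your fabric, e.g. with show_gids):

  export NVSHMEM_HCA_LIST=mlx5_0:1
  export NVSHMEM_IB_GID_INDEX=3
  export NVSHMEM_IB_TRAFFIC_CLASS=106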

@yanminjia

yanminjia commented Mar 3, 2025

I would like to run test_internode.py with IBGDA enabled, because it looks like dual-port RNICs are supported when going with IBGDA over a RoCE network. Could you please give me a detailed configuration to enable IBGDA?

Many thanks.

@Baibaifan

I used Megatron-LM to test two H100 nodes (16 GPUs in total) on a RoCE network with ep=16. I also encountered the above bootstrap_net_recv:99: Message truncated: received 40 bytes instead of 8. I set IBGDA, but it reports: WARN: init failed for remote transport: ibrc.

@sphish
Collaborator

sphish commented Mar 3, 2025

I would like to run test_internode.py with IBGDA enabled, because it looks like dual-port RNICs are supported when going with IBGDA over a RoCE network. Could you please give me a detailed configuration to enable IBGDA?

Many thanks.

You can set the environment variables

NVSHMEM_IB_ENABLE_IBGDA=1
NVSHMEM_IBGDA_NIC_HANDLER=gpu

to enable IBGDA.
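
For example, a minimal sketch prefixing the test launch (MASTER_ADDR, WORLD_SIZE, and RANK are placeholders; adjust per node):

  NVSHMEM_IB_ENABLE_IBGDA=1 NVSHMEM_IBGDA_NIC_HANDLER=gpu \
    MASTER_ADDR=xxx WORLD_SIZE=2 RANK=0 python tests/test_internode.py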

@sphish
Collaborator

sphish commented Mar 3, 2025

I used Megatron-LM to test two H100 nodes (16 GPUs in total) on a RoCE network with ep=16. I also encountered the above bootstrap_net_recv:99: Message truncated: received 40 bytes instead of 8. I set IBGDA, but it reports: WARN: init failed for remote transport: ibrc.

This appears to be an error during NVSHMEM bootstrap. Please verify that your network configuration is correct. It's recommended to run the NVSHMEM perftests first to validate your network setup.

Note that even when IBGDA is enabled, NVSHMEM will still create IBRC connections, so this warning message is expected. For more details, please refer to the NVSHMEM documentation.
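
For example, a point-to-point bandwidth perftest makes a quick two-node sanity check; a sketch, assuming the NVSHMEM perftests were built and are launched via MPI (the binary location varies across NVSHMEM versions):

  mpirun -np 2 -host node0,node1 ./perftest/device/pt-to-pt/shmem_put_bw

If this already fails with similar bootstrap errors, the problem is in the network setup rather than in DeepEP.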

@Baibaifan

Baibaifan commented Mar 3, 2025

I used Megatron-LM to test two H100 nodes (16 GPUs in total) on a RoCE network with ep=16. I also encountered the above bootstrap_net_recv:99: Message truncated: received 40 bytes instead of 8. I set IBGDA, but it reports: WARN: init failed for remote transport: ibrc.

This appears to be an error during NVSHMEM bootstrap. Please verify that your network configuration is correct. It's recommended to run the NVSHMEM perftests first to validate your network setup.

Note that even when IBGDA is enabled, NVSHMEM will still create IBRC connections, so this warning message is expected. For more details, please refer to the NVSHMEM documentation.

My script:

NCCL_DEBUG=INFO MASTER_ADDR=xxx WORLD_SIZE=2 RANK=0 python tests/test_internode.py
NCCL_DEBUG=INFO MASTER_ADDR=xxx WORLD_SIZE=2 RANK=1 python tests/test_internode.py

RANK=0 result:

nvshmem_src/src/modules/bootstrap/uid/bootstrap_uid.cpp:bootstrap_net_recv:99: Message truncated : received 40 bytes instead of 8

nvshmem_src/src/modules/bootstrap/uid/bootstrap_uid.cpp:499: non-zero status: -3 nvshmem_src/src/host/topo/topo.cpp:477: non-zero status: -3 allgather of ipc handles failed
...
nvshmem_src/src/host/init/init.cu:992: non-zero status: 7 building transport map failed

nvshmem_src/src/host/init/init.cu:nvshmemi_check_state_and_init:1074: nvshmem initialization failed, exiting

nvshmem_src/src/host/init/init.cu:992: non-zero status: 7 building transport map failed


RANK=1 result:

nvshmem_src/src/modules/transport/ibrc/ibrc.cpp:nvshmemt_init:1850: neither nv_peer_mem, or nvidia_peermem detected. Skipping transport.

nvshmem_src/src/host/topo/topo.cpp:469: [GPU 0] Peer GPU 1 is not accessible, exiting ...
nvshmem_src/src/host/init/init.cu:992: non-zero status: 3 building transport map failed
...

WARN: init failed for remote transport: ibrc

@haswelliris
Collaborator

haswelliris commented Mar 3, 2025

@Baibaifan The message neither nv_peer_mem nor nvidia_peermem detected indicates that your system environment does not currently support GPUDirect RDMA. To resolve this, please try loading the GDR kernel module by running one of the following commands:

modprobe nv_peer_mem
# or
modprobe nvidia_peermem

This should enable GPUDirect RDMA functionality.
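
Whether the module actually loaded can be verified with:

  lsmod | grep -e nv_peer_mem -e nvidia_peermem

To keep it loaded across reboots, the module name can also be registered under /etc/modules-load.d/ (the exact mechanism is distribution-dependent).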

@Baibaifan

Baibaifan commented Mar 3, 2025

modprobe nvidia_peermem

@haswelliris After I successfully ran modprobe nv_peer_mem and repeated the commands above, the following appears:

rank0:
There is no error output for rank0.

rank1:

Caught signal 11 (Segmentation fault: address not mapped to object at address 0x11000008)
==== backtrace (tid:  84663) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x0000000000010a13 process_recv()  :0
 2 0x00000000000112e5 progress_recv()  :0
 3 0x00000000000113dc nvshmemt_ibrc_progress()  :0
 4 0x000000000020256c progress_transports()  ???:0
 5 0x0000000000202c52 nvshmemi_proxy_progress()  ???:0
 6 0x0000000000094ac3 pthread_condattr_setpshared()  ???:0
 7 0x0000000000126850 __xmknodat()  ???:0
=================================

@yanminjia

I would like to run test_internode.py with IBGDA enabled, because it looks like dual-port RNICs are supported when going with IBGDA over a RoCE network. Could you please give me a detailed configuration to enable IBGDA?
Many thanks.

You can set the environment variables

NVSHMEM_IB_ENABLE_IBGDA=1
NVSHMEM_IBGDA_NIC_HANDLER=gpu
to enable IBGDA.

Many thanks, will try it.

@ghghliu

ghghliu commented Mar 3, 2025

Is there any way to determine whether IBGDA is correctly enabled? The performance shows no difference whether I set NVSHMEM_IB_ENABLE_IBGDA=1 or 0, and the results look OK. Also, is it necessary to compile libgdsync (https://github.com/gpudirect/libgdsync) before compiling NVSHMEM?

@yanminjia

I would like to run test_internode.py with IBGDA enabled, because it looks like dual-port RNICs are supported when going with IBGDA over a RoCE network. Could you please give me a detailed configuration to enable IBGDA?
Many thanks.

You can set the environment variables

NVSHMEM_IB_ENABLE_IBGDA=1
NVSHMEM_IBGDA_NIC_HANDLER=gpu
to enable IBGDA.

Many thanks for your kind response. It doesn't seem to work: ibrc.cxx:progress_send(...) is still called to transfer data, as confirmed by a log message I added to that function (ibrc.cxx:progress_send(...)). Is there perhaps some other configuration missing to enable IBGDA?

@yanminjia

I would like to run test_internode.py with IBGDA enabled, because it looks like dual-port RNICs are supported when going with IBGDA over a RoCE network. Could you please give me a detailed configuration to enable IBGDA?
Many thanks.

You can set the environment variables
NVSHMEM_IB_ENABLE_IBGDA=1
NVSHMEM_IBGDA_NIC_HANDLER=gpu
to enable IBGDA.

Many thanks for your kind response. It doesn't seem to work: ibrc.cxx:progress_send(...) is still called to transfer data, as confirmed by a log message I added to that function (ibrc.cxx:progress_send(...)). Is there perhaps some other configuration missing to enable IBGDA?

While debugging this issue, I found that ibgda.cc:nvshmemt_init(...) fails with the following error message:

WARN: device mlx5_1 cannot allocate buffer on the specified memory type. Skipping...

This problem is caused by a failure in mlx5dv_devx_umem_reg(...).

Any suggestion would be appreciated.

Thanks.
