How to test cufftMp in Perlmutter #218

Open
CunyangWei opened this issue Sep 9, 2024 · 1 comment
@CunyangWei

I want to use cuFFTMp on Perlmutter. I have tried many times to compile against it, but it does not work.

If I use

CC -gpu=cc80 test_cufft.cu -I /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/include/ -I /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/include/cufftmp -L /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/lib64 -L /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib -Wl,-rpath,/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib -lnvshmem_device -lnvshmem_host -lnvshmem -lcufft -lcufftMp

it reports:

nvlink error   : Multiple definition of 'nvshmem_global_exit' in '/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib/libnvshmem.a:init_device.cu.o', first defined in '/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib/libnvshmem_device.a:init_device.cu.o'
nvlink error   : Multiple definition of 'nvshmemi_ibgda_device_state_d' in '/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib/libnvshmem.a:init_device.cu.o', first defined in '/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib/libnvshmem_device.a:init_device.cu.o'
nvlink error   : Multiple definition of 'nvshmemi_device_state_d' in '/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib/libnvshmem.a:init_device.cu.o', first defined in '/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib/libnvshmem_device.a:init_device.cu.o'
nvlink error   : Multiple definition of 'nvshmemi_device_lib_version_d' in '/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib/libnvshmem.a:init_device.cu.o', first defined in '/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib/libnvshmem_device.a:init_device.cu.o'
nvlink fatal   : merge_elf failed

If I instead drop -lnvshmem and use

CC -gpu=cc80 test_cufft.cu -I /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/include/ -I /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/include/cufftmp -L /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/lib64 -L /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib -Wl,-rpath,/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/comm_libs/nvshmem/lib -lnvshmem_device -lnvshmem_host -lcufft -lcufftMp

it reports:

/usr/bin/ld: warning: /tmp/pgcudafatLk4DVCiPEwjh.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/lib64/libcufftMp.so: undefined reference to `nvshmemx_internal_init_thread@NVSHMEM'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/lib64/libcufftMp.so: undefined reference to `nvshmemx_internal_common_init@NVSHMEM'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/lib64/libcufftMp.so: undefined reference to `nvshmemi_check_state_and_init_fn_ptr@NVSHMEM'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/lib64/libcufftMp.so: undefined reference to `nvshmemi_finalize@NVSHMEM'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/lib64/libcufftMp.so: undefined reference to `nvshmemi_init_counter@NVSHMEM'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/lib64/libcufftMp.so: undefined reference to `nvshmemi_is_version_compatible@NVSHMEM'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/lib64/libcufftMp.so: undefined reference to `nvshmemx_get_device_state@NVSHMEM'
/usr/bin/ld: /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/math_libs/lib64/libcufftMp.so: undefined reference to `nvshmemi_register_state_change_handler@NVSHMEM'
pgacclnk: child process exit status 1: /usr/bin/ld

Does anyone have experience with this?

@nvlcambier
Contributor

  1. Please don't mix -lnvshmem with -lnvshmem_device -lnvshmem_host. Use only the latter pair.
  2. I think -lcufftMp should come before -lnvshmem_device -lnvshmem_host.
  3. Please don't use both -lcufft and -lcufftMp. Use only -lcufftMp; cuFFTMp includes all of the cuFFT APIs.
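
Putting the three suggestions together, the link line might look like the sketch below. It is untested and only rearranges the flags from the original command; the thread does not confirm that it resolves the errors. The shell variables (SDK, NVSHMEM, MATH, LIBS, CMD) are hypothetical names introduced here for readability, and the paths are copied from the commands in the question.

```shell
#!/bin/sh
# Paths taken from the commands in the question (HPC SDK 23.9).
# Variable names are illustrative only.
SDK=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9
NVSHMEM=$SDK/comm_libs/nvshmem
MATH=$SDK/math_libs

# Suggestions applied:
#   1. no -lnvshmem alongside -lnvshmem_device -lnvshmem_host
#   2. -lcufftMp placed before the NVSHMEM libraries
#   3. no -lcufft, only -lcufftMp
LIBS="-lcufftMp -lnvshmem_device -lnvshmem_host"

CMD="CC -gpu=cc80 test_cufft.cu -I $NVSHMEM/include/ -I $MATH/include/cufftmp -L $MATH/lib64 -L $NVSHMEM/lib -Wl,-rpath,$NVSHMEM/lib $LIBS"

# Print the assembled command so it can be inspected before running it.
echo "$CMD"
```

The script only prints the command; on a system with the same SDK layout it could be run by pasting the printed line into the shell.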
