DCU Compile error #5806

Open
16 tasks
Cstandardlib opened this issue Jan 3, 2025 · 3 comments
Labels
Compile & CICD & Docs & Dependencies: issues related to compiling ABACUS
GPU & DCU & HPC: any issues related to GPU, DCU, and HPC

Comments

@Cstandardlib
Collaborator

Describe the bug

When compiling ABACUS with ROCm, the following error is encountered:

c++: error: unrecognized command line option ‘--offload-arch=gfx906’; did you mean ‘--offload-abi=ilp32’?

There are also warnings about hipsolver, which is included from abacus-develop/source/module_base/module_container/base/macros/rocm.h:

In file included from /public/software/compiler/dtk/24.04.2/include/hipsolver/internal/hipsolver-types.h:20,
                 from /public/software/compiler/dtk/24.04.2/include/hipsolver/hipsolver.h:20,
                 from /work/home/abacus-develop/source/module_base/module_container/base/macros/rocm.h:7,
/public/software/compiler/dtk/24.04.2/include/hipblas.h:16:161: note: #pragma message: : warning : This file is deprecated. Use the header file from /opt/dtk-24.04.2/include/hipblas/hipblas.h by using #include <hipblas/hipblas.h>
 #pragma message(": warning : This file is deprecated. Use the header file from /opt/dtk-24.04.2/include/hipblas/hipblas.h by using #include <hipblas/hipblas.h>")
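
The note above asks for `#include <hipblas/hipblas.h>` instead of the flat `hipblas.h`. In this log the flat header appears to be pulled in by DTK's own `hipsolver-types.h`, so the message may not be fixable from the ABACUS side. Where a project includes hipBLAS directly, a guard along the following lines could prefer the new path. This is only a sketch: `DEMO_HIPBLAS_HEADER` and `hipblas_header_choice` are hypothetical names, `__has_include` needs C++17 or a compiler that offers it as an extension, and on a real ROCm/DTK system the hipBLAS headers may additionally expect the HIP platform macros to be defined.

```cpp
#include <string>

// Prefer the namespaced header and fall back to the deprecated flat one.
// DEMO_HIPBLAS_HEADER is a hypothetical macro used only in this sketch.
#if defined(__has_include)
#  if __has_include(<hipblas/hipblas.h>)
#    include <hipblas/hipblas.h>
#    define DEMO_HIPBLAS_HEADER "hipblas/hipblas.h"
#  elif __has_include(<hipblas.h>)
#    include <hipblas.h>
#    define DEMO_HIPBLAS_HEADER "hipblas.h"
#  endif
#endif

// Neither header was found (for example, on a machine without ROCm/DTK).
#ifndef DEMO_HIPBLAS_HEADER
#  define DEMO_HIPBLAS_HEADER "none"
#endif

// Reports which header path the preprocessor selected.
inline std::string hipblas_header_choice() {
    return DEMO_HIPBLAS_HEADER;
}
```

Calling `hipblas_header_choice()` in a small test program shows which branch was taken on a given machine.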

Expected behavior

The latest ABACUS should compile in this ROCm environment.

To Reproduce

No response

Environment

module list
Currently Loaded Modulefiles:
 1) mpi/hpcx/2.12.0/gcc-8.3.1   2) compiler/dtk/24.04.2   3) compiler/cmake/3.23.3

Compile & build

  • Compiler: gcc-8.3.1
PACKAGES=/work/home/packages
cmake -B build -DBUILD_TESTING=ON -DCMAKE_INSTALL_PREFIX=~/.local/ \
        -DUSE_OPENMP=ON -DENABLE_LCAO=OFF \
        -DFFTW3_DIR=${PACKAGES}/fftw-3.3.10/build/ \
        -DLAPACK_DIR=${PACKAGES}/OpenBLAS-0.3.28/build/lib \
        -DSCALAPACK_DIR=${PACKAGES}/scalapack-2.2.0/ \
        -DUSE_ROCM=ON
cmake --build build -j`nproc`

Dependencies:

  • fftw-3.3.10
  • OpenBLAS-0.3.28
  • scalapack-2.2.0

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).
@Cstandardlib added the "Compile & CICD & Docs & Dependencies" and "GPU & DCU & HPC" labels on Jan 3, 2025
@Cstandardlib
Collaborator Author

The errors only occur when BUILD_TESTING is ON; with it OFF, ABACUS compiles normally.
There are two errors:

  1. first
abacus-develop/source/module_elecstate/test/elecstate_print_test.cpp:43:1: error: use of undeclared identifier 'InfoNonlocal'
InfoNonlocal::~InfoNonlocal(){}
^
abacus-develop/source/module_elecstate/test/elecstate_base_test.cpp:42:1: error: use of undeclared identifier 'InfoNonlocal'
InfoNonlocal::~InfoNonlocal(){}
^
2 errors generated when compiling for gfx928.
2 errors generated when compiling for gfx926.
2 errors generated when compiling for gfx906.
gmake[2]: *** [source/module_elecstate/test/CMakeFiles/elecstate_print.dir/build.make:76: source/module_elecstate/test/CMakeFiles/elecstate_print.dir/elecstate_print_test.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
  2. second
/usr/bin/ld: ../../../libcontainer_rocm.a(container_rocm_generated_lapack.hip.cu.o): in function `container::kernels::lapack_trtri<float, container::DEVICE_GPU>::operator()(char const&, char const&, int const&, float*, int const&)':
lapack.hip.cu:(.text._ZN9container7kernels12lapack_trtriIfNS_10DEVICE_GPUEEclERKcS5_RKiPfS7_[_ZN9container7kernels12lapack_trtriIfNS_10DEVICE_GPUEEclERKcS5_RKiPfS7_]+0x82): undefined reference to `container::kernels::lapack_trtri<float, container::DEVICE_CPU>::operator()(char const&, char const&, int const&, float*, int const&)'

@Cstandardlib
Collaborator Author

The DCU version falls back to the CPU lapack_trtri, but source/module_base/module_container/ATen/ops/test/CMakeLists.txt does not include the source file that defines it.

template <typename T>
struct lapack_trtri<T, DEVICE_GPU> {
    void operator()(
        const char& uplo,
        const char& diag,
        const int& dim,
        T* Mat,
        const int& lda) 
    {
        // TODO: trtri is not implemented in this method yet
        // Cause the trtri in cuSolver is not stable for ABACUS!
        // hipSolverConnector::trtri(hipsolver_handle, uplo, diag, dim, Mat, lda);
        // hipSolverConnector::potri(hipsolver_handle, uplo, diag, dim, Mat, lda);
        std::vector<T> H_Mat(dim * dim, static_cast<T>(0.0));
        hipMemcpy(H_Mat.data(), Mat, sizeof(T) * H_Mat.size(), hipMemcpyDeviceToHost);
        lapack_trtri<T, DEVICE_CPU>()(uplo, diag, dim, H_Mat.data(), lda);
        hipMemcpy(Mat, H_Mat.data(), sizeof(T) * H_Mat.size(), hipMemcpyHostToDevice);
    }
};

I will add lapack.cpp to source/module_base/module_container/ATen/ops/test/CMakeLists.txt.
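
To illustrate why the missing source file produces an undefined reference, here is a condensed single-file sketch of the same pattern. Everything in it is a simplified assumption rather than ABACUS code: the `demo` namespace, `trtri_like`, and the device tags are hypothetical, the host vector copy stands in for `hipMemcpy`, and the CPU routine only inverts the diagonal instead of calling LAPACK.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

namespace demo {

struct DEVICE_CPU {};
struct DEVICE_GPU {};

// "Header" part: the CPU operator() is declared here but not defined.
template <typename T, typename Device>
struct trtri_like;

template <typename T>
struct trtri_like<T, DEVICE_CPU> {
    void operator()(int dim, T* mat, int lda);
};

// GPU-side wrapper: copies to the host and calls the CPU functor,
// so the final link needs the CPU symbol.
template <typename T>
struct trtri_like<T, DEVICE_GPU> {
    void operator()(int dim, T* mat, int lda) {
        const std::size_t count = static_cast<std::size_t>(dim) * lda;
        std::vector<T> host(mat, mat + count);        // stand-in for device-to-host copy
        trtri_like<T, DEVICE_CPU>()(dim, host.data(), lda);
        std::copy(host.begin(), host.end(), mat);     // stand-in for host-to-device copy
    }
};

// "lapack.cpp" part: definition plus explicit instantiations.
// If this part lives in a source file that is not linked into the test,
// the linker cannot resolve the CPU operator().
template <typename T>
void trtri_like<T, DEVICE_CPU>::operator()(int dim, T* mat, int lda) {
    // Stand-in only: inverts the diagonal entries, which is a correct
    // inverse for a diagonal matrix but not a general triangular inverse.
    for (int i = 0; i < dim; ++i) {
        mat[i * lda + i] = T(1) / mat[i * lda + i];
    }
}

template struct trtri_like<float, DEVICE_CPU>;
template struct trtri_like<double, DEVICE_CPU>;

} // namespace demo
```

Splitting the last section into its own file and leaving that file out of the build would be expected to give the same kind of link failure as in the log, which is why listing the defining source in the test's CMakeLists.txt resolves it.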

@Cstandardlib
Collaborator Author

Dependency failure in module_elecstate tests:

/usr/bin/ld: /usr/bin/ld: DWARF error: invalid or unhandled FORM value: 0x25
CMakeFiles/elecstate_fp_energy.dir/__/elecstate_energy_terms.cpp.o: in function `elecstate::ElecState::get_hartree_energy()':
elecstate_energy_terms.cpp:(.text+0x4): undefined reference to `elecstate::H_Hartree_pw::hartree_energy'
/usr/bin/ld: CMakeFiles/elecstate_fp_energy.dir/__/elecstate_energy_terms.cpp.o: in function `elecstate::ElecState::get_etot_efield()':
elecstate_energy_terms.cpp:(.text+0x14): undefined reference to `elecstate::Efield::etotefield'
/usr/bin/ld: CMakeFiles/elecstate_fp_energy.dir/__/elecstate_energy_terms.cpp.o: in function `elecstate::ElecState::get_etot_gatefield()':
elecstate_energy_terms.cpp:(.text+0x24): undefined reference to `elecstate::Gatefield::etotgatefield'
/usr/bin/ld: CMakeFiles/elecstate_fp_energy.dir/__/elecstate_energy_terms.cpp.o: in function `elecstate::ElecState::get_solvent_model_Ael()':
elecstate_energy_terms.cpp:(.text+0x34): undefined reference to `surchem::Ael'
/usr/bin/ld: CMakeFiles/elecstate_fp_energy.dir/__/elecstate_energy_terms.cpp.o: in function `elecstate::ElecState::get_solvent_model_Acav()':
elecstate_energy_terms.cpp:(.text+0x44): undefined reference to `surchem::Acav'
/usr/bin/ld: CMakeFiles/elecstate_fp_energy.dir/__/elecstate_energy_terms.cpp.o: in function `elecstate::ElecState::get_dftu_energy()':
elecstate_energy_terms.cpp:(.text+0x54): undefined reference to `GlobalC::dftu'
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [source/module_elecstate/test/CMakeFiles/elecstate_fp_energy.dir/build.make:287: source/module_elecstate/test/elecstate_fp_energy] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:5723: source/module_elecstate/test/CMakeFiles/elecstate_fp_energy.dir/all] Error 2
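
Judging only from the symbol names, the unresolved references above (`H_Hartree_pw::hartree_energy`, `Efield::etotefield`, `surchem::Ael`, `GlobalC::dftu`, ...) look like static data members or globals that are declared in headers but whose defining source files are not part of the `elecstate_fp_energy` link. That reading is an assumption, not something the log confirms. The sketch below shows the general rule with hypothetical names (`demo`, `HartreeLike`): a static data member needs exactly one definition in some translation unit that is linked in, either the real source file or a stand-in definition inside the test.

```cpp
namespace demo {

// "Header" part: declares the static member but allocates no storage.
struct HartreeLike {
    static double hartree_energy;
};

// Analogous to a getter such as ElecState::get_hartree_energy() in the log.
inline double get_hartree_energy() {
    return HartreeLike::hartree_energy;
}

// Definition: one translation unit in the link must provide this line.
// Without it, the linker typically reports an undefined reference to
// demo::HartreeLike::hartree_energy.
double HartreeLike::hartree_energy = 0.0;

} // namespace demo
```

Whether the fix for the test target is to add the defining sources or to supply stand-in definitions in the test file depends on how the other tests in that directory are set up.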
