superlu_dist INDEX_SIZE=64's tests failed #132

Open
sagitter opened this issue Feb 25, 2023 · 4 comments


@sagitter

Hi all.

I'm testing superlu_dist-8.1.2 with XSDK_INDEX_SIZE=64 on Fedora. All the tests fail with output like this (the full log is in build.log; the test output is at the bottom of the log):

6/17 Test  #7: pdtest_2x2_1_2_8_20_SP ...........***Failed    1.54 sec
Time to read and distribute matrix 0.00
[02a2ee191fc949f98dab9745b724f29b:28714:0:29324] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x562c780e5300)
[02a2ee191fc949f98dab9745b724f29b:28714] *** Process received signal ***
[02a2ee191fc949f98dab9745b724f29b:28714] Signal: Segmentation fault (11)
[02a2ee191fc949f98dab9745b724f29b:28714] Signal code: Address not mapped (1)
[02a2ee191fc949f98dab9745b724f29b:28714] Failing at address: 0x562c780e5300
[02a2ee191fc949f98dab9745b724f29b:28714] [ 0] [02a2ee191fc949f98dab9745b724f29b:28728:0:28728] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x5638c8013510)
==== backtrace (tid:  29324) ====
 0  /lib64/libucs.so.0(ucs_handle_error+0x2ec) [0x7f1dddd0b65c]
 1  /lib64/libucs.so.0(+0x29cbd) [0x7f1dddd0ccbd]
 2  /lib64/libucs.so.0(+0x29e8d) [0x7f1dddd0ce8d]
 3  /lib64/libopenblaso64.so.0(dcopy_k_HASWELL+0x387) [0x7f1f6268f387]
=================================
/lib64/libc.so.6(+0x3dc10)[0x7f1f644e3c10]
[02a2ee191fc949f98dab9745b724f29b:28714] [ 1] ==== backtrace (tid:  28728) ====
 0  /lib64/libucs.so.0(ucs_handle_error+0x2ec) [0x7f747dd1f65c]
 1  /lib64/libucs.so.0(+0x29cbd) [0x7f747dd20cbd]
 2  /lib64/libucs.so.0(+0x29e8d) [0x7f747dd20e8d]
 3  /lib64/libopenblaso64.so.0(dcopy_k_HASWELL+0x387) [0x7f760268f387]
=================================
[02a2ee191fc949f98dab9745b724f29b:28728] *** Process received signal ***
[02a2ee191fc949f98dab9745b724f29b:28728] Signal: Segmentation fault (11)
[02a2ee191fc949f98dab9745b724f29b:28728] Signal code:  (-6)
[02a2ee191fc949f98dab9745b724f29b:28728] Failing at address: 0x3e800007038
[02a2ee191fc949f98dab9745b724f29b:28728] [ 0] /lib64/libc.so.6(+0x3dc10)[0x7f76044e3c10]
[02a2ee191fc949f98dab9745b724f29b:28728] [ 1] /lib64/libopenblaso64.so.0(dcopy_k_HASWELL+0x387)[0x7f760268f387]
[02a2ee191fc949f98dab9745b724f29b:28728] *** End of error message ***

Compiler: GCC-13.0.1
BLAS: flexiblas-3.3.0
scotch-6.1.2
suitesparse-5.13.0

@xiaoyeli
Owner

Can you try the internal CBLAS/ by configuring with cmake like this:

cmake .. \
  -DTPL_ENABLE_INTERNAL_BLASLIB=ON

@sagitter
Author

With the internal CBLAS, all tests pass on all architectures except pdtest_2x2_3_2_8_20_SP on the PowerPC64 Little Endian architecture.

      Start  8: pdtest_2x2_3_2_8_20_SP
8: Test command: /usr/lib64/openmpi/bin/mpiexec "-n" "4" "/builddir/build/BUILD/superlu_dist-8.1.2/build/openmpi/TEST/pdtest" "-r" "2" "-c" "2" "-s" "3" "-b" "2" "-x" "8" "-m" "20" "-f" "/builddir/build/BUILD/superlu_dist-8.1.2/EXAMPLE/g20.rua"
8: Working Directory: /builddir/build/BUILD/superlu_dist-8.1.2/build/openmpi/TEST
8: Test timeout computed to be: 1500
3: --------------------------------------------------------------------------
3: A call to mkdir was unable to create the desired directory:
3: 
3:   Directory: /tmp/ompi.aad2fbab9d8c461cb4fa17d8d48c33fe.1000/pid.29640
3:   Error:     No such file or directory
3: 
3: Please check to ensure you have adequate permissions to perform
3: the desired operation.
3: --------------------------------------------------------------------------
3: [aad2fbab9d8c461cb4fa17d8d48c33fe:29640] [[51824,0],0] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 107
3: [aad2fbab9d8c461cb4fa17d8d48c33fe:29640] [[51824,0],0] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 346
3: --------------------------------------------------------------------------
3: It looks like orte_init failed for some reason; your parallel process is
3: likely to abort.  There are many reasons that a parallel process can
3: fail during orte_init; some of which are due to configuration or
3: environment problems.  This failure appears to be an internal failure;
3: here's some additional information (which may only be relevant to an
3: Open MPI developer):
3: 
3:   orte_session_dir failed
3:   --> Returned value Error (-1) instead of ORTE_SUCCESS
3: --------------------------------------------------------------------------
 1/17 Test  #3: pdtest_1x2_1_2_8_20_SP ...........***Failed    0.07 sec
--------------------------------------------------------------------------

Anyhow, the internal CBLAS seems to work better than FlexiBLAS (build.log).

@gdmoss14

I recently ran into a similar problem where SuperLU_DIST (v9.0.0) with 64-bit indexing builds successfully, but all tests fail. In my case I'm using MKL's BLAS. Here's what I have determined:

Indicating 64-bit indexing is desired and using the correct MKL ilp64 BLAS library does not work.
Indicating 32-bit indexing is desired and using the correct MKL lp64 BLAS library does not work.
Indicating 64-bit indexing is desired and using the incorrect MKL lp64 BLAS library does work.
Indicating 32-bit indexing is desired and using the incorrect MKL ilp64 BLAS library does not work.

This does not appear to be an issue with regular SuperLU (v6.0.1), where I can request 64-bit indexing and use ilp64 with no problem.

@xiaoyeli
Owner

xiaoyeli commented Sep 1, 2024

The BLAS standard specifies the integer inputs (e.g., dimensions) as 32-bit integers.
So, even if you use 64-bit indexing for the sparse matrix metadata structures in SuperLU, the internal BLAS calls always use 32-bit integers. You always need to link with the MKL lp64 BLAS, not ilp64.
