fix: windows compilation of hash maps tests and increase stack size avoiding segfaults #988

Open
wants to merge 3 commits into
base: master

Contributor

@jalvesz jalvesz commented Apr 25, 2025

This PR addresses #979 and #971 partially.

For the Intel compiler, some dependencies on MSVC need to be managed.

Another problem is the stack size limit, which needs to be increased on Windows.

cc @R-Goc @zoziha @jvdp1 @perazz

@jalvesz jalvesz changed the title fix windows compilation of hash maps tests and increase stack size avoiding segfaults fix: windows compilation of hash maps tests and increase stack size avoiding segfaults Apr 25, 2025

R-Goc commented Apr 25, 2025

Ideally there would be a way to avoid the issues with the libs, instead of fixing them this way, but it seems to work.


R-Goc commented Apr 26, 2025

Also, this may warrant adding the Intel oneAPI LLVM compiler on Windows to the README's list of compilers that work but aren't tested.

Contributor Author

jalvesz commented Apr 26, 2025

Yes, I agree that ideally this should be managed differently, but I'm not aware of a better solution than what we found here. It is good that it does enable full compliance on Windows for the tests.

@jalvesz jalvesz requested a review from Copilot April 26, 2025 11:47

@Copilot Copilot AI left a comment

Copilot wasn't able to review any files in this pull request.

Files not reviewed (2)
  • test/CMakeLists.txt: Language not supported
  • test/hash_functions/CMakeLists.txt: Language not supported

@@ -2,6 +2,14 @@ if (NOT TARGET "test-drive::test-drive")
  find_package("test-drive" REQUIRED)
endif()

if(WIN32)
  if(CMAKE_Fortran_COMPILER_ID MATCHES "^Intel")
    add_link_options(/Qoption,link,/STACK:8388608)

Member

I've looked in the hashmap source file and I can't find a reason why more stack would be needed: there are no static (save)d variables or arrays there, only allocatables.

I'm worried that increasing the stack size may just be hiding other issues with the code that we haven't figured out yet. For example, the test cases include static arrays; maybe they could be replaced with allocatables?

Copy link

It's probably the arrays, but that would need figuring out. Keep in mind that Windows has a 1 MB default stack size, while Linux usually has 8 MB. This flag brings it to 8 MB.
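As a quick sanity check on the numbers in this comment, the arithmetic can be sketched as below. The variable names are illustrative only, and the 1 MB and 8 MB defaults are simply the figures quoted in the thread (interpreted here as MiB), not values queried from any system.

```python
# Figures quoted in the thread; MiB = 1024 * 1024 bytes.
MIB = 1024 * 1024

stack_flag_bytes = 8388608        # value passed to /STACK in the diff
windows_default_bytes = 1 * MIB   # Windows default as quoted in the thread
linux_typical_bytes = 8 * MIB     # typical Linux default as quoted in the thread

# Size of the new limit in MiB.
print(stack_flag_bytes // MIB)
# Does the new limit match the usual Linux limit?
print(stack_flag_bytes == linux_typical_bytes)
# Growth factor over the Windows default.
print(stack_flag_bytes // windows_default_bytes)
```

Under these assumptions the flag value is exactly 8 MiB, eight times the quoted Windows default.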

Copy link

The test cases explicitly state that they may require quite absurd stack sizes. 48 MB is huge. I'm not sure why you would ever need that, and it was clearly an overestimate.

Copy link

This is really weird. I'm not sure I understand why it was done like this. The first thing the test does is allocate 1 MB of random data on the stack: https://github.com/fortran-lang/stdlib/blob/master/test%2Fhashmaps%2Ftest_open_maps.f90#L32

The comment mentions 48 MB, but that was for values of rand_power above 18. rand_size is 2**rand_power, and rand_object is an integer array of rand_size elements. So this immediately overflows the stack on Windows.
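The size estimate above can be reproduced with a short calculation. This is only a sketch: `array_bytes` is a hypothetical helper, and the 4 bytes per element is an assumption (a default 32-bit integer) that is not stated explicitly in the thread.

```python
MIB = 1024 * 1024


def array_bytes(rand_power: int, bytes_per_element: int = 4) -> int:
    """Footprint of an integer array with 2**rand_power elements.

    bytes_per_element=4 assumes 32-bit integers (an assumption, see above).
    """
    rand_size = 2 ** rand_power
    return rand_size * bytes_per_element


default_windows_stack = 1 * MIB   # default quoted earlier in the thread

footprint = array_bytes(18)
# Footprint in bytes and in MiB for rand_power = 18.
print(footprint, footprint / MIB)
# True means the array alone already fills the whole default stack.
print(footprint >= default_windows_stack)
```

With rand_power = 18 the array by itself takes up the full quoted Windows default, so any additional stack use (call frames, other locals) would push past the limit.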

Contributor Author

I'm not familiar with the test, but as a unit test, does it really require such a large array? Moving the array to the heap using an allocatable seems like a good solution to avoid the stack overflow, so this specific hack to increase the stack size would not be required.

Member

Yep, I would be in favor of attempting to replace all arrays in the tests with allocatables and remove the stack flags.

Contributor Author

Hmm... if I revert the stack size flag, I have several tests failing:

The following tests FAILED:
          1 - test-drive/all-tests (Exit code 0xc0000374)
         12 - chaining_maps (SEGFAULT)
         13 - open_maps (SEGFAULT)
         14 - maps (SEGFAULT)
         15 - intrinsics (Failed)
         30 - linalg_pseudoinverse (Failed)
         43 - sorting (Exit code 0xc0000374)
         47 - mean (Failed)
         64 - string_to_number (Failed)
         66 - filesystem (Failed)
         71 - simps (Failed)

It seems that many of them declare variables on the stack rather than on the heap.

Copy link

That is interesting. That is more failures than I had.

Contributor Author

That's on my personal laptop, which has less memory than a proper workstation.

Copy link

I am testing on a machine with 32 GB of memory. Also, stack space is independent of main memory size.

@jalvesz jalvesz requested a review from jvdp1 May 1, 2025 20:05
Member

arjenmarkus commented May 3, 2025 via email


R-Goc commented May 3, 2025

Just making it an allocatable should be fine. Even just holding a smaller array and refilling it should work. Holding a big array of random values is a rather weird thing to do.


R-Goc commented May 3, 2025

I tried making it allocatable in the open maps test, but it still overflows the stack.


R-Goc commented May 6, 2025

I believe this could be merged first with the increased stack size, and the tests could then be investigated more deeply. Increasing the stack size is not really bad practice. It is so small on Windows for legacy reasons, and I don't see why it needs to be so small on 64-bit architectures. Also, this is reserved memory, not committed memory.

Member

perazz commented May 8, 2025

Also, it would be worthwhile to check whether the test programs are run with OpenMP turned on, as it is known that stdlib is still not 100% thread-safe.

Contributor Author

jalvesz commented May 8, 2025

The following 5 tests fail with ifx25 + /Qopenmp (Windows); without it, all tests pass.

Failing tests with OpenMP
The following tests FAILED:
         12 - chaining_maps (Failed)
         13 - open_maps (Failed)
         14 - maps (SEGFAULT)
         22 - open (Failed)
         43 - sorting (SEGFAULT)

ctest --test-dir build/test --rerun-failed --output-on-failure
Internal ctest changing into directory: D:/github/fortran-lang/stdlib/build/test
Test project D:/github/fortran-lang/stdlib/build/test
    Start 12: chaining_maps
1/5 Test #12: chaining_maps ....................***Failed    0.10 sec
KEY not found in map KEY_TEST.

    Start 13: open_maps
2/5 Test #13: open_maps ........................***Failed    0.09 sec
KEY not found in map KEY_TEST.

    Start 14: maps
3/5 Test #14: maps .............................***Exception: SegFault  0.05 sec
# Testing: stdlib-open-maps
  Starting open-maps-seeded_nmhash32x_hasher-256-byte-words ... (9/12)
  Starting open-maps-seeded_water_hasher-256-byte-words ... (11/12)
  Starting open-maps-fnv_1a_hasher-256-byte-words ... (5/12)
  Starting open-maps-seeded_nmhash32_hasher-16-byte-words ... (6/12)
  Starting open-maps-fnv_1a_hasher-16-byte-words ... (4/12)
  Starting open-maps-seeded_nmhash32_hasher-256-byte-words ... (7/12)
  Starting open-maps-fnv_1_hasher-16-byte-words ... (1/12)
  Starting open-maps-seeded_water_hasher-16-byte-words ... (10/12)
  Starting open-maps-fnv_1_hasher-16-byte-words ... (2/12)
  Starting open-maps-fnv_1_hasher-256-byte-words ... (3/12)
  Starting open-maps-seeded_nmhash32x_hasher-16-byte-words ... (8/12)
  Starting open-maps-removal-spec ... (12/12)
       ... open-maps-removal-spec [PASSED]

    Start 22: open
4/5 Test #22: open .............................***Failed    0.07 sec
# Testing: open
  Starting io_read_write_text ... (1/3)
  Starting io_open_error_flag ... (3/3)
  Starting io_read_write_stream ... (2/3)
forrtl: The process cannot access the file because it is being used by another process.
       ... io_read_write_text [PASSED]
       ... io_read_write_stream [PASSED]
forrtl: severe (30): open failure, unit -131, file D:\github\fortran-lang\stdlib\build\test\io\io_open.stream
Image              PC                Routine            Line        Source
test_open.exe      00007FF679EC8591  Unknown               Unknown  Unknown
test_open.exe      00007FF679E8E2FF  Unknown               Unknown  Unknown
test_open.exe      00007FF679E728BF  Unknown               Unknown  Unknown
test_open.exe      00007FF679E9FD71  Unknown               Unknown  Unknown
test_open.exe      00007FF679EA82F3  Unknown               Unknown  Unknown
libiomp5md.dll     00007FFBCFD149B3  Unknown               Unknown  Unknown
libiomp5md.dll     00007FFBCFC5C384  Unknown               Unknown  Unknown
libiomp5md.dll     00007FFBCFC5B7D5  Unknown               Unknown  Unknown
libiomp5md.dll     00007FFBCFCBAC51  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFC503E259D  Unknown               Unknown  Unknown
ntdll.dll          00007FFC5150AF38  Unknown               Unknown  Unknown

    Start 43: sorting
5/5 Test #43: sorting ..........................***Exception: SegFault  0.05 sec
# Testing: sorting
  Starting bitset_large_sorts ... (10/22)
  Starting char_ord_sorts ... (1/22)
  Starting int_sorts ... (7/22)
  Starting bitset_large_ord_sorts ... (3/22)
  Starting bitset_64_sorts ... (11/22)
  Starting int_sort_indexes_default ... (12/22)
  Starting int_radix_sorts ... (5/22)
  Starting string_sorts ... (9/22)
  Starting bitset_large_sort_indexes_default ... (15/22)
  Starting char_sort_indexes_default ... (13/22)
  Starting string_ord_sorts ... (2/22)
  Starting real_radix_sorts ... (6/22)
  Starting char_sorts ... (8/22)
  Starting string_sort_indexes_default ... (14/22)
  Starting bitset_64_ord_sorts ... (4/22)
  Starting bitset_64_sort_indexes_default ... (16/22)
 reverse + work ORD_SORT did not sort Bitset Decrease.
 SORT did not sort Bitset Decrease.
 i =                    512
 i =                      1
bitset64_dummy(i-1:i) = 0000000000000000000000000000000000000000000000000000111000000000 0000000000000000000000000000000000000000000000000000110111111111
bitset64_dummy(i-1:i) = 0000000000000000000000000000000000000000000000000000111111111111 0000000000000000000000000000000000000000000000000000111111111110
  Starting int_sort_indexes_low ... (17/22)
  Starting string_sort_indexes_low ... (19/22)
  Starting bitset_large_sort_indexes_low ... (20/22)
  Starting bitset_64_sort_indexes_low ... (21/22)
  Starting char_sort_indexes_low ... (18/22)
  Starting int_ord_sorts ... (22/22)
 reverse ORD_SORT did not sort Bitset Decrease.
 i =                   1280
bitset64_dummy(i-1:i) = 0000000000000000000000000000000000000000000000000000000100000000 0000000000000000000000000000000000000000000000000000000011111111
 reverse SORT did not sort Bitset Decrease.
       ... bitset_64_ord_sorts [FAILED]
  Message: Condition not fullfilled
 SORT_INDEX did not sort Bitset Decrease.
 i =                   1276
 i =                   1277
bitsetl_dummy(i-1:i) =  0000000000000000000000000000000000000000000000000000101100000011

@R-Goc
Copy link

R-Goc commented May 9, 2025

Indeed, it doesn't appear to be thread-safe at all. I also ran the tests without OpenMP.

Development

Successfully merging this pull request may close these issues.

  • Build error with ifx on windows
  • Failed tests when compiling with openmp
4 participants