Test failure with ifx #167

Open
aminiussi opened this issue Nov 30, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@aminiussi

aminiussi commented Nov 30, 2023

Hi,

Is ifx (Intel's next-generation Fortran compiler, which is replacing ifort) supported? I'm getting the following failures with ifx 2023.1.0:

The following tests FAILED:
	  6 - test_maxpool2d_layer (Failed)
	 12 - test_io_hdf5 (Failed)
	 14 - test_dense_network_from_keras (Failed)
	 17 - test_optimizers (Failed)
Errors while running CTest

In a release build, these tests fail with a memory error.

In debug mode, only test_optimizers fails.
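
To see the output of each failing test, CTest can rerun only the failures from the previous run. This is a minimal sketch, not the commands used for this report: the build directory name `build` is an assumption, and the script does nothing if that directory or `ctest` is missing.

```shell
#!/bin/sh
# Assumption: BUILD_DIR is a hypothetical build directory name, not taken from the report.
BUILD_DIR=build

if [ -d "$BUILD_DIR" ] && command -v ctest >/dev/null 2>&1; then
  # Rerun only the tests that failed in the previous CTest run and show their output.
  (cd "$BUILD_DIR" && ctest --rerun-failed --output-on-failure)
  # Run a single test selected by name (regular-expression match).
  (cd "$BUILD_DIR" && ctest -R test_optimizers --output-on-failure)
else
  echo "skipped: $BUILD_DIR or ctest not found"
fi
```

Running the tests one at a time this way makes it easier to tell the release-build memory errors apart from the debug-build failure.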

All this is on master

Thanks

@milancurcic
Member

Thanks for reporting. I haven't tried ifx in a while, and definitely not a recent version. I'll try it and let you know what I find.

@milancurcic milancurcic added the bug Something isn't working label Dec 1, 2023
@milancurcic
Member

Hi @aminiussi, I can't seem to reproduce this. Here's what I have:

$ ifx --version
ifx (IFX) 2023.2.0 20230721
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

HDF5 is 1.12.2 built with ifort-2021.6.

All tests pass on the latest main.

Similarly, all tests pass with ifort-2021.10.0 (that's the latest version released before deprecation in favor of ifx).

@aminiussi
Author

Hi @milancurcic,

The test in my build fails with "unmapped address" with the following stack trace:

6:  0 0x000000000004cb95 ucs_debug_print_backtrace()  ???:0
6:  1 0x0000000000415d17 nf_maxpool2d_layer_mp_backward_()  /scratch/alainm/view/neural-fortran/src/nf/nf_maxpool2d_layer_submodule.f90:107
6:  2 0x00000000004102b2 nf_layer_mp_backward_3d_()  /scratch/alainm/view/neural-fortran/src/nf/nf_layer_submodule.f90:0
6:  3 0x000000000040d2d5 MAIN__()  /scratch/alainm/view/neural-fortran/test/test_maxpool2d_layer.f90:77
14:37:01 [alainm@castor bld]# emacs /scratch/alainm/view/neural-fortran/test/test_maxpool2d_layer.f90

One element of the backtrace is odd: /scratch/alainm/view/neural-fortran/src/nf/nf_layer_submodule.f90:0, since there is no code at line 0.

We are using HDF5 1.14.1, and the underlying gfortran is 12.2.0. Apart from that, our ifx is slightly older...

@aminiussi
Author

$ ifx --version
ifx (IFX) 2023.2.0 20230721
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

Is that a parallel build and, if so, which MPI is used?

Thanks

@aminiussi
Author

I did a debug build with -check all. The test fails with:

forrtl: severe (408): fort: (3): Subscript #3 of the array MAXLOC_X has value 0 which is less than the lower bound of 1

In coarray image 4
Image              PC                Routine            Line        Source
test_maxpool2d_la  000000000042BD1A  backward                  106  nf_maxpool2d_layer_submodule.f90
test_maxpool2d_la  0000000000417470  backward_3d                87  nf_layer_submodule.f90
test_maxpool2d_la  000000000040E789  test_maxpool2d_la          77  test_maxpool2d_layer.f90
test_maxpool2d_la  000000000040B39D  Unknown               Unknown  Unknown
libc-2.17.so       00007FFFF3C84555  __libc_start_main     Unknown  Unknown
test_maxpool2d_la  000000000040B2CB  Unknown               Unknown  Unknown
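
For anyone trying to reproduce the bounds-check failure above, a debug build along the following lines may work. This is a sketch under stated assumptions, not the exact commands used here: only `-check all` comes from this thread, while `-g -O0 -traceback`, the build directory name, and a CMake-based configuration run from the repository root are assumptions. The script does nothing if `ifx` or `CMakeLists.txt` is not found.

```shell
#!/bin/sh
# Assumptions: FC, FFLAGS and BUILD_DIR are illustrative; only "-check all" is from the thread.
FC=ifx
FFLAGS="-g -O0 -check all -traceback"
BUILD_DIR=build-ifx-debug

if command -v "$FC" >/dev/null 2>&1 && [ -f CMakeLists.txt ]; then
  # Configure a debug build with runtime checks (including array bounds) enabled.
  cmake -S . -B "$BUILD_DIR" \
    -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_Fortran_COMPILER="$FC" \
    -DCMAKE_Fortran_FLAGS="$FFLAGS"
  cmake --build "$BUILD_DIR"
  # Run only the max-pooling test and print its output if it fails.
  (cd "$BUILD_DIR" && ctest -R test_maxpool2d_layer --output-on-failure)
else
  echo "skipped: $FC or CMakeLists.txt not found"
fi
```

With runtime checks on, an out-of-range subscript should stop the program with a traceback like the one above instead of reading invalid memory.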

@milancurcic
Member

Thank you, @aminiussi, this is very helpful and may be related to #145. It's possible that this is a bug that other compilers (and non-debug build modes) fail to catch while silently producing incorrect results. I'll look deeper into this.

> Is that a parallel build and, if so, which MPI is used?

I haven't built in parallel with the Intel compilers. The MPI available to me is Intel MPI, which comes bundled with the oneAPI suite, but I don't think I configured it properly on my computer, and I haven't had time to dedicate to a parallel Intel build.
