-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changes to allow USE_MGBF be able to be off #772
base: develop
Are you sure you want to change the base?
changes to allow USE_MGBF be able to be off #772
Conversation
61c3a99
to
7ab15cb
Compare
Install
Without this change, MGBF source code was still complied and
The changes in this draft PR along with defaulting We do not have a MGBF ctest so I can not confirm that setting |
@TingLei-daprediction , based on my tests I think this PR is ready for review. May we mark this PR as ready for review"? |
@RussTreadon-NOAA Thanks a lot! And sorry I hadn't reported in time my test results for GSI with the mgbf option turned off. |
OK. @TingLei-daprediction. I will mark this PR as ready for review and add my review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The top-level CMakeLists.txt
needs to change the BUILD_MGBF
default from ON
to OFF
in order to achieve the stated purpose of this PR.
@TingLei-daprediction , we need two peer reviewers for this PR. Who do you recommend? |
@RussTreadon-NOAA Thanks a lot (sorry for not replying sooner). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look reasonable and clean to me.
…control following mgbf option
WCOSS ctests
Examine the failures and find that each is due to non-reproducibile analysis results between the updat ( For
contrl
For
contrl
For
contrl
For
contrl
I see that I reran ctests on Hera. All ctests pass using |
@RussTreadon-NOAA Thanks. I will take a look to see what the problem is for failed ctest on dogwood while I will update this PR to the current develop branch. It is intersting, since the recent two PRs for (which are missing in this PR) don't cause any differences in ctests , why this PR lacking those PR would cause different results in those failed ctests. I will see if I missed anything. |
Agreed. The two PRs not in |
Unfortunately updating
I'll stop here and let the MGBF team investigate. |
Hercules and Orion ctests Hercules
Orion
Hercules and Orion Passed ctests results are the same as Hera. It's only on WCOSS2 (Dogwood) that several tests fail due to non-reproducible results. Several modules on WCOSS2 are built using hpc-stack, not spack-stack. Also the WCOSS2 build uses intel 19.1.3.304 whereas the Hera, Hercules, and Orion builds use intel 2021.x.0. |
@DavidHuber-NOAA , do you know if we can build |
@RussTreadon-NOAA The ctests for this PR is running on cactus and I will keep posting of my findings . |
@RussTreadon-NOAA The Spack-Stack configuration file for Acorn shows installations using Intel 2022 and Intel 19. However, Acorn is down right now, so I cannot verify that such installations are available ATTM. For Cactus and Dogwood, there are two installations of Intel 2022, one for 'classic' and one for 'One API' compilers. However, these are not approved for operations. Additionally, the libraries that are available upon loading the intel-classic/2022.2.0.262 module (e.g. HDF5) appear to have been compiled with Intel 19. Thus, I would advise against using Intel 2022 on Cactus or Dogwood. |
Thank you @DavidHuber-NOAA for the update. It's unfortunate that building with intel/2020+ is a no go on Cactus and Dogwood. I hope more recent intel compilers are soon approved for these machines. I didn't realize Acorn was down. This explains why I can't log into Acorn this morning. |
Thank you, @TingLei-daprediction . I ran ctests on Cactus following the Saturday production swtich. Results are the same as Dogwood.
The global_4denvar, hafs_4denvar_glbens, and hafs_3denvar_hybens failures are due to non-reproducible results between the updat and contrl
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These modifications seem good.
I noticed that there is an issue with the ctest run on wcoss2. Hope it could be fixed soon.
@GangZhao-NOAA , this PR can not be approved until the reason(s) for non-reproducible results on WCOSS2 is/are explained and resolved. This is a serious problem since operational realizations of the GSI run on WCOSS2. |
Using GSI built with debug mode, |
@TingLei-daprediction , good suggestion about lowering the optimization levels. I recompiled
The rtma falure is due to time threshold check
This is not a fatal fail. The updat and contrl wall times are comparable
|
@RussTreadon-NOAA Thanks for this update. It appeears that some "uncertain behavior " in the code, according to the fortran standard caused the differences observed when higher optimization levels are used. I will focus on this possibility and provide any updates as soon as possible. |
Build
All tests pass as is the case on Hera, Hercules, and Orion. This isn't too surprising since these machines compile GSI with intel/2020+ compilers. It should be noted that the intel-classic/2022 test did not recompile all the supporting modules. These remain intel/19 builds. The only change was to the intel compiler in
|
@RussTreadon-NOAA Thank you for keeping us updated. |
Additional tests with intel/19 and intel/2022 compilers on Cactus find that it is not necessary to recompile Leave With this set up the ctests yield
I do not know if NCO accepts packages compiled with |
@RussTreadon-NOAA Thanks a lot! Your findings indicate there are different treatments with the higher optimization to the "active codes after those "inactive" codes excluded by those preprocessing directives introduced in this PR. I will dig more on this as soon as possible. |
@GangZhao-NOAA : The issue with CADS is not complier related. Source code changes are needed to resolve the CADS failure in debug mode. GSI issue #775 is tracking this problem. Compiling |
@RussTreadon-NOAA an update confusing to me:(. My recent GSI (also for the control) compiled with the release mode and using intel/19.1.3.304 still gave the results of ctests passing:
I didn't find any changes for GSI itself and wondering if there are any changes in last couple of days in the system modules . |
@TingLei-NOAA , where are your GSI clone and ctest run directories on Cactus? |
@RussTreadon-NOAA The tmpdir for ctests is (cactus ) /lfs/h2/emc/ptmp/ting.lei/GSI_optVsopt. |
@RussTreadon-NOAA I just realized that there was a difference on the above GSI building process. For GSI_optVsopt/, I copied it from a previous GSI built with debug mode and , after deleting the build dir , I did a building with optimization. This is supposed to give a clean re-building. But, there are possibilities that is not the case. Hence, I am doing a GSI building from cloning and will keep you posted. |
Thank you @TingLei-NOAA for sharing that there is a mistake in your test. Unfortunately, this information came too late. I spent a lot of time trying to reproduce your result. Despite various tests I could not reproduce your Passed results. Here is a sample of the tests I ran
I repeated the above above test and used I can not see how your and my compilation of the same code in Your most recent comment points to potential problems in your build. FYI, your branch |
@RussTreadon-NOAA So sorry for not realizing the above factor earlier and really appreciate your efforts as always ! |
@@ -182,7 +184,9 @@ module hybrid_ensemble_isotropic | |||
integer(r_kind) :: nval_loc_en | |||
|
|||
! For MGBF | |||
#ifdef USE_MGBF_def |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
USE_MGBF_def
is the wrong variable. CMakeLists.txt
sets USE_MGBF
. USE_MGBF_def
needs to be changed to USE_MGBF
The same comment applies to all occurrences of USE_MGBF_def
in this source code file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to use different names in cmake files and the fortran source codes
in src/gsi/CMakeLists.txt, there is :
if(USE_MGBF)
list(APPEND GSI_Fortran_defs USE_MGBF_def)
endif()
That makes USE_MGBF_def to be used as the preprocessing option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I set USE_MGBF
to ON
in CMakeLists.txt
. This broke the build
[ 80%] Building Fortran object src/gsi/CMakeFiles/gsi_fortran_obj.dir/hybrid_ensemble_isotropic.F90.o
/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/test/src/gsi/hybrid_ensemble_isotropic.F90(188): error #6457: This derived type name has not been declared. [MG_INTSTATE_TYPE]
type (mg_intstate_type), allocatable, dimension(:) :: obj_mgbf
--------^
/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/test/src/gsi/hybrid_ensemble_isotropic.F90(4184): error #6158: The structure-name is invalid or is missing. [OBJ_MGBF]
real(r_kind) ,intent(inout) :: g(grd_loc%nsig,obj_mgbf(ig)%nm,obj_mgbf(ig)%mm)
--------------------------------------------------^
/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/test/src/gsi/hybrid_ensemble_isotropic.F90(4184): error #6158: The structure-name is invalid or is missing. [OBJ_MGBF]
real(r_kind) ,intent(inout) :: g(grd_loc%nsig,obj_mgbf(ig)%nm,obj_mgbf(ig)%mm)
------------------------------------------------------------------^
/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/test/src/gsi/hybrid_ensemble_isotropic.F90(1769): error #6404: This name does not have a type, and must have an explicit type. [PRINT_CPU]
if(l_mgbf_loc) call print_mg_timers("mgbf_timing_cpu.csv", print_cpu, mype)
------------------------------------------------------------------^
/lfs/h2/emc/da/noscrub/russ.treadon/git/gsi/test/src/gsi/hybrid_ensemble_isotropic.F90(3749): error #6404: This name does not have a type, and must have an explicit type. [OBJ_MGBF]
allocate(work_mgbf(obj_mgbf(1)%km_a_all,obj_mgbf(1)%nm,obj_mgbf(1)%mm))
------------------------^
Looks like the error is that the first MGBF ifdef
in hybrid_ensemble_isotropic.F90
references USE_MGBF
! For MGBF
#ifdef USE_MGBF
use mg_intstate
use mg_timers
#endif /* End of USE_MGBF block */
while all the other ifdefs reference USE_MGBF_def
. We need to pick one.
My preference is to use USE_MGBF
consistently through CMakeLists.txt
and the source code. That said, I am not a cmake expert. Maybe the the two varaible approach of USE_MGBF
& USE_MGBF_def
is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Russ, Thanks for exploring on them. I think use of one variable or two variable is just a stylish problem. In this case, up to now, there are no actual differences. Using two variables ( as for USE_GSDCLOUD vs RR_CLOUDANALYSIS) just gives a reminder on how those directives in the source codes are being recognized by the compiler.
If it is ok with you, I still prefer to following the use of such a "two variable" approach in the previous GSI setup as for the above GSDCLOUD management.
If you think the "one variable" approach is better enough that we don't need use the same "two variable" approach in GSI for similar situations , I can change it later .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference remains to use one variable, USE_MGBF
, all the way through. This is more readable than USE_MGBF_def
and less confusing. You are the developer. It's your PR. Do as you wish.
As a test make the following changes in a working copy of top level
Build |
@TingLei-NOAA , based on discussions with EIB we can not implement operational code using an intel-classic build. All supporting libraries and modules must be built with the same compiler. We should look ahead to intel OneAPI. A future implementation of spack-stack on WCOSS might use intel OneAPI. The GSI build could then be updated to intel OneAPI. Doing so might resolve the non-reproducible results currently found with this PR when compiled on WCOSS using intel/19. One path forward for this PR is to default |
@RussTreadon-NOAA Thanks a lot for giving a path forward . That is great! |
The issue with "floating divide by zero" in CRTM_Planck_Functions.f90 using a debug build of GSI has been seen and reported before. I don't recall who was working on the issue, the following modification to src/gsi/read_iasi.f90 resolves the failure. 433 ! Allocate arrays to hold data |
@TingLei-NOAA , what is the status of this PR? |
@RussTreadon-NOAA , On cactus, I had encountered the reproducibility issue for "optimized" GSI even with the USE_MBGF on. Also considered the previously reported success of the "optimized" GSI with USE_MGBF off (both of them were not started from a freshly cloned branch while a "clean" building was done), I 'd prefer to put this PR on "draft" status and will have a more thorough investigation into it in the future. |
@TingLei-NOAA , your branch Please update keep your branch in sync with the head of the authoritative |
@RussTreadon-NOAA Thanks for the reminder, my local branch on cacutus had been updated as below :
I would sync my branch on github soon. |
@TingLei-daprediction , what is the status of this PR? |
Please see my update I just added for its corresponding issue #765. Thanks a lot! |
Thank you @TingLei-NOAA for updating the originating issue. |
Fixes #765
Added changes in GSI codes to allow the mgbf lib be able to set off.
The changes are mainly using pre-processing directives to wrapping mgbf codes in hybrid_ensemble_isotropic.F90.
As discussed with @ManuelPondeca-NOAA and @MiodragRancic-NOAA, the current treatment is easy and straightforward, but the added directives in hybrid_ensemble_isotropic.F90 do not look as clean as expected.
This draft PR is for GSI developers to review and discuss if this is the efficient way to resolve the issue 765.