C48mx500_3DVarAOWCDA and C48mx500_hybAOWCDA gdas_stage failures on ORION #3136

Open
RussTreadon-NOAA opened this issue Dec 3, 2024 · 4 comments
Labels: bug, triage

Comments

@RussTreadon-NOAA

What is wrong?

The C48mx500_3DVarAOWCDA and C48mx500_hybAOWCDA gdas_stage_ic jobs for 20210324 12Z fail on Orion when attempting to copy

/work/noaa/global/glopara/data/ICSDIR/C48mx500/20241120/gdas.20210324/12/analysis/ice/20210324.090000.cice_model_anl.res.nc

to the run directory. This file does not exist on Orion.

Directory /work/noaa/global/glopara/data/ICSDIR/C48mx500/20241120/gdas.20210324/12/analysis/ice/ contains file 20210324.090000.cice_model.res.nc. There is no _anl in the filename.
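
A quick way to confirm this kind of mismatch is to test for the expected path and, if it is absent, list what the directory actually holds. The following is a minimal sketch in POSIX shell; `check_staged_file` is a hypothetical helper written for illustration, not part of global-workflow.

```shell
# Hypothetical helper (not part of global-workflow): report whether the file
# the stage job expects exists; if not, show what the directory contains.
check_staged_file() {
    expected=$1
    dir=$(dirname "$expected")
    if [ -e "$expected" ]; then
        echo "FOUND: $expected"
        return 0
    fi
    echo "MISSING: $expected"
    if [ -d "$dir" ]; then
        echo "Contents of $dir:"
        ls -1 "$dir"
    fi
    return 1
}
```

On Orion this could be called as `check_staged_file /work/noaa/global/glopara/data/ICSDIR/C48mx500/20241120/gdas.20210324/12/analysis/ice/20210324.090000.cice_model_anl.res.nc`.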

What should have happened?

The stage job should successfully run to completion.

What machines are impacted?

Orion

What global-workflow hash are you using?

DavidNew-NOAA:feature/gw-ci at 554a20a

Steps to reproduce

  1. Install g-w PR #3120 ("Turn C96C48_ufs_hybatmDA and C48mx500_3DVarAOWCDA into a regression test") on Orion
  2. module use $HOMEgfs/sorc/gdas.cd/modulefiles
  3. module load GDAS/orion.intel
  4. cd $HOMEgfs/sorc/gdas.cd/build
  5. ctest -R test_gdasapp_C48mx500_3DVarAOWCDA

Job test_gdasapp_C48mx500_3DVarAOWCDA_gdas_stage_ic_202103241200 will fail

        Start 2025: test_gdasapp_C48mx500_3DVarAOWCDA_gdas_stage_ic_202103241200
 48/133 Test #2025: test_gdasapp_C48mx500_3DVarAOWCDA_gdas_stage_ic_202103241200 .............***Failed  115.64 sec

Additional information

Log file for the C48mx500_3DVarAOWCDA failure is

/work2/noaa/da/rtreadon/git/global-workflow/pr3120/sorc/gdas.cd/build/gdas/test/gw-ci/C48mx500_3DVarAOWCDA/COMROOT/C48mx500_3DVarAOWCDA/logs/2021032412/gdas_stage_ic.log

Log file for the C48mx500_hybAOWCDA failure is

/work2/noaa/da/rtreadon/git/global-workflow/pr3120/sorc/gdas.cd/build/gdas/test/gw-ci/C48mx500_hybAOWCDA/COMROOT/C48mx500_hybAOWCDA/logs/2021032412/gdas_stage_ic.log

Do you have a proposed solution?

Either the yaml for the stage_ic job is creating the wrong filename or the file on disk is missing _anl in the filename.

@RussTreadon-NOAA

@guillaumevernieres and @AndrewEichmann-NOAA : not sure if the problem is

  1. $HOMEgfs/parm/stage/ice.yaml.j2 which has
 - ["{{ ICSDIR }}/{{ COMOUT_ICE_ANALYSIS_MEM | relpath(ROTDIR) }}/{{ m_prefix }}.cice_model_anl.res.nc", "{{ COMOUT_ICE_ANALYSIS_MEM }}"]

or

  2. the $ICSDIR directory, which has

(gdasapp) orion-login-3:/work/noaa/global/glopara/data/ICSDIR/C48mx500/20241120/gdas.20210324/12/analysis/ice$ ls -lL
total 3136
-rw-r--r-- 1 wkolczyn global 3207988 Nov 17  2023 20210324.090000.cice_model.res.nc

@RussTreadon-NOAA

A check of WCOSS2 (Dogwood) finds that 20210324.090000.cice_model_anl.res.nc exists in /lfs/h2/emc/global/noscrub/emc.global/data/ICSDIR/C48mx500/gdas.20210324/12/analysis/ice

russ.treadon@dlogin07:/lfs/h2/emc/global/noscrub/emc.global/data/ICSDIR/C48mx500/gdas.20210324/12/analysis/ice> ls -lL
total 3136
-rw-r--r-- 1 emc.global global 3207988 Nov 17  2023 20210324.090000.cice_model_anl.res.nc
russ.treadon@dlogin07:/lfs/h2/emc/global/noscrub/emc.global/data/ICSDIR/C48mx500/gdas.20210324/12/analysis/ice> 

This suggests that the file is misnamed on Orion.

Attention @WalterKolczynski-NOAA

@guillaumevernieres

You can't run these tests on Orion with develop just yet @RussTreadon-NOAA .

@RussTreadon-NOAA

Thank you @guillaumevernieres for letting me know. There is still a problem with the ICS.

@WalterKolczynski-NOAA , I looked more closely on Orion and Dogwood. The name of the 20210324.090000.cice_model file changes depending on which directory one looks in.

Orion

orion-login-3:/work/noaa/global/glopara/data/ICSDIR/C48mx500$ ls
20240610  20241120  enkfgdas.20210323  gdas.20210323  gdas.20210324  gefs.20210323
orion-login-3:/work/noaa/global/glopara/data/ICSDIR/C48mx500$ ls -lL 20240610/gdas.20210324/12/analysis/ice/
total 3136
-rw-r--r-- 1 wkolczyn global 3207988 Nov 17  2023 20210324.090000.cice_model_anl.res.nc
orion-login-3:/work/noaa/global/glopara/data/ICSDIR/C48mx500$
orion-login-3:/work/noaa/global/glopara/data/ICSDIR/C48mx500$ ls -lL 20241120/gdas.20210324/12/analysis/ice/
total 3136
-rw-r--r-- 1 wkolczyn global 3207988 Nov 17  2023 20210324.090000.cice_model.res.nc
orion-login-3:/work/noaa/global/glopara/data/ICSDIR/C48mx500$
orion-login-3:/work/noaa/global/glopara/data/ICSDIR/C48mx500$ ls -lL gdas.20210324/12/analysis/ice/
total 3136
-rw-r--r-- 1 wkolczyn global 3207988 Nov 17  2023 20210324.090000.cice_model_anl.res.nc

Dogwood

russ.treadon@dlogin07:/lfs/h2/emc/global/noscrub/emc.global/data/ICSDIR/C48mx500> ls 
20240610  20241120  enkfgdas.20210323  gdas.20210323  gdas.20210324  gefs.20210323
russ.treadon@dlogin07:/lfs/h2/emc/global/noscrub/emc.global/data/ICSDIR/C48mx500> ls -lL 20240610/gdas.20210324/12/analysis/ice/
total 3136
-rw-r--r-- 1 emc.global global 3207988 Nov 17  2023 20210324.090000.cice_model_anl.res.nc
russ.treadon@dlogin07:/lfs/h2/emc/global/noscrub/emc.global/data/ICSDIR/C48mx500> 
russ.treadon@dlogin07:/lfs/h2/emc/global/noscrub/emc.global/data/ICSDIR/C48mx500> ls -lL 20241120/gdas.20210324/12/analysis/ice/
total 3136
-rw-r--r-- 1 emc.global global 3207988 Nov 17  2023 20210324.090000.cice_model.res.nc
russ.treadon@dlogin07:/lfs/h2/emc/global/noscrub/emc.global/data/ICSDIR/C48mx500> 
russ.treadon@dlogin07:/lfs/h2/emc/global/noscrub/emc.global/data/ICSDIR/C48mx500> ls -lL gdas.20210324/12/analysis/ice/
total 3136
-rw-r--r-- 1 emc.global global 3207988 Nov 17  2023 20210324.090000.cice_model_anl.res.nc

In two of the three directories the filename is 20210324.090000.cice_model_anl.res.nc. This is consistent with $HOMEgfs/parm/stage/ice.yaml.j2. The 20241120 directory does not include _anl in the filename. This is not consistent with ice.yaml.j2.
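
The directory-by-directory comparison above can be scripted so it is easy to repeat on either machine. This is a sketch only: `report_ice_names` is a hypothetical helper, and the cycle subdirectory is hard-coded to the 2021032412 case from this issue.

```shell
# Hypothetical helper: for each IC version directory under a base path, print
# the version and the name of any cice_model restart staged for 2021032412.
report_ice_names() {
    base=$1
    shift
    sub="gdas.20210324/12/analysis/ice"
    for ver in "$@"; do
        for f in "$base/$ver/$sub"/*.cice_model*.res.nc; do
            # Skip the unexpanded glob when a directory has no matching file
            [ -e "$f" ] || continue
            printf '%s\t%s\n' "$ver" "$(basename "$f")"
        done
    done
}
```

For example, `report_ice_names /work/noaa/global/glopara/data/ICSDIR/C48mx500 20240610 20241120 .` covers the three directories listed above, with `.` standing for the unversioned top-level copy.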

g-w versions/ic.ver contains

ic_versions['C48mx500']=20241120

This explains why g-w CI is pointing at the 20241120 directory. If this is the directory we want to use, we either need to rename the 20210324.090000.cice_model file or modify ice.yaml.j2. Since two of three directories have _anl in the filename, I think we need to add _anl to the filename in the 20241120 directory.
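
If renaming turns out to be the preferred fix, it could look roughly like the sketch below. `add_anl_to_ice_restart` is a hypothetical helper, not an existing g-w tool. It defaults to a dry run that only prints the intended rename, and it never overwrites an existing target. Whoever owns the ICSDIR data would need to decide whether a rename, a copy, or a change to ice.yaml.j2 is the right approach.

```shell
# Hypothetical helper: add "_anl" to cice_model restart names in one directory.
# Default mode is a dry run; pass "apply" as the second argument to rename.
add_anl_to_ice_restart() {
    dir=$1
    mode=${2:-dry-run}
    for f in "$dir"/*.cice_model.res.nc; do
        # Skip the unexpanded glob when nothing matches
        [ -e "$f" ] || continue
        target="${f%.cice_model.res.nc}.cice_model_anl.res.nc"
        if [ -e "$target" ]; then
            echo "skip (target exists): $target"
        elif [ "$mode" = "apply" ]; then
            mv -- "$f" "$target"
        else
            echo "would rename: $f -> $target"
        fi
    done
}
```

Running `add_anl_to_ice_restart 20241120/gdas.20210324/12/analysis/ice` from the C48mx500 ICSDIR would print the planned rename without touching the file.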
