get_extrn_lbcs task succeeds even if all files are not retrieved successfully #596

Closed
mkavulich opened this issue Feb 8, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@mkavulich
Collaborator

Expected behavior

If some files are not retrieved correctly, the get_extrn_lbcs task should report a failure.

Current behavior

Instead, if some but not all files are retrieved successfully, the task records the failures in its log file but still reports "SUCCESS" to rocoto.
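The expected behavior amounts to checking the staging directory before the task exits. The following is a minimal sketch of such a check, not the workflow's actual code: the function names `missing_files` and `exit_status` are hypothetical, and it assumes the task already knows the list of file names it was asked to retrieve.

```python
import sys
from pathlib import Path


def missing_files(staging_dir, expected_names):
    """Return the expected file names that are absent from staging_dir.

    Hypothetical helper for illustration; not part of the workflow.
    """
    staging = Path(staging_dir)
    return [name for name in expected_names if not (staging / name).is_file()]


def exit_status(staging_dir, expected_names):
    """Return 0 if every expected file is present, 1 otherwise."""
    missing = missing_files(staging_dir, expected_names)
    for name in missing:
        # Report each missing file as an error rather than as INFO.
        print(f"ERROR: File does not exist: {Path(staging_dir) / name}",
              file=sys.stderr)
    return 1 if missing else 0
```

A task script could then end with `sys.exit(exit_status(staging_dir, expected_names))`, so that a partial retrieval produces a non-zero exit status that rocoto would see as a failure.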

Machines affected

Hera, potentially others

Steps To Reproduce

  1. Create an experiment where some but not all files are available. I created this with the following config.sh file, attempting to run a 108-hour forecast from netcdf input files (which are only available on HPSS out to 90 hours):
metadata:
  description: |-
    This test checks the capability of the workflow to retrieve from NOAA
    HPSS netcdf-formatted output files generated by the FV3GFS external
    model (FCST_LEN_HRS>=100).
user:
  RUN_ENVIR: community
workflow:
  CCPP_PHYS_SUITE: FV3_GFS_v16
  PREDEF_GRID_NAME: RRFS_CONUS_25km
  DATE_FIRST_CYCL: '2022060112'
  DATE_LAST_CYCL: '2022060112'
  FCST_LEN_HRS: 108
  PREEXISTING_DIR_METHOD: rename
task_get_extrn_ics:
  EXTRN_MDL_NAME_ICS: FV3GFS
  FV3GFS_FILE_FMT_ICS: netcdf
task_get_extrn_lbcs:
  EXTRN_MDL_NAME_LBCS: FV3GFS
  LBC_SPEC_INTVL_HRS: 12
  FV3GFS_FILE_FMT_LBCS: netcdf
  2. Run the workflow via rocotorun, and observe that the get_extrn_lbcs task is reported as "SUCCEEDED" by rocotostat; only the later make_lbcs task correctly fails when it cannot find all the needed input data:
(regional_workflow) /scratch2/BMC/fv3lam/kavulich/UFS/workdir/python_WE2E_script_round_2/expt_dirs/get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_108h$ rocotostat -w FV3LAM_wflow.xml -d FV3LAM_wflow.db -v 10
       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202206011200               make_grid                    41813549           SUCCEEDED                   0         1          10.0
202206011200               make_orog                    41813554           SUCCEEDED                   0         1          34.0
202206011200          make_sfc_climo                    41813557           SUCCEEDED                   0         1          40.0
202206011200           get_extrn_ics                    41813550           SUCCEEDED                   0         1          72.0
202206011200          get_extrn_lbcs                    41813551           SUCCEEDED                   0         1         196.0
202206011200                make_ics                    41813573           SUCCEEDED                   0         1          89.0
202206011200               make_lbcs                    41813618                DEAD                 256         1         452.0
202206011200                run_fcst                           -                   -                   -         -             -
  3. Open the log file for get_extrn_lbcs, and observe that there were actually several failures for this "successful" task:
...
...
INFO: Moving /scratch2/BMC/fv3lam/kavulich/UFS/workdir/python_WE2E_script_round_2/expt_dirs/get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_108h/2022060112/FV3GFS/for_LBCS/./gfs.20220601/12/atmos/gfs.t12z.atmf084.nc to /scratch2/BMC/fv3lam/kavulich/UFS/workdir/python_WE2E_script_round_2/expt_dirs/get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_108h/2022060112/FV3GFS/for_LBCS/gfs.t12z.atmf084.nc

INFO: File does not exist: /scratch2/BMC/fv3lam/kavulich/UFS/workdir/python_WE2E_script_round_2/expt_dirs/get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_108h/2022060112/FV3GFS/for_LBCS/./gfs.20220601/12/atmos/gfs.t12z.atmf096.nc

INFO: File does not exist: /scratch2/BMC/fv3lam/kavulich/UFS/workdir/python_WE2E_script_round_2/expt_dirs/get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_108h/2022060112/FV3GFS/for_LBCS/./gfs.20220601/12/atmos/gfs.t12z.atmf108.nc

INFO: Removing ./gfs.20220601/12/atmos

INFO: Writing a summary file to /scratch2/BMC/fv3lam/kavulich/UFS/workdir/python_WE2E_script_round_2/expt_dirs/get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_108h/2022060112/FV3GFS/for_LBCS/extrn_mdl_var_defns.sh

INFO: Contents:
DATA_SRC=hpss
EXTRN_MDL_CDATE=2022060112
EXTRN_MDL_STAGING_DIR=/scratch2/BMC/fv3lam/kavulich/UFS/workdir/python_WE2E_script_round_2/expt_dirs/get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_108h/2022060112/FV3GFS/for_LBCS
EXTRN_MDL_FNS=( gfs.t12z.atmf012.nc gfs.t12z.atmf024.nc gfs.t12z.atmf036.nc gfs.t12z.atmf048.nc gfs.t12z.atmf060.nc gfs.t12z.atmf072.nc gfs.t12z.atmf084.nc gfs.t12z.atmf096.nc gfs.t12z.atmf108.nc )
EXTRN_MDL_FHRS=( 12 24 36 48 60 72 84 96 108 )


End exregional_get_extrn_mdl_files.sh at Wed Feb  8 18:37:05 UTC 2023 with error code 0 (time elapsed: 00:03:07)
  4. Also, observe that the LBCs directory does not have all the files required:
$ ls 2022060112/FV3GFS/for_LBCS/
extrn_mdl_var_defns.sh  gfs.t12z.atmf024.nc  gfs.t12z.atmf048.nc  gfs.t12z.atmf072.nc
gfs.t12z.atmf012.nc     gfs.t12z.atmf036.nc  gfs.t12z.atmf060.nc  gfs.t12z.atmf084.nc
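The gap can be confirmed from the numbers above. The sketch below rebuilds the expected file list from FCST_LEN_HRS and LBC_SPEC_INTVL_HRS and compares it with the directory listing. The helper name `expected_lbc_files` is hypothetical, and the file name template is an assumption inferred from the names in the log, not taken from the workflow code.

```python
def expected_lbc_files(fcst_len_hrs, lbc_spec_intvl_hrs, cycle_hour=12):
    """Build the LBC file names for every interval up to the forecast length.

    The template below is inferred from the log output in this issue.
    """
    fhrs = range(lbc_spec_intvl_hrs, fcst_len_hrs + 1, lbc_spec_intvl_hrs)
    return [f"gfs.t{cycle_hour:02d}z.atmf{fhr:03d}.nc" for fhr in fhrs]


# netcdf files present in 2022060112/FV3GFS/for_LBCS/ per the listing above
retrieved = {
    "gfs.t12z.atmf012.nc", "gfs.t12z.atmf024.nc", "gfs.t12z.atmf036.nc",
    "gfs.t12z.atmf048.nc", "gfs.t12z.atmf060.nc", "gfs.t12z.atmf072.nc",
    "gfs.t12z.atmf084.nc",
}

# FCST_LEN_HRS: 108 and LBC_SPEC_INTVL_HRS: 12 from the experiment config
expected = expected_lbc_files(108, 12)
missing = [name for name in expected if name not in retrieved]
print(missing)  # ['gfs.t12z.atmf096.nc', 'gfs.t12z.atmf108.nc']
```

The two missing names match the two "File does not exist" messages in the log, while EXTRN_MDL_FNS in the summary file still lists all nine files.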
@MichaelLueken
Collaborator

Issue #1165 supersedes this issue. Closing this issue now.
