Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increased gsi.x and enkf.x executable wall times following Orion Rocky 9 upgrade #771

Closed
RussTreadon-NOAA opened this issue Jul 8, 2024 · 14 comments
Assignees

Comments

@RussTreadon-NOAA
Copy link
Contributor

This issue is the GSI counterpart to spack-stack issue #1166. gsi.x and enkf.x wall times significantly increased following the Orion Rocky 9 upgrade.

develop at 529bb79 was installed on Orion and Hercules. The standard suite of 6 ctests were run on both machines. Below are the gsi.x and enkf.x wall times for individual jobs in the ctests.

Orion wall times

tmpreg_global_4denvar/global_4denvar_hiproc_contrl/stdout:The total amount of wall time                        = 737.034865
tmpreg_global_4denvar/global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 734.665725
tmpreg_global_4denvar/global_4denvar_loproc_contrl/stdout:The total amount of wall time                        = 968.070908
tmpreg_global_4denvar/global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 960.139172
tmpreg_global_enkf/global_enkf_hiproc_contrl/stdout:The total amount of wall time                        = 145.320418
tmpreg_global_enkf/global_enkf_hiproc_updat/stdout:The total amount of wall time                        = 145.032614
tmpreg_global_enkf/global_enkf_loproc_contrl/stdout:The total amount of wall time                        = 172.618057
tmpreg_global_enkf/global_enkf_loproc_updat/stdout:The total amount of wall time                        = 172.388382
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_hiproc_contrl/stdout:The total amount of wall time                        = 622.283527
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_hiproc_updat/stdout:The total amount of wall time                        = 615.704402
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_loproc_contrl/stdout:The total amount of wall time                        = 678.747207
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_loproc_updat/stdout:The total amount of wall time                        = 662.742551
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_hiproc_contrl/stdout:The total amount of wall time                        = 680.214517
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_hiproc_updat/stdout:The total amount of wall time                        = 673.123383
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_loproc_contrl/stdout:The total amount of wall time                        = 751.616643
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_loproc_updat/stdout:The total amount of wall time                        = 720.019887
tmpreg_rtma/rtma_hiproc_contrl/stdout:The total amount of wall time                        = 355.472720
tmpreg_rtma/rtma_hiproc_updat/stdout:The total amount of wall time                        = 355.257576
tmpreg_rtma/rtma_loproc_contrl/stdout:The total amount of wall time                        = 362.637529
tmpreg_rtma/rtma_loproc_updat/stdout:The total amount of wall time                        = 364.954559

Hercules wall times

tmpreg_global_4denvar/global_4denvar_hiproc_contrl/stdout:The total amount of wall time                        = 336.044743
tmpreg_global_4denvar/global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 316.933162
tmpreg_global_4denvar/global_4denvar_loproc_contrl/stdout:The total amount of wall time                        = 437.249590
tmpreg_global_4denvar/global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 437.135432
tmpreg_global_enkf/global_enkf_hiproc_contrl/stdout:The total amount of wall time                        = 119.617426
tmpreg_global_enkf/global_enkf_hiproc_updat/stdout:The total amount of wall time                        = 115.918090
tmpreg_global_enkf/global_enkf_loproc_contrl/stdout:The total amount of wall time                        = 145.799217
tmpreg_global_enkf/global_enkf_loproc_updat/stdout:The total amount of wall time                        = 143.453337
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_hiproc_contrl/stdout:The total amount of wall time                        = 194.744134
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_hiproc_updat/stdout:The total amount of wall time                        = 193.638881
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_loproc_contrl/stdout:The total amount of wall time                        = 250.442483
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_loproc_updat/stdout:The total amount of wall time                        = 249.476651
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_hiproc_contrl/stdout:The total amount of wall time                        = 208.177078
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_hiproc_updat/stdout:The total amount of wall time                        = 227.489386
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_loproc_contrl/stdout:The total amount of wall time                        = 267.448455
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_loproc_updat/stdout:The total amount of wall time                        = 276.878986
tmpreg_rrfs_3denvar_rdasens/rrfs_3denvar_rdasens_hiproc_contrl/stdout:The total amount of wall time                        = 69.066162
tmpreg_rrfs_3denvar_rdasens/rrfs_3denvar_rdasens_hiproc_updat/stdout:The total amount of wall time                        = 69.614745
tmpreg_rrfs_3denvar_rdasens/rrfs_3denvar_rdasens_loproc_contrl/stdout:The total amount of wall time                        = 72.481025
tmpreg_rrfs_3denvar_rdasens/rrfs_3denvar_rdasens_loproc_updat/stdout:The total amount of wall time                        = 69.579325
tmpreg_rtma/rtma_hiproc_contrl/stdout:The total amount of wall time                        = 190.842044
tmpreg_rtma/rtma_hiproc_updat/stdout:The total amount of wall time                        = 185.278206
tmpreg_rtma/rtma_loproc_contrl/stdout:The total amount of wall time                        = 192.844854
tmpreg_rtma/rtma_loproc_updat/stdout:The total amount of wall time                        = 194.275655

There is no Orion times for the rrfs_3denvar_rdasens test because this ctest hangs on Orion. See issue #766 for details.

Orion gsi.x wall times are 2 to 3 times greater than their Hercules counterpart. We expect Hercules wall times to be a bit faster than Orion but not by a factor of 2 to 3. The increase in Orion enkf.x wall times with respect to Hercules is not as significant.

This issue is opened to document

  • increased gsi.x and enfk.x wall times on Orion following the Rocky 9 upgrade.
  • steps taken to return Orion gsi.x and enkf.x wall times to pre-Rocky 9 values
@RussTreadon-NOAA
Copy link
Contributor Author

Add dclock calls to glbsoi.f90, observer.F90, and read_obs.F90 to track time per functional unit in gsi.x execution. Build code on Orion and Hercules. Run global_4denvar ctest with timing results below

Orion: total gsi.x wall time 922.819213 seconds

timer stats: glbsoi:observer_init avg   39.3423     min   39.3127     max   39.3512
timer stats: glbsoi:create_jfunc avg  0.979754E-02 min  0.654697E-02 max  0.106041E-01
timer stats: glbsoi:openbf avg  0.221839E-02 min  0.195217E-02 max  0.612307E-02
timer stats: read_obs:readvqcdatfile avg  0.125957E-02 min  0.292063E-03 max  0.210905E-02
timer stats: read_obs::set reader tasks avg   2.52104     min  0.342970E-01 max   50.3974
timer stats: read_obs:distribute reader task info avg   43.9292     min  0.225067E-03 max   50.3631
timer stats: read_obs:finalize reader task setup avg  0.352147E-02 min  0.345302E-02 max  0.360489E-02
timer stats: read_obs:getsfc avg  0.995863     min  0.995642     max  0.996050
timer stats: read_obs:3d pressure and buoy setup avg  0.594276E-01 min  0.461221E-01 max  0.602560E-01
timer stats: read_obs:finish read obs loop avg   142.585     min   66.6217     max   551.629
timer stats: read_obs:reductions & write out obs avg   380.750     min  0.197220E-02 max   485.009
timer stats: observer:read_obs avg   603.139     min   603.135     max   603.143
timer stats: glbsoi:observer_set avg   603.882     min   603.691     max   603.963
timer stats: glbsoi:exitbufr avg  0.133175E-02 min  0.412941E-03 max  0.127270E-01
timer stats: glbsoi:global setup avg  0.312307     min  0.233846     max  0.507885
timer stats: glbsoi:load ensemble avg   9.07371     min   9.04500     max   9.08724
timer stats: glbsoi:setuprhsall avg   82.1254     min   82.1244     max   82.1404
timer stats: glbsoi:pcgsoi avg   10.3174     min   10.2960     max   10.3913
timer stats: glbsoi:setuprhsall avg   76.7611     min   76.6916     max   76.7732
timer stats: glbsoi:pcgsoi avg   13.8162     min   13.7945     max   13.8903
timer stats: glbsoi:final setuprhsall avg   82.3959     min   82.3263     max   82.4088

Hercules: total gsi.x wall time 440.304501 seconds

timer stats: glbsoi:observer_init avg   68.2950     min   68.2892     max   68.2999
timer stats: glbsoi:create_jfunc avg  0.823952E-02 min  0.524497E-02 max  0.891519E-02
timer stats: glbsoi:openbf avg  0.823163E-01 min  0.787411E-01 max  0.885100E-01
timer stats: read_obs:readvqcdatfile avg  0.149384E-02 min  0.756979E-03 max  0.193310E-02
timer stats: read_obs::set reader tasks avg  0.769786     min  0.864911E-02 max   14.9756
timer stats: read_obs:distribute reader task info avg   13.0860     min  0.160933E-03 max   14.9684
timer stats: read_obs:finalize reader task setup avg  0.177818E-02 min  0.173903E-02 max  0.180197E-02
timer stats: read_obs:getsfc avg   3.80337     min   3.80326     max   3.80339
timer stats: read_obs:3d pressure and buoy setup avg  0.418469E-01 min  0.278199E-01 max  0.424819E-01
timer stats: read_obs:finish read obs loop avg   37.0401     min   18.3169     max   144.775
timer stats: read_obs:reductions & write out obs avg   100.623     min  0.730038E-03 max   126.459
timer stats: observer:read_obs avg   163.642     min   163.642     max   163.643
timer stats: glbsoi:observer_set avg   164.325     min   164.168     max   164.419
timer stats: glbsoi:exitbufr avg  0.292902E-03 min  0.246048E-03 max  0.427008E-03
timer stats: glbsoi:global setup avg  0.172493     min  0.936239E-01 max  0.347333
timer stats: glbsoi:load ensemble avg   7.96850     min   7.96327     max   7.97344
timer stats: glbsoi:setuprhsall avg   59.8326     min   59.8318     max   59.8391
timer stats: glbsoi:pcgsoi avg   7.50577     min   7.49486     max   7.54579
timer stats: glbsoi:setuprhsall avg   57.4839     min   57.4462     max   57.4907
timer stats: glbsoi:pcgsoi avg   9.67520     min   9.66364     max   9.71827
timer stats: glbsoi:final setuprhsall avg   60.2496     min   60.2093     max   60.2581

The largest difference in Orion and Hercules timings occurs when reading bufr observations in parallel. Orion took 407 seconds longer for all tasks to process bufr observations than did Hercules.

Orion

timer stats: read_obs:finish read obs loop avg   142.585     min   66.6217     max   551.629

Hercules

timer stats: read_obs:finish read obs loop avg   37.0401     min   18.3169     max   144.775

Timers can be added to read_obs to quantify the wall time per bufr observation reader.

@RussTreadon-NOAA
Copy link
Contributor Author

Add timers to read_obs.F90. Below are wall times in seconds for indicated observation reads

Orion

    21: timer stats: READ_CRIS                28.6474
    22: timer stats: READ_CRIS                28.6474
    26: timer stats: READ_CRIS                28.6474
    23: timer stats: READ_CRIS                28.6474
    29: timer stats: READ_CRIS                28.6474
    16: timer stats: READ_CRIS                28.6468
    31: timer stats: READ_CRIS                28.6474
    17: timer stats: READ_CRIS                28.6474
    24: timer stats: READ_CRIS                28.6474
    18: timer stats: READ_CRIS                28.6474
    25: timer stats: READ_CRIS                28.6474
    19: timer stats: READ_CRIS                28.6474
    27: timer stats: READ_CRIS                28.6474
    20: timer stats: READ_CRIS                28.6476
    28: timer stats: READ_CRIS                28.6474
    30: timer stats: READ_CRIS                28.6474
    63: timer stats: READ_IASI                54.4328
    62: timer stats: READ_IASI                54.4329
    55: timer stats: READ_IASI                54.4332
    61: timer stats: READ_IASI                54.4330
    59: timer stats: READ_IASI                54.4332
    60: timer stats: READ_IASI                54.4330
    49: timer stats: READ_IASI                54.4333
    50: timer stats: READ_IASI                54.4332
    51: timer stats: READ_IASI                54.4333
    52: timer stats: READ_IASI                54.4333
    53: timer stats: READ_IASI                54.4333
    54: timer stats: READ_IASI                54.4332
    56: timer stats: READ_IASI                54.4332
    57: timer stats: READ_IASI                54.4332
    58: timer stats: READ_IASI                54.4331
    48: timer stats: READ_IASI                54.4341
    15: timer stats: READ_IASI                57.7201
    14: timer stats: READ_IASI                57.7204
    13: timer stats: READ_IASI                57.7207
    12: timer stats: READ_IASI                57.7209
    10: timer stats: READ_IASI                57.7245
    11: timer stats: READ_IASI                57.7212
     8: timer stats: READ_IASI                57.7249
     9: timer stats: READ_IASI                57.7249
     1: timer stats: READ_IASI                57.7212
     2: timer stats: READ_IASI                57.7213
     3: timer stats: READ_IASI                57.7241
     4: timer stats: READ_IASI                57.7241
     5: timer stats: READ_IASI                57.7245
     6: timer stats: READ_IASI                57.7250
     7: timer stats: READ_IASI                57.7241
     0: timer stats: READ_IASI                57.7223
    23: timer stats: READ_AVHRR               40.0085
    22: timer stats: READ_AVHRR               40.0086
    19: timer stats: READ_AVHRR               40.0086
    21: timer stats: READ_AVHRR               40.0086
    18: timer stats: READ_AVHRR               40.0086
    20: timer stats: READ_AVHRR               40.0086
    17: timer stats: READ_AVHRR               40.0086
    16: timer stats: READ_AVHRR               40.0086
    31: timer stats: READ_AVHRR               45.7324
    29: timer stats: READ_AVHRR               45.7325
    26: timer stats: READ_AVHRR               45.7326
    27: timer stats: READ_AVHRR               45.7325
    30: timer stats: READ_AVHRR               45.7325
    28: timer stats: READ_AVHRR               45.7327
    25: timer stats: READ_AVHRR               45.7328
    24: timer stats: READ_AVHRR               45.7328
    47: timer stats: READ_CRIS                87.6466
    46: timer stats: READ_CRIS                87.6467
    33: timer stats: READ_CRIS                87.6470
    45: timer stats: READ_CRIS                87.6468
    34: timer stats: READ_CRIS                87.6469
    44: timer stats: READ_CRIS                87.6468
    35: timer stats: READ_CRIS                87.6469
    43: timer stats: READ_CRIS                87.6468
    42: timer stats: READ_CRIS                87.6470
    41: timer stats: READ_CRIS                87.6469
    40: timer stats: READ_CRIS                87.6470
    38: timer stats: READ_CRIS                87.6470
    39: timer stats: READ_CRIS                87.6470
    36: timer stats: READ_CRIS                87.6470
    37: timer stats: READ_CRIS                87.6472
    32: timer stats: READ_CRIS                87.6475
    95: timer stats: READ_VIIRS               119.945
    94: timer stats: READ_VIIRS               119.945
    93: timer stats: READ_VIIRS               119.946
    91: timer stats: READ_VIIRS               119.946
    90: timer stats: READ_VIIRS               119.946
    89: timer stats: READ_VIIRS               119.946
    81: timer stats: READ_VIIRS               119.946
    86: timer stats: READ_VIIRS               119.946
    82: timer stats: READ_VIIRS               119.946
    92: timer stats: READ_VIIRS               119.946
    83: timer stats: READ_VIIRS               119.946
    85: timer stats: READ_VIIRS               119.946
    88: timer stats: READ_VIIRS               119.946
    87: timer stats: READ_VIIRS               119.946
    84: timer stats: READ_VIIRS               119.946
    80: timer stats: READ_VIIRS               119.946
    93: timer stats: READ_OZONE              0.353938
    79: timer stats: READ_VIIRS               121.121
    78: timer stats: READ_VIIRS               121.122
    76: timer stats: READ_VIIRS               121.122
    75: timer stats: READ_VIIRS               121.122
    77: timer stats: READ_VIIRS               121.122
    74: timer stats: READ_VIIRS               121.122
    71: timer stats: READ_VIIRS               121.122
    73: timer stats: READ_VIIRS               121.122
    72: timer stats: READ_VIIRS               121.122
    67: timer stats: READ_VIIRS               121.122
    68: timer stats: READ_VIIRS               121.122
    66: timer stats: READ_VIIRS               121.122
    69: timer stats: READ_VIIRS               121.122
    70: timer stats: READ_VIIRS               121.122
    65: timer stats: READ_VIIRS               121.122
    64: timer stats: READ_VIIRS               121.123
    85: timer stats: READ_OZONE               1.62620
    89: timer stats: READ_OZONE               1.98980
    94: timer stats: READ_OZONE               2.11263
    81: timer stats: READ_BUFRTOVS            2.68223
    75: timer stats: READ_PREPBUFR            4.38559
    79: timer stats: READ_NSSTBUFR            5.04728
    47: timer stats: READ_AVHRR               40.6250
    46: timer stats: READ_AVHRR               40.6249
    45: timer stats: READ_AVHRR               40.6250
    44: timer stats: READ_AVHRR               40.6249
    42: timer stats: READ_AVHRR               40.6249
    43: timer stats: READ_AVHRR               40.6249
    41: timer stats: READ_AVHRR               40.6248
    40: timer stats: READ_AVHRR               40.6250
    82: timer stats: READ_BUFRTOVS            8.67870
    86: timer stats: READ_BUFRTOVS            8.98850
    65: timer stats: READ_BUFRTOVS            8.09661
    64: timer stats: READ_BUFRTOVS            8.09578
    69: timer stats: READ_BUFRTOVS            8.43694
    68: timer stats: READ_BUFRTOVS            8.43661
    67: timer stats: READ_BUFRTOVS            8.61997
    66: timer stats: READ_BUFRTOVS            8.60470
    80: timer stats: READ_GPS                 9.92920
    36: timer stats: READ_AVHRR               43.3248
    39: timer stats: READ_AVHRR               43.3251
    35: timer stats: READ_AVHRR               43.3250
    38: timer stats: READ_AVHRR               43.3252
    33: timer stats: READ_AVHRR               43.3251
    37: timer stats: READ_AVHRR               43.3253
    34: timer stats: READ_AVHRR               43.3251
    32: timer stats: READ_AVHRR               43.3242
    88: timer stats: READ_BUFRTOVS            11.3951
    74: timer stats: READ_PREPBUFR            11.5022
    77: timer stats: READ_PREPBUFR            13.0966
    72: timer stats: READ_PREPBUFR            13.3955
    15: timer stats: READ_CRIS                85.2315
    14: timer stats: READ_CRIS                85.2314
    12: timer stats: READ_CRIS                85.2309
    13: timer stats: READ_CRIS                85.2311
     6: timer stats: READ_CRIS                85.2307
     7: timer stats: READ_CRIS                85.2307
     8: timer stats: READ_CRIS                85.2307
     9: timer stats: READ_CRIS                85.2307
    10: timer stats: READ_CRIS                85.2307
    11: timer stats: READ_CRIS                85.2307
     1: timer stats: READ_CRIS                85.2307
     2: timer stats: READ_CRIS                85.2307
     3: timer stats: READ_CRIS                85.2307
     4: timer stats: READ_CRIS                85.2307
     5: timer stats: READ_CRIS                85.2307
     0: timer stats: READ_CRIS                85.2297
     1: timer stats: READ_OZONE              0.433676
    73: timer stats: READ_PREPBUFR            23.1917
    70: timer stats: READ_PREPBUFR            23.2368
    71: timer stats: READ_PREPBUFR            23.2690
     7: timer stats: READ_OZONE               1.97127
     3: timer stats: READ_BUFRTOVS            9.23574
    76: timer stats: READ_PREPBUFR            36.9617
    62: timer stats: READ_ABI                 111.597
    58: timer stats: READ_ABI                 111.597
    60: timer stats: READ_ABI                 111.598
    59: timer stats: READ_ABI                 111.597
    63: timer stats: READ_ABI                 111.598
    57: timer stats: READ_ABI                 111.597
    61: timer stats: READ_ABI                 111.598
    56: timer stats: READ_ABI                 111.597
    50: timer stats: READ_ABI                 116.369
    51: timer stats: READ_ABI                 116.369
    52: timer stats: READ_ABI                 116.369
    53: timer stats: READ_ABI                 116.369
    54: timer stats: READ_ABI                 116.369
    55: timer stats: READ_ABI                 116.369
    49: timer stats: READ_ABI                 116.369
    48: timer stats: READ_ABI                 116.366
     4: timer stats: READ_AHI                 30.5457
     2: timer stats: READ_OZONE               35.9112
     0: timer stats: READ_OZONE               35.9427
     6: timer stats: READ_OZONE               36.9671
    91: timer stats: READ_ATMS                115.133
    90: timer stats: READ_ATMS                116.143
     5: timer stats: READ_ATMS                115.214
    84: timer stats: READ_SSMIS               155.442
    83: timer stats: READ_SSMIS               172.926
    92: timer stats: READ_GMI                 318.892
    78: timer stats: READ_SATWND              435.596

Hercules

    25: timer stats: READ_CRIS                7.89926
    18: timer stats: READ_CRIS                7.89897
    24: timer stats: READ_CRIS                7.89927
    20: timer stats: READ_CRIS                7.89895
    26: timer stats: READ_CRIS                7.89927
    23: timer stats: READ_CRIS                7.89895
    27: timer stats: READ_CRIS                7.89926
    16: timer stats: READ_CRIS                7.89836
    28: timer stats: READ_CRIS                7.89924
    17: timer stats: READ_CRIS                7.89899
    29: timer stats: READ_CRIS                7.89933
    19: timer stats: READ_CRIS                7.89900
    30: timer stats: READ_CRIS                7.89930
    21: timer stats: READ_CRIS                7.89901
    31: timer stats: READ_CRIS                7.89931
    22: timer stats: READ_CRIS                7.89900
    63: timer stats: READ_IASI                14.2766
    58: timer stats: READ_IASI                14.2768
    60: timer stats: READ_IASI                14.2767
    59: timer stats: READ_IASI                14.2769
    61: timer stats: READ_IASI                14.2767
    56: timer stats: READ_IASI                14.2770
    62: timer stats: READ_IASI                14.2766
    53: timer stats: READ_IASI                14.2770
    54: timer stats: READ_IASI                14.2770
    55: timer stats: READ_IASI                14.2770
    57: timer stats: READ_IASI                14.2769
    49: timer stats: READ_IASI                14.2771
    50: timer stats: READ_IASI                14.2770
    51: timer stats: READ_IASI                14.2771
    52: timer stats: READ_IASI                14.2771
    48: timer stats: READ_IASI                14.2615
    15: timer stats: READ_IASI                15.1522
    14: timer stats: READ_IASI                15.1524
    13: timer stats: READ_IASI                15.1525
    12: timer stats: READ_IASI                15.1524
    10: timer stats: READ_IASI                15.1559
    11: timer stats: READ_IASI                15.1526
     1: timer stats: READ_IASI                15.1526
     2: timer stats: READ_IASI                15.1527
     3: timer stats: READ_IASI                15.1552
     4: timer stats: READ_IASI                15.1559
     5: timer stats: READ_IASI                15.1554
     6: timer stats: READ_IASI                15.1552
     7: timer stats: READ_IASI                15.1555
     8: timer stats: READ_IASI                15.1561
     9: timer stats: READ_IASI                15.1555
     0: timer stats: READ_IASI                15.1533
    23: timer stats: READ_AVHRR               10.2355
    19: timer stats: READ_AVHRR               10.2356
    21: timer stats: READ_AVHRR               10.2355
    22: timer stats: READ_AVHRR               10.2355
    17: timer stats: READ_AVHRR               10.2356
    18: timer stats: READ_AVHRR               10.2356
    20: timer stats: READ_AVHRR               10.2356
    16: timer stats: READ_AVHRR               10.2356
    31: timer stats: READ_AVHRR               11.8628
    26: timer stats: READ_AVHRR               11.8629
    27: timer stats: READ_AVHRR               11.8629
    28: timer stats: READ_AVHRR               11.8629
    29: timer stats: READ_AVHRR               11.8629
    30: timer stats: READ_AVHRR               11.8628
    25: timer stats: READ_AVHRR               11.8630
    24: timer stats: READ_AVHRR               11.8553
    47: timer stats: READ_CRIS                23.1123
    34: timer stats: READ_CRIS                23.1128
    44: timer stats: READ_CRIS                23.1125
    35: timer stats: READ_CRIS                23.1128
    45: timer stats: READ_CRIS                23.1125
    33: timer stats: READ_CRIS                23.1130
    46: timer stats: READ_CRIS                23.1124
    41: timer stats: READ_CRIS                23.1127
    42: timer stats: READ_CRIS                23.1126
    43: timer stats: READ_CRIS                23.1125
    36: timer stats: READ_CRIS                23.1127
    37: timer stats: READ_CRIS                23.1127
    38: timer stats: READ_CRIS                23.1128
    39: timer stats: READ_CRIS                23.1127
    40: timer stats: READ_CRIS                23.1127
    32: timer stats: READ_CRIS                23.1135
    79: timer stats: READ_VIIRS               30.1726
    76: timer stats: READ_VIIRS               30.1728
    65: timer stats: READ_VIIRS               30.1726
    77: timer stats: READ_VIIRS               30.1727
    71: timer stats: READ_VIIRS               30.1727
    78: timer stats: READ_VIIRS               30.1727
    75: timer stats: READ_VIIRS               30.1727
    66: timer stats: READ_VIIRS               30.1727
    72: timer stats: READ_VIIRS               30.1729
    67: timer stats: READ_VIIRS               30.1726
    74: timer stats: READ_VIIRS               30.1729
    68: timer stats: READ_VIIRS               30.1726
    73: timer stats: READ_VIIRS               30.1730
    69: timer stats: READ_VIIRS               30.1727
    70: timer stats: READ_VIIRS               30.1727
    64: timer stats: READ_VIIRS               30.1731
    92: timer stats: READ_VIIRS               30.5467
    89: timer stats: READ_VIIRS               30.5468
    90: timer stats: READ_VIIRS               30.5468
    91: timer stats: READ_VIIRS               30.5468
    93: timer stats: READ_VIIRS               30.5468
    94: timer stats: READ_VIIRS               30.5467
    95: timer stats: READ_VIIRS               30.5469
    84: timer stats: READ_VIIRS               30.5470
    85: timer stats: READ_VIIRS               30.5469
    86: timer stats: READ_VIIRS               30.5469
    87: timer stats: READ_VIIRS               30.5469
    88: timer stats: READ_VIIRS               30.5468
    81: timer stats: READ_VIIRS               30.5471
    82: timer stats: READ_VIIRS               30.5470
    83: timer stats: READ_VIIRS               30.5470
    80: timer stats: READ_VIIRS               30.5475
    93: timer stats: READ_OZONE              0.115166
    94: timer stats: READ_OZONE              0.517453
    85: timer stats: READ_OZONE              0.587360
    89: timer stats: READ_OZONE              0.649517
    81: timer stats: READ_BUFRTOVS           0.829050
    75: timer stats: READ_PREPBUFR            1.47294
    79: timer stats: READ_NSSTBUFR            1.90115
    65: timer stats: READ_BUFRTOVS            2.64807
    64: timer stats: READ_BUFRTOVS            2.64672
    69: timer stats: READ_BUFRTOVS            2.66557
    68: timer stats: READ_BUFRTOVS            2.66498
    67: timer stats: READ_BUFRTOVS            2.81424
    66: timer stats: READ_BUFRTOVS            2.79706
    82: timer stats: READ_BUFRTOVS            2.49398
    86: timer stats: READ_BUFRTOVS            2.60691
    44: timer stats: READ_AVHRR               10.2207
    45: timer stats: READ_AVHRR               10.2207
    46: timer stats: READ_AVHRR               10.2207
    47: timer stats: READ_AVHRR               10.2208
    41: timer stats: READ_AVHRR               10.2206
    42: timer stats: READ_AVHRR               10.2206
    43: timer stats: READ_AVHRR               10.2206
    40: timer stats: READ_AVHRR               10.2208
    36: timer stats: READ_AVHRR               11.0368
    35: timer stats: READ_AVHRR               11.0369
    38: timer stats: READ_AVHRR               11.0368
    34: timer stats: READ_AVHRR               11.0372
    39: timer stats: READ_AVHRR               11.0368
    37: timer stats: READ_AVHRR               11.0369
    33: timer stats: READ_AVHRR               11.0373
    32: timer stats: READ_AVHRR               11.0363
    80: timer stats: READ_GPS                 3.62201
    88: timer stats: READ_BUFRTOVS            3.62867
    74: timer stats: READ_PREPBUFR            4.07518
    72: timer stats: READ_PREPBUFR            4.60147
    77: timer stats: READ_PREPBUFR            5.14285
    15: timer stats: READ_CRIS                21.6790
    13: timer stats: READ_CRIS                21.6790
     6: timer stats: READ_CRIS                21.6788
    14: timer stats: READ_CRIS                21.6790
     7: timer stats: READ_CRIS                21.6788
    12: timer stats: READ_CRIS                21.6789
     8: timer stats: READ_CRIS                21.6789
     9: timer stats: READ_CRIS                21.6788
    10: timer stats: READ_CRIS                21.6789
    11: timer stats: READ_CRIS                21.6789
     1: timer stats: READ_CRIS                21.6788
     2: timer stats: READ_CRIS                21.6788
     3: timer stats: READ_CRIS                21.6788
     4: timer stats: READ_CRIS                21.6788
     5: timer stats: READ_CRIS                21.6788
     0: timer stats: READ_CRIS                21.6780
     1: timer stats: READ_OZONE              0.129847
     7: timer stats: READ_OZONE              0.651767
    73: timer stats: READ_PREPBUFR            8.52165
    71: timer stats: READ_PREPBUFR            8.64337
    70: timer stats: READ_PREPBUFR            8.68001
     3: timer stats: READ_BUFRTOVS            2.54108
    60: timer stats: READ_ABI                 26.8622
    59: timer stats: READ_ABI                 26.8622
    62: timer stats: READ_ABI                 26.8623
    57: timer stats: READ_ABI                 26.8621
    61: timer stats: READ_ABI                 26.8623
    58: timer stats: READ_ABI                 26.8622
    63: timer stats: READ_ABI                 26.8623
    56: timer stats: READ_ABI                 26.8621
    49: timer stats: READ_ABI                 27.4099
    50: timer stats: READ_ABI                 27.4099
    51: timer stats: READ_ABI                 27.4099
    52: timer stats: READ_ABI                 27.4099
    53: timer stats: READ_ABI                 27.4099
    54: timer stats: READ_ABI                 27.4100
    55: timer stats: READ_ABI                 27.4100
    48: timer stats: READ_ABI                 27.4071
     4: timer stats: READ_AHI                 6.75684
    76: timer stats: READ_PREPBUFR            14.2980
     0: timer stats: READ_OZONE               10.9982
     2: timer stats: READ_OZONE               11.0451
     6: timer stats: READ_OZONE               11.4480
    91: timer stats: READ_ATMS                33.2561
    90: timer stats: READ_ATMS                33.5066
    84: timer stats: READ_SSMIS               36.9816
     5: timer stats: READ_ATMS                33.2350
    83: timer stats: READ_SSMIS               44.8104
    92: timer stats: READ_GMI                 76.9536
    78: timer stats: READ_SATWND              116.393

Comparison of the wall times for the same observation type show that Orion times are 3 to 4 times greater than Hercules. This increase is unexpected.

@RussTreadon-NOAA
Copy link
Contributor Author

As a test compile same GSI code on Hera and Dogwood (WCOSS2) and run gsi.x using global_4denvar ctest. Processing of the satellite wind bufr file, read_satwnd, took the most time out of all the bufr readers. Comparison of timings across the machines show timings for Hercules, Hera, and Dogwood are comparable. Orion is the outlier

machine total read_obs time (s) read_satwnd time (s) total gsi.x wall time (s)
dogwood 169.988 120.264 403.800888
hera 194.612 137.553 466.024682
hercules 164.571 116.393 444.405860
orion 608.372 435.489 964.591370

The same comment has been added to spack-stack issue #1166

@RussTreadon-NOAA
Copy link
Contributor Author

debufr test

Use bufr utility debufr to decode 5 gdas bufr dump files from the global_4denvar ctests. debufr was run on WCOSS2 (Dogwood), Hera, Hercules, and Orion. debufr timings are 3x to 4x greater on Orion than other machines. Open NCEPLIBS-bufr issue #608 to document this behavior.

@DavidHuber-NOAA
Copy link
Collaborator

@AlexanderRichert-NOAA suggested compiling the GSI and ENKF on Hercules then running the ctests on Orion with those executables. After compiling, the gsi_common.lua was modified to point to the Hercules BUFR installation, then ctests were run. Runtimes were largely the same as reported by @RussTreadon-NOAA:

runtime.global_4denvar_hiproc_contrl.txt: The total amount of wall time = 1394.165947
runtime.global_4denvar_hiproc_updat.txt: The total amount of wall time = 751.449358
runtime.global_4denvar_loproc_contrl.txt: The total amount of wall time = 1059.320163
runtime.global_4denvar_loproc_updat.txt: The total amount of wall time = 986.416513
runtime.global_enkf_hiproc_contrl.txt: The total amount of wall time = 159.291133
runtime.global_enkf_hiproc_updat.txt: The total amount of wall time = 160.180497
runtime.global_enkf_loproc_contrl.txt: The total amount of wall time = 204.069604
runtime.global_enkf_loproc_updat.txt: The total amount of wall time = 186.185513
runtime.hafs_3denvar_hybens_hiproc_contrl.txt: The total amount of wall time = 635.210342
runtime.hafs_3denvar_hybens_hiproc_updat.txt: The total amount of wall time = 625.259733
runtime.hafs_3denvar_hybens_loproc_contrl.txt: The total amount of wall time = 719.252038
runtime.hafs_3denvar_hybens_loproc_updat.txt: The total amount of wall time = 664.469968
runtime.hafs_4denvar_glbens_hiproc_contrl.txt: The total amount of wall time = 686.031241
runtime.hafs_4denvar_glbens_hiproc_updat.txt: The total amount of wall time = 677.697564
runtime.hafs_4denvar_glbens_loproc_contrl.txt: The total amount of wall time = 804.612061
runtime.hafs_4denvar_glbens_loproc_updat.txt: The total amount of wall time = 738.414877
runtime.rtma_hiproc_contrl.txt: The total amount of wall time = 354.441046
runtime.rtma_hiproc_updat.txt: The total amount of wall time = 351.002411
runtime.rtma_loproc_contrl.txt: The total amount of wall time = 362.241694
runtime.rtma_loproc_updat.txt: The total amount of wall time = 366.629561

@RussTreadon-NOAA RussTreadon-NOAA self-assigned this Jul 12, 2024
@RussTreadon-NOAA
Copy link
Contributor Author

RussTreadon-NOAA commented Jul 25, 2024

Install GSI develop at 3e27bb8 on Orion following the 7/24/2024 maintenance. Run ctests with following results

Test project /work/noaa/da/rtreadon/git/gsi/develop/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_rdasens
    Start 4: hafs_4denvar_glbens
    Start 5: hafs_3denvar_hybens
    Start 6: global_enkf
1/6 Test #6: global_enkf ......................   Passed  727.82 sec
2/6 Test #2: rtma .............................   Passed  1687.17 sec
3/6 Test #5: hafs_3denvar_hybens ..............   Passed  2720.38 sec
4/6 Test #4: hafs_4denvar_glbens ..............   Passed  2960.38 sec
5/6 Test #1: global_4denvar ...................   Passed  3601.74 sec

No timing is provided for rrfs_3denvar_rdasens because the rrfs_3denvar_rdasens_loproc_updat job hung.

For this ctests which did complete there is no improvement in gsi.x run times on Orion. Below are the gsi.x wall times for each job in the ctest suite

tmpreg_global_4denvar/global_4denvar_hiproc_contrl/stdout:The total amount of wall time                        = 733.965163
tmpreg_global_4denvar/global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 726.510175
tmpreg_global_4denvar/global_4denvar_loproc_contrl/stdout:The total amount of wall time                        = 953.773947
tmpreg_global_4denvar/global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 952.829178
tmpreg_global_enkf/global_enkf_hiproc_contrl/stdout:The total amount of wall time                        = 136.242908
tmpreg_global_enkf/global_enkf_hiproc_updat/stdout:The total amount of wall time                        = 137.333738
tmpreg_global_enkf/global_enkf_loproc_contrl/stdout:The total amount of wall time                        = 146.321676
tmpreg_global_enkf/global_enkf_loproc_updat/stdout:The total amount of wall time                        = 146.708281
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_hiproc_contrl/stdout:The total amount of wall time                        = 616.280378
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_hiproc_updat/stdout:The total amount of wall time                        = 616.914167
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_loproc_contrl/stdout:The total amount of wall time                        = 648.031774
tmpreg_hafs_3denvar_hybens/hafs_3denvar_hybens_loproc_updat/stdout:The total amount of wall time                        = 658.915019
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_hiproc_contrl/stdout:The total amount of wall time                        = 662.856655
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_hiproc_updat/stdout:The total amount of wall time                        = 663.235348
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_loproc_contrl/stdout:The total amount of wall time                        = 692.796399
tmpreg_hafs_4denvar_glbens/hafs_4denvar_glbens_loproc_updat/stdout:The total amount of wall time                        = 717.509622
tmpreg_rtma/rtma_hiproc_contrl/stdout:The total amount of wall time                        = 354.508135
tmpreg_rtma/rtma_hiproc_updat/stdout:The total amount of wall time                        = 359.644194
tmpreg_rtma/rtma_loproc_contrl/stdout:The total amount of wall time                        = 361.248421
tmpreg_rtma/rtma_loproc_updat/stdout:The total amount of wall time                        = 364.354276

@DavidHuber-NOAA DavidHuber-NOAA self-assigned this Aug 6, 2024
@RussTreadon-NOAA
Copy link
Contributor Author

Orion and Hercules ctests

Install and build GSI develop at 8a1c0e7 on Orion and Hercules following 8/28/2024 maintenance. Run ctests in /work2/noaa//stmp/rtreadon with following results

Orion

Test project /work/noaa/da/rtreadon/git/gsi/develop/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_rdasens
    Start 4: hafs_4denvar_glbens
    Start 5: hafs_3denvar_hybens
    Start 6: global_enkf
1/6 Test #6: global_enkf ......................   Passed  885.57 sec
2/6 Test #2: rtma .............................   Passed  1688.60 sec
3/6 Test #5: hafs_3denvar_hybens ..............   Passed  2781.45 sec
4/6 Test #4: hafs_4denvar_glbens ..............   Passed  3018.51 sec
5/6 Test #1: global_4denvar ...................   Passed  3782.75 sec

The rrfs_3denvar_rdasens test did not complete within the specified 3 hours, 15 minute wall clock time. The rrfs_3denvar_radsens ctest hangs. Tagging @TingLei-NOAA , @ShunLiu-NOAA , and @hu5970 for awareness.

Hercules

Test project /work/noaa/da/rtreadon/git/gsi/develop_hercules/build
    Start 3: rrfs_3denvar_rdasens
    Start 1: global_4denvar
    Start 4: hafs_4denvar_glbens
    Start 5: hafs_3denvar_hybens
    Start 2: rtma
    Start 6: global_enkf
1/6 Test #6: global_enkf ......................   Passed  773.04 sec
2/6 Test #3: rrfs_3denvar_rdasens .............***Failed  858.14 sec
3/6 Test #2: rtma .............................   Passed  1029.06 sec
4/6 Test #5: hafs_3denvar_hybens ..............   Passed  1395.64 sec
5/6 Test #4: hafs_4denvar_glbens ..............   Passed  1766.50 sec
6/6 Test #1: global_4denvar ...................   Passed  2283.92 sec

83% tests passed, 1 tests failed out of 6

Total Test time (real) = 2284.14 sec

The following tests FAILED:
          3 - rrfs_3denvar_rdasens (Failed)

The rrfs_3denvar_rdasens` test failed due to

The memory for rrfs_3denvar_rdasens_loproc_updat is 1126876 KBs.  This has exceeded maximum allowable memory of 1095648 KBs,
resulting in Failure memthresh of the regression test.

This is not a fatal fail.

The Orion gsi.x wall times for the rtma, hafs and global_4denvar ctests remain considerably higher than Hercules. For example, below are the Orion and Hercules gsi.x wall times from the global_4denvar ctest
Orion

develop/tmpreg_global_4denvar/global_4denvar_hiproc_contrl/stdout:The total amount of wall time                        = 759.226662
develop/tmpreg_global_4denvar/global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 757.392611
develop/tmpreg_global_4denvar/global_4denvar_loproc_contrl/stdout:The total amount of wall time                        = 991.373930
develop/tmpreg_global_4denvar/global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 988.247462

Hercules

develop_hercules/tmpreg_global_4denvar/global_4denvar_hiproc_contrl/stdout:The total amount of wall time                        = 343.450568
develop_hercules/tmpreg_global_4denvar/global_4denvar_hiproc_updat/stdout:The total amount of wall time                        = 341.875771
develop_hercules/tmpreg_global_4denvar/global_4denvar_loproc_contrl/stdout:The total amount of wall time                        = 456.783184
develop_hercules/tmpreg_global_4denvar/global_4denvar_loproc_updat/stdout:The total amount of wall time                        = 478.011797

The Orion wall times about 2x greater (slower) than Hercules.

The run time performance of gsi.x remains unchanged following the 8/28/2024 MSU maintenance.

@TingLei-NOAA
Copy link
Contributor

TingLei-NOAA commented Aug 29, 2024

@RussTreadon-NOAA Thank you!
That is because the issue #766, which is under investigation by experts of hera/orion helpdesk
I had a off-line discussion with Ming and we think a fix/work-around is to let the mpi layout for rdas ctest on orion is similar to Ming's setup on hera/herculs , namely, in regression_param.sh.

 elif [[ "$machine" = "Orion" ]]; then
           topts[1]="0:15:00" ; popts[1]="5/4/" ; ropts[1]="/1"
           topts[2]="0:15:00" ; popts[2]="10/4/" ; ropts[2]="/2"

could be changed to (for popts)

 elif [[ "$machine" = "Orion" ]]; then
           topts[1]="0:15:00" ; popts[1]="40/3/" ; ropts[1]="/1"
           topts[2]="0:15:00" ; popts[2]="40/5/" ; ropts[2]="/2"

Is that ok for you to make this change or I should open another PR for this?

@RussTreadon-NOAA
Copy link
Contributor Author

@TingLei-NOAA , please follow the procedure outlined under GSI wiki entry How-to-Make-Changes to submit changes to GSI develop.

I modified the Orion job configuration for rrfs_3denvar_rdasens as you indicate. Note that

          topts[2]="0:15:00" ; popts[2]="40/5/" ; ropts[2]="/2"

will not work on Orion. The above oversubscribes the node since we are requesting 40 tasks per node with 2 threads per task. Orion nodes only have 40 slots. Attempts to run with the above settings result in a gsi.x abort with the following message written to stdout

srun: error: Unable to create step for job 18542292: More processors requested than permitted

Change the above job configuration to

           topts[2]="3:15:00" ; popts[2]="20/5/" ; ropts[2]="/2"

With this change both the loproc and hiproc jobs run to completion on Orion.

Test project /work/noaa/da/rtreadon/git/gsi/develop/build
    Start 3: rrfs_3denvar_rdasens
1/1 Test #3: rrfs_3denvar_rdasens .............   Passed  976.47 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 976.50 sec

The Hercules test ran much faster

Test project /work/noaa/da/rtreadon/git/gsi/develop_hercules/build
    Start 3: rrfs_3denvar_rdasens
1/1 Test #3: rrfs_3denvar_rdasens .............   Passed  495.98 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 496.00 sec

@TingLei-NOAA
Copy link
Contributor

@RussTreadon-NOAA Sorry for my oversight in my previous suggestions.
I will open an PR and add the changes you made as soon as possible.

@RussTreadon-NOAA
Copy link
Contributor Author

@DavidHuber-NOAA , do we have any updates on this issue?

@DavidHuber-NOAA
Copy link
Collaborator

About a month ago, I spoke with Raghu about this issue and he was going to see what he could do. I will message him about it today and see if he has any updates.

@RussTreadon-NOAA
Copy link
Contributor Author

Thank you @DavidHuber-NOAA. It would be nice to eventually get to the bottom of this.

@RussTreadon-NOAA
Copy link
Contributor Author

MSU admins confirm slower executable run time, especially for apps processing bufr data. No reason for the slowdown identified. MSU ticket closed. Therefore, closing this issue, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants