CRTM-fix files do not match between Orion and Hercules #1165

Closed
DavidHuber-NOAA opened this issue Jun 26, 2024 · 11 comments
@DavidHuber-NOAA
Collaborator

DavidHuber-NOAA commented Jun 26, 2024

Describe the bug
A total of 445 fix files differ between Orion and Hercules under spack-stack v1.6.0. I have not looked at other machines to find out which is correct.

To Reproduce

> for file in /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/intel/2021.9.0/crtm-fix-2.4.0.1_emc-qls55kd/fix/*; do
>   f=$(basename "$file")
>   cmp "$file" /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/unified-env/install/intel/2021.9.0/crtm-fix-2.4.0.1_emc-2os2hw2/fix/"$f"
> done
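
The loop above prints one line per differing file. A minimal sketch of a variant that tallies the result instead is shown here. The function name compare_fix_dirs is hypothetical and is not part of spack-stack or CRTM; it assumes a flat directory of regular files, as in the fix directories above.

```shell
# compare_fix_dirs DIR_A DIR_B
# Hypothetical helper: for every regular file in DIR_A, check whether the
# same-named file in DIR_B is missing or has different contents.
compare_fix_dirs() {
  local dir_a=$1 dir_b=$2
  local differ=0 missing=0 file name
  for file in "$dir_a"/*; do
    [ -f "$file" ] || continue            # skip subdirectories and unmatched globs
    name=$(basename "$file")
    if [ ! -e "$dir_b/$name" ]; then
      missing=$((missing + 1))            # present in DIR_A only
    elif ! cmp -s "$file" "$dir_b/$name"; then
      differ=$((differ + 1))              # present in both, bytes differ
    fi
  done
  echo "differ=$differ missing=$missing"
}
```

Called with the Orion fix directory as the first argument and the Hercules fix directory as the second, it would report the counts in one line. Files that exist only in the second directory are not counted by this sketch.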

Expected behavior
The CRTM fix files would match on all systems, with those hosted on WCOSS2 under /apps/ops/prod/libs/intel/19.1.3.304/crtm/2.4.0.1/fix being the standard.

System:
Orion and Hercules, possibly others

Additional context
Found while testing the GSI NOAA-EMC/GSI#754.

DavidHuber-NOAA added the bug label Jun 26, 2024
@RussTreadon-NOAA

Perform the following test

  • load machine specific GSI modulefiles on Dogwood, Hera, Hercules, and Orion. The following crtm modules were loaded
    • Dogwood: crtm/2.4.0.1
    • Hera: crtm-fix/2.4.0.1_emc
    • Hercules: crtm-fix/2.4.0.1_emc
    • Orion: crtm-fix/2.4.0.1_emc
  • The above modules define the following CRTM_FIX
    • Dogwood: /apps/ops/prod/libs/intel/19.1.3.304/crtm/2.4.0.1/fix
    • Hera: /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/unified-env-rocky8/install/intel/2021.5.0/crtm-fix-2.4.0.1_emc-bm46d3q/fix
    • Hercules: /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/unified-env/install/intel/2021.9.0/crtm-fix-2.4.0.1_emc-2os2hw2/fix
    • Orion: /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/intel/2021.9.0/crtm-fix-2.4.0.1_emc-qls55kd/fix
  • rsync the CRTM_FIX defined by each module to machine specific directories in Orion /work2/noaa/stmp/rtreadon/crtm_fix
  • compare Hera, Hercules, and Orion crtm fix files with respect to WCOSS2 crtm fix files
    • WCOSS2 (Dogwood) crtm fix contains 1550 files
    • Hera crtm fix contains 1706 files
    • Hercules crtm fix contains 1706 files
    • Orion crtm fix contains 1705 files
  • execute diff -r wcoss2/fix $machine/*/fix for each machine
    • WCOSS2 -vs- Hera shows 158 diffs. These diffs are 157 files only in Hera fix and 1 file only in WCOSS2 fix. The one file in WCOSS2 not in Hera is amsua_metop-c.SpcCoeff.noACC.bin.
    • WCOSS2 -vs- Hercules shows 601 diffs. Of these, 156 are files only in one fix or the other. 445 files differ. For example
orion-login-2:/work2/noaa/stmp/rtreadon/crtm_fix$ ls -l wcoss2/fix/abi_gr.TauCoeff.bin hercules/crtm-fix-2.4.0.1_emc-2os2hw2/fix/abi_gr.TauCoeff.bin
-rw-r--r-- 1 rtreadon stmp  10972 Dec 20  2023 hercules/crtm-fix-2.4.0.1_emc-2os2hw2/fix/abi_gr.TauCoeff.bin
-rw-r--r-- 1 rtreadon stmp 184588 Nov 13  2023 wcoss2/fix/abi_gr.TauCoeff.bin
    • WCOSS2 -vs- Orion shows 157 diffs. These diffs are all for files only in one set of fix files or the other. 1 file, amsua_metop-c.SpcCoeff.noACC.bin, is only in WCOSS2 fix. 156 files are only in Orion fix.
    • Hera -vs- Orion shows all common files to be identical. The Hera fix contains one file, amsua_metop-c.SpcCoeff.bin-old, not found on Orion.
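
The split of each diff count into "files only in one fix" and "files that differ" can be sketched as follows. This is an illustration rather than the commands actually run for the numbers above: summarize_diff is a hypothetical helper name, and the sketch assumes the C-locale message format that diff -rq prints ("Only in DIR: FILE" and "Files A and B differ").

```shell
# summarize_diff REF_DIR OTHER_DIR
# Hypothetical helper: count how many entries exist on one side only and how
# many common files have different contents.
summarize_diff() {
  local ref=$1 other=$2
  # diff exits with status 1 when it finds differences; that is expected here,
  # so the status is discarded and only the report lines are counted.
  { LC_ALL=C diff -rq "$ref" "$other" || true; } | awk '
    /^Only in /         { only++ }
    /^Files .* differ$/ { differ++ }
    END { printf "only_in_one=%d differ=%d\n", only, differ }'
}
```

For example, summarize_diff wcoss2/fix hercules/crtm-fix-2.4.0.1_emc-2os2hw2/fix, run from the directory holding the rsynced copies, would print the two counts on one line.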

Summary

  1. The common files in WCOSS2, Hera, and Orion are identical. Hera and Orion have 156 coefficient files not found on WCOSS2. WCOSS2 has one coefficient file not found on Hera or Orion.
  2. The Hera and Orion CRTM_FIX are identical with the exception of an extra file in the Hera fix.
  3. Hercules CRTM_FIX contains 445 files which differ from WCOSS2, Hera, and Orion. Hercules CRTM_FIX is the outlier.

@AlexanderRichert-NOAA
Collaborator

@Hang-Lei-NOAA can you provide any insight into the WCOSS2 installation process for crtm 2.4.0.1?

@Hang-Lei-NOAA
Collaborator

Hang-Lei-NOAA commented Jun 26, 2024

Thanks, Russ, for the comparison of these fix files.
First of all, managing the fix files is a very difficult task for the overall code manager, Ben Johnson, since many agencies use crtm and their own corresponding fix files. We have to know and use the appropriate fix files for EMC.

The previous crtm/2.4.0 fix files were prepared by Russ, then added to hpc-stack by me, and installed by Kyle and me on all NOAA machines. We used the same code, so it was trouble free.

But for crtm/2.4.0.1, we had several changes in the fix files. The final changes were led by Andrew Collard. Following Andrew's testing, Ben made several releases for EMC. It was finally settled on the WCOSS2 versions.

So, the multiple releases and changes could be the problem. EPIC installers used spack-stack (which changes rapidly during development). If they used a different version, or the installer did not update their spack-stack, differences will occur. The other cause I can think of for the fix files is an installer adding a new version in the same location as an existing version, in which case some files may be different or extra. What I did on WCOSS2 was to completely remove the old installations and use only Andrew's final tarball of fix files.

Andrew and I emailed EPIC last year to push EPIC to install the new version.
Besides many emails to Jong and Natalie, the info is also included in the ticket
#901 (comment)

Most important for us is to make sure that the fix files EMC requires are there. Then we can consider unifying the installation steps.

@climbfuji
Collaborator

I think the most important thing (not only for us, but for most users) is that the spack recipe for crtm-fix delivers the correct set of files.

@climbfuji
Collaborator

What's the status for this issue? Has it been fixed?

@RussTreadon-NOAA

Thank you @climbfuji for asking about the status of this issue. The problem remains.

I ran the GSI global_4denvar ctest on Hercules and Orion. The initial radiance penalties still differ. Differences are due to different CRTM coefficients. The global_4denvar ctest uses CRTM coefficients from crtm-fix/2.4.0.1_emc.

On Hercules, crtm-fix/2.4.0.1_emc has

setenv("CRTM_FIX","/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/unified-env/install/intel/2021.9.0/crtm-fix-2.4.0.1_emc-2os2hw2/fix")

On Orion, crtm-fix/2.4.0.1_emc has

setenv("CRTM_FIX","/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/intel/2021.9.0/crtm-fix-2.4.0.1_emc-qls55kd/fix")

A diff -r of these two CRTM_FIX shows that 445 binary files differ. This is not correct. The CRTM coefficients that module crtm-fix/2.4.0.1_emc points at should be identical.
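
For machines that do not share a filesystem, one possible way to check that CRTM_FIX is identical is to compare checksum manifests rather than the files themselves. The following is a minimal sketch under stated assumptions: make_manifest is a hypothetical helper name, and it assumes the GNU coreutils sha256sum command is available on each machine.

```shell
# make_manifest FIX_DIR
# Hypothetical helper: print "<sha256>  ./<relative path>" for every regular
# file under FIX_DIR, sorted by path so that manifests from different machines
# can be compared line by line.
make_manifest() {
  local fix_dir=$1
  (
    cd "$fix_dir" || exit 1
    # LC_ALL=C keeps the sort order identical regardless of the user's locale.
    find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2
  )
}
```

A manifest written on each machine, for example with make_manifest "$CRTM_FIX" > crtm_fix.sha256, could then be copied to one place and compared with diff. Identical manifests would indicate bitwise identical fix files.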

@climbfuji
Collaborator

Orion has the correct set of files, Hercules does not.

@climbfuji
Collaborator

  1. Copy files from Orion to Hercules
  2. Make sure that the files from a spack-stack develop install are correct (check spack source mirrors?)

Note WCOSS2 fix files are different from all the other platforms (except Hercules) - why? Which set is correct? WCOSS2 or Orion? @DavidHuber-NOAA says what is on WCOSS2 should be considered the authoritative set.

#1165 (comment)

@climbfuji
Collaborator

@DavidHuber-NOAA @RussTreadon-NOAA @ulmononian @AlexanderRichert-NOAA I ran this command as discussed yesterday:

rsync -av /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/intel/2021.9.0/crtm-fix-2.4.0.1_emc-qls55kd/fix/ /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/unified-env/install/intel/2021.9.0/crtm-fix-2.4.0.1_emc-2os2hw2/fix/

Note that this is a one-off, manual bug fix on Hercules. I am not making any attempt to fix this elsewhere. Someone from EMC needs to make sure that the CRTM fix files installed with spack-stack are the correct files for ALL of the spack-stack releases that we are currently maintaining. Ideally that includes addressing the difference with WCOSS2, which I understand is considered to be the authoritative version.

Feel free to keep this open until that's been done, or close it and create another issue to fix it in the submitted code if necessary.

@RussTreadon-NOAA

Thank you @climbfuji for the manual fix of the Hercules CRTM coefficients.

I ran GSI ctest global_4denvar on Hercules. I can confirm that the CRTM coefficients used in the Hercules global_4denvar test are bitwise identical with those used in the Orion global_4denvar test. The initial total radiance penalty from the Hercules run is identical to the initial total radiance penalty from the Orion run. In fact, the Hercules and Orion analysis files are bitwise identical.

@DavidHuber-NOAA
Collaborator Author

This has been resolved. Thanks @climbfuji. Closing.
