Run experiments on Aaditya

Cheng Da edited this page Jan 22, 2021 · 21 revisions

Structure of Aaditya computing nodes

Each Aaditya computing node has 16 cores and 64 GB of RAM (60 GB usable). The /tmp/ directory on each computing node holds only 500 MB, so do not write temporary files to /tmp/ during experiments.
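
Because /tmp/ is so small, it can help to point scratch output at a larger filesystem. The following is a minimal sketch, assuming your tools honor the standard TMPDIR environment variable (many Unix utilities such as mktemp and sort do; whether every CFS-LETKF script does is not stated on this page). The function name use_scratch_tmpdir and the example path in the usage note are hypothetical.

```shell
# Hypothetical helper: redirect temporary files away from the small /tmp/.
# The directory passed in is an assumption; choose one on a large filesystem.
use_scratch_tmpdir() {
    scratch_dir="${1:?usage: use_scratch_tmpdir <directory>}"
    # Create the directory if it does not exist yet.
    mkdir -p "$scratch_dir" || return 1
    # Tools that honor TMPDIR will now write their scratch files here.
    TMPDIR="$scratch_dir"
    export TMPDIR
}
```

For example, after use_scratch_tmpdir /backup2/your_usrname/scratch, a plain mktemp call creates its file under that directory instead of /tmp/.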

Connect to Aaditya

Set VPN

Please follow the instructions in the email sent by the Aaditya staff. The username and password are in Eugenia's email "Re: VPN needed for us to access Aditya".

Set account for the first time

  1. After enabling the VPN, connect to Aaditya with the account and password provided by the Aaditya staff.
  2. Set a new password following the instructions given by Aaditya.
  3. Log in to the management node with ssh mn1, using the password provided by the Aaditya staff.
  4. Change the password there so that it matches your new password from Step 2.
  5. Wait a few minutes for the system to propagate the new password to all computing nodes.
  6. To change the password in the future, log in to mn1 and change it there; the change is applied automatically to all nodes.

Set GitHub

We can connect to GitHub through SSH. Please refer to https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/connecting-to-github-with-ssh.

Note: When generating the SSH key, don't forget to protect it with a passphrase!

Compile CFS-LETKF

Compile ESMF

  1. Download ESMF_3_1_0rp5 from https://earthsystemmodeling.org/static/releases.html.
  2. On Aaditya, load the Intel compiler 2014 with module load intel/2014. Make sure the version of mpiifort is consistent with your ifort by checking which mpiifort and which ifort.
  3. Set two environment variables:
export ESMF_DIR=/backup2/your_usrname/your_preferred_directory
export ESMF_COMM="intelmpi"
  4. make
  5. make install
  6. make installcheck
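
The build steps above can be collected into one function that stops at the first failure. This is only a sketch: the function name build_esmf is hypothetical, and it assumes that module load intel/2014 has already been run in the current shell and that the argument is the directory you would set ESMF_DIR to.

```shell
# Hypothetical wrapper around the ESMF build steps listed above.
# Assumes the Intel 2014 module is already loaded in this shell.
build_esmf() {
    esmf_dir="${1:?usage: build_esmf <ESMF_DIR>}"
    export ESMF_DIR="$esmf_dir"
    export ESMF_COMM="intelmpi"
    cd "$ESMF_DIR" || return 1
    # Each step runs only if the previous one succeeded.
    make && make install && make installcheck
}
```

Since the function changes directory, calling it in a subshell, as in ( build_esmf /backup2/your_usrname/your_preferred_directory ), leaves your current working directory untouched.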

Compile CFS-LETKF

  1. Clone CFS-LETKF from git@github.com:UMD-AOSC/CFSv2-LETKF.git:
git clone git@github.com:UMD-AOSC/CFSv2-LETKF.git
  2. Check out the branch Aaditya:
git checkout Aaditya

If the switch succeeded, you should see lsf.py and lsf_cycle under the directory /run.

  3. Set the variable ESMFDIR in the file config/makefile.Aaditya_intel2014.mk, and make sure MACHINE=Aaditya in the file letkf-mom/config/machine.sh. Then create the following links in the directory config:
ln -s makefile.Aaditya_intel2014.mk makefile.mk
ln -s setenv.Aaditya_intel2014.sh setenv.sh
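
If you want to guard against a typo in the file names, the two links can be created by a small function that first checks that each target exists. This is a sketch, not part of the repository; the function name link_machine_config is hypothetical, and it uses ln -sf so that rerunning it replaces an existing link.

```shell
# Hypothetical helper: create makefile.mk and setenv.sh links inside the
# given config directory, refusing to link to a file that does not exist.
link_machine_config() {
    config_dir="${1:?usage: link_machine_config <config directory>}"
    (
        cd "$config_dir" || exit 1
        for pair in \
            makefile.Aaditya_intel2014.mk:makefile.mk \
            setenv.Aaditya_intel2014.sh:setenv.sh
        do
            target="${pair%%:*}"   # file the link points to
            link="${pair##*:}"     # name of the link
            if [ ! -e "$target" ]; then
                echo "missing $target in $config_dir" >&2
                exit 1
            fi
            ln -sf "$target" "$link"
        done
    )
}
```

From the top of the repository this would be called as link_machine_config config.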

Run the command source config/setenv.sh and then start compiling by following the compiling instructions at https://github.com/UMD-AOSC/CFSv2-LETKF/wiki/Compiling with.

Run DA cycles

Modifications made for Aaditya

The branch Aaditya differs from the branch develop in two main ways:

  1. Aaditya uses LSF instead of SLURM.
  2. The computing nodes of Aaditya have little temporary space.

To handle these two differences, two changes were made:

  1. lsf.py and lsf_cycle were created. They are similar to slurm.py and slurm_cycle.
  2. An option was added to disable TMP_DIR_LOCAL and use only the shared directory TMP_DIR_SHARED.
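
The second change can be pictured as a simple switch. The sketch below is not the code in the Aaditya branch; select_tmp_dir is a hypothetical function, and it assumes that USE_TMP_DIR_LOCAL=1 means "use the node-local directory" while any other value, including the 0 used on Aaditya, means "use only TMP_DIR_SHARED".

```shell
# Hypothetical illustration of the TMP_DIR_LOCAL on/off option.
# Assumption: 1 enables the node-local directory, anything else disables it.
select_tmp_dir() {
    if [ "${USE_TMP_DIR_LOCAL:-0}" = "1" ]; then
        printf '%s\n' "$TMP_DIR_LOCAL"
    else
        printf '%s\n' "$TMP_DIR_SHARED"
    fi
}
```

With USE_TMP_DIR_LOCAL=0, select_tmp_dir prints the value of TMP_DIR_SHARED, so nothing is written to the small node-local space.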

Note: in lsf.py, the node-banning function is not finished yet, so do not use it.

Set vars in setenv.Aaditya_intel2014.sh

Set the variables TMP_DIR_LOCAL, TMP_DIR_SHARED, USE_TMP_DIR_LOCAL=0, FIX_DIR_AM, FIX_DIR_OM, and OBS_ATM properly.
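
A quick way to catch a forgotten variable is to check them all after sourcing the settings file. The function below is a sketch with a hypothetical name, check_cfs_env; it only verifies that each variable is non-empty and that USE_TMP_DIR_LOCAL is 0, and it does not know what values are correct for your account.

```shell
# Hypothetical sanity check for the variables named above.
# Returns 0 if all are set and USE_TMP_DIR_LOCAL is 0, otherwise 1.
check_cfs_env() {
    status=0
    for name in TMP_DIR_LOCAL TMP_DIR_SHARED USE_TMP_DIR_LOCAL \
                FIX_DIR_AM FIX_DIR_OM OBS_ATM
    do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            status=1
        fi
    done
    if [ "${USE_TMP_DIR_LOCAL:-}" != "0" ]; then
        echo "USE_TMP_DIR_LOCAL should be 0 on Aaditya" >&2
        status=1
    fi
    return "$status"
}
```

A typical use would be to run source config/setenv.sh and then check_cfs_env in the same shell before starting a cycle.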

Fixed files for CFS-LETKF

Please copy the following files to your own directory:

  1. Fixed files for CFS: /backup2/cheng/shared/CFSv2-LETKF.data
  2. Decoded PrepBUFR obs from 2006060100 to 2006080118: /backup2/cheng/shared/prepbufr
  3. Ocean profiles from:
  4. NOAA L4 SST from:
  5. Compressed sample 40-member initial condition of 2006060100 from: /backup2/cheng/shared/backup_compressed
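
The copy can be scripted for the three items whose paths are given above (items 3 and 4 list no path, so they are left out). This is a sketch: copy_shared_data is a hypothetical function name, and the destination in the usage note is only an example.

```shell
# Hypothetical helper: copy the shared CFS-LETKF data into your own directory.
# Only the three entries with a known path under the source root are copied.
copy_shared_data() {
    src_root="${1:?usage: copy_shared_data <source root> <destination>}"
    dest="${2:?usage: copy_shared_data <source root> <destination>}"
    mkdir -p "$dest" || return 1
    for name in CFSv2-LETKF.data prepbufr backup_compressed; do
        # cp -R handles both plain files and directories.
        cp -R "$src_root/$name" "$dest/" || return 1
    done
}
```

For example: copy_shared_data /backup2/cheng/shared /backup2/your_usrname/shared.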

Test case

  1. You can use the new script init_copy under /run to generate the file structure required to run CFS-LETKF. For example,
./init_copy --source /backup2/cheng/CFSv2-LETKF/DATA/backup --ares 126 2006060100 /backup2/cheng/CFSv2-LETKF/DATA/mem5 5
  2. Start a 1-day experiment: go to the directory /run, then run the command
./lsf_cycle --aobs /backup2/cheng/CFSv2-LETKF/DATA/obs/atm_prepbufr --ares 126 --clear 24 --mem 40 /backup2/cheng/CFSv2-LETKF/DATA/mem40 2006060100 2006060200
  3. After the experiment finishes, check the controller log file for errors. Then check the ATM-LETKF log file for each cycle to see whether the observations were assimilated correctly.
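
A first pass over a log file can be automated with grep. The sketch below is an assumption-laden starting point: scan_log_for_errors is a hypothetical function, the log file names and locations are not given on this page so the path must be passed in, and matching the word "error" (case-insensitive) is only a guess at how problems are reported.

```shell
# Hypothetical helper: print every line of a log that mentions "error".
# Returns 0 if no such line exists, 1 if any was found, 2 if unreadable.
scan_log_for_errors() {
    log_file="${1:?usage: scan_log_for_errors <log file>}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # grep exits 0 when at least one line matches; -n adds line numbers.
    if grep -n -i 'error' "$log_file"; then
        return 1
    fi
    return 0
}
```

A clean result here does not replace reading the ATM-LETKF logs, since a run can finish without an error message and still assimilate no observations.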