
Building GEOS with latest modifications to GOCART for the Carbon group

Sourish Basu edited this page Oct 22, 2024 · 26 revisions

Checking out and building the model

Setting up GitHub credentials

Checking out GEOS

Much of this is already documented here; only the essential steps are replicated below for convenience.

Setting up the environment

First, you need to have the correct modules to check out the code. If you are an expert user, make sure git and mepo are in your path. If you're not, execute the following to get them (as well as anything else you might need).

module purge
module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES15
module load GEOSenv
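
A quick way to confirm that the tools are actually on your PATH is sketched below as a small bash function. The name check_checkout_tools is hypothetical; it is not part of GEOSenv.

```bash
# Report whether each named tool is on PATH.
# Returns non-zero if any of them is missing.
check_checkout_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, after loading GEOSenv:
# check_checkout_tools git mepo
```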

Getting the code

Then, decide where you want to set up the model code. While you can check it out in your home directory, we recommend checking it out in scratch space because it can get pretty big, especially when you are testing several different model versions. On NCCS, you typically want to use /discover/nobackup/${USER}. Somewhere in that folder, check out the model with

git clone -b v11.5.2 git@github.com:GEOS-ESM/GEOSgcm.git GEOSgcm-v11.5.2

Note that 11.5.2 simply happens to be the latest released tag at the time this is being written. There is nothing sacred about that tag. If you want a later tag, you can find all release versions here.

The model consists of code in several sub-repositories, none of which are checked out by default. The file components.yaml in the source tree you just checked out lists the tag or branch to check out for each repository. Save a copy of this file somewhere, say as components.yaml.orig. Then add the following block to check out RRG:

RRG:
  local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSphysics_GridComp/@GEOSchem_GridComp/@RRG
  remote: ../RRG.git
  branch: main
  develop: develop

In addition, both GEOSchem_GridComp and GOCART need to be switched to branches that contain the latest GOCART code. In components.yaml, replace the tag line under GOCART with branch: feature/sbasu1/gocart+11.5.2, and likewise replace the tag line under GEOSchem_GridComp with branch: feature/sbasu1/gocart+11.5.2. Note that these are different repositories, even though the branch names are identical.
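
If you edit components.yaml often, the two changes above can be scripted. The following is a minimal bash sketch, assuming the usual mepo layout in which each repository is an unindented key followed by indented local:, remote: and tag:/branch: lines. The function names add_rrg_block and retarget_components are hypothetical helpers, not mepo commands. Compare the result against components.yaml.orig with diff before cloning.

```bash
# Append the RRG entry to components.yaml unless one is already present.
add_rrg_block() {
    local file=$1
    grep -q '^RRG:' "$file" && return 0
    cat >> "$file" <<'EOF'

RRG:
  local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSphysics_GridComp/@GEOSchem_GridComp/@RRG
  remote: ../RRG.git
  branch: main
  develop: develop
EOF
}

# Replace the "tag:" line of the named top-level entries with
# "branch: <branch>", keeping the original indentation.
# Usage: retarget_components FILE BRANCH NAME [NAME ...]
retarget_components() {
    local file=$1 branch=$2
    shift 2
    awk -v branch="$branch" -v names="$*" '
        BEGIN { n = split(names, arr, " "); for (i = 1; i <= n; i++) want[arr[i]] = 1 }
        # An unindented "Name:" line starts a new repository entry.
        /^[^ \t#][^:]*:[ \t]*$/ { entry = $0; sub(/:.*/, "", entry) }
        (entry in want) && /^[ \t]+tag:/ {
            match($0, /^[ \t]+/)
            print substr($0, 1, RLENGTH) "branch: " branch
            next
        }
        { print }
    ' "$file" > "$file.new" && mv "$file.new" "$file"
}

# Usage, from the top of the source tree:
# cp components.yaml components.yaml.orig
# add_rrg_block components.yaml
# retarget_components components.yaml feature/sbasu1/gocart+11.5.2 GOCART GEOSchem_GridComp
# diff components.yaml.orig components.yaml
```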

Now run mepo clone from the top of the source tree (where components.yaml lives) to check out all the repositories at the branches/tags listed in components.yaml.

Building the code

It is best to build the code on a compute node of the same architecture as the ones you will be running the model on. For this example we will be building and running on AMD Milan nodes, so get a terminal on such a compute node with

salloc --nodes=1 --constraint=mil -t 60 -A s1460 --qos debug

This gets you a terminal on a Milan node in the debug queue, which usually starts quickly but has a wall-clock limit of 1 hour. Alternatively, you could issue this command first thing in the morning with -t 480 (and without --qos debug, since the debug queue is limited to 1 hour) to get a node for 8 hours. You will need to wait longer to get a node, but once you do, you're set for a day's worth of building and debugging.

Once you get on a compute node, go to the folder where you checked out the source tree, and just to be safe create a clean environment as follows:

module purge
cd @env
source g5_modules.sh
cd ..

Now you're ready to build. Since you're already on a compute node, no need to submit a parallel build job. Instead, issue the following commands in order:

mkdir build
cd build
cmake .. -DBASEDIR=$BASEDIR/Linux -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install
make -j install

This installs the model into ../install; the GEOS GCM executable is ../install/bin/GEOSgcm.x.

Important: When you run gcm_setup to set up a new run, this executable is copied over to the run directory. As a result, if you fix something in the code and recompile, your run will not see the changes unless you copy the executable over again. Therefore, I often replace the copy in my run directory with a symlink to install/bin/GEOSgcm.x.
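
The symlink trick can be wrapped in a small bash function. This is a sketch: link_gcm_executable is a hypothetical name, and it assumes the run directory holds the copied executable under the name GEOSgcm.x.

```bash
# Replace the GEOSgcm.x copied into a run directory with a symlink to the
# freshly installed executable, so a rebuild is picked up automatically.
link_gcm_executable() {
    local install_dir=$1 run_dir=$2
    local exe="$install_dir/bin/GEOSgcm.x"
    if [ ! -f "$exe" ]; then
        echo "no executable at $exe; did 'make install' finish?" >&2
        return 1
    fi
    # Use an absolute target so the link works from any working directory.
    exe="$(cd "$(dirname "$exe")" && pwd)/GEOSgcm.x"
    ln -sfn "$exe" "$run_dir/GEOSgcm.x"
}

# Usage (paths are examples):
# link_gcm_executable /discover/nobackup/$USER/GEOSgcm-v11.5.2/install /path/to/my/run
```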

Setting up a run

Use gcm_setup to install the model somewhere

Go into install/bin and execute ./gcm_setup.

  • Experiment ID is any name you want to give the run. It's a good idea to pack the model version and something about the tracers you are running into a short name, mostly something that you will remember. If you call it (say) Apple, it's pretty much guaranteed that two years down the line you won't remember what it was for. I'm calling mine GCM-11.5.2-methane-c180.
  • Experiment Description is a short description to help you remember.
  • CLONE lets you copy the setup of someone else's existing run folder. This is very useful, but for now choose NO.
  • Atmospheric Horizontal Resolution depends on what you want to run. I'm choosing c180. It's perfectly fine to choose c90 for model development.
  • Default Vertical Resolution of 72 layers is fine
  • Default Microphysics of BACM_1M is fine
  • Default TRUE for Hydrostatic Atmosphere is fine
  • Use IOSERVER if you're running c180 or higher
  • Default processor type of mil is fine
  • Default NO to COUPLED Ocean/Sea-Ice Model is fine
  • Choose CS (cubed sphere) for Data_Ocean Horizontal Resolution
  • Default choice Icarus-NLv3 for land surface boundary conditions is fine
  • Default choice Catchment for land surface model is fine
  • Accept default choice to run GOCART with Actual aerosols
  • Choose to use OPS emission files for GOCART, because the AMIP emission files do not exist for recent years
  • For c180, a HEARTBEAT_DT of 450 is fine
  • Don't worry about the HISTORY template, you are going to change the history file anyway
  • The HOME Directory is where the run folder will be created. Just make sure it's created somewhere inside /discover/nobackup/projects/gmao/geos_carb/${USER}
  • In theory EXPERIMENT Directory can be different from HOME Directory, but no one has ever tried it. Either set it to be the same, or try at your own risk and don't expect any sympathy if you break something.
  • The Build directory should already be correct
  • Our GROUP ID is s1460

TMPDIR issue on discover

Every so often gcm_setup will fail with errors like

/tmp/tmp.sVmzWQGKy5: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.VXxsGkzxBY: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.kAg9MZYON8: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.Xced0YAEi2: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.igPE8a3dYw: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.OL9WF9FmLj: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.73LABTUPJy: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.rdwOgZdlbP: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.Qak33WKR9E: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.vDLxsZWrwJ: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
cat: /discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl.tmp: No such file or directory

For reasons that are not entirely clear, /tmp on discover sometimes denies permission, probably because it is mounted with noexec. To work around this, run

export TMPDIR=/discover/nobackup/$USER/tmp
mkdir -p $TMPDIR

before executing gcm_setup.
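
To check whether a directory has the same problem before running gcm_setup, you can try to create and execute a throwaway script in it. This sketch assumes the failure really is a noexec mount, which is only a guess as noted above; tmpdir_usable is a hypothetical name.

```bash
# Return 0 if a throwaway script can be created AND executed in directory $1.
# A directory on a noexec mount passes the first step but fails the second.
tmpdir_usable() {
    local dir=$1 probe status
    probe=$(mktemp "$dir/probe.XXXXXX" 2>/dev/null) || return 1
    printf '#!/bin/sh\nexit 0\n' > "$probe"
    chmod u+x "$probe"
    if "$probe" 2>/dev/null; then status=0; else status=1; fi
    rm -f "$probe"
    return "$status"
}

# Usage: fall back to nobackup scratch only when needed.
# if ! tmpdir_usable "${TMPDIR:-/tmp}"; then
#     export TMPDIR=/discover/nobackup/$USER/tmp
#     mkdir -p "$TMPDIR"
# fi
```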

Create restart files

This is a dark art. Remembering Robert the Bruce before embarking on this endeavor would be well advised.

GEOS restart files are called *_rst, even though they're really netCDF files. Ours not to reason why, ours but to do and die. You will see some *_import_rst and some *_internal_rst files. Ignore the first kind; you only need to supply the second kind for a new run.

There are two types of *_internal_rst restart files: upper air (3D) restarts and surface (2D) restarts. Upper air restarts are defined on the cube, contain variables with shape levels x N x 6N, and are fairly easily created by the provided scripts for creating/remapping restarts (more below). There are very few ways in which these can "go wrong".

Surface restarts can also be created by the provided remapping scripts. However, these will very likely make you weep. Instead of being on grids, surface restarts are stored as a list of tiles (my theory is that whoever made that decision was trying to save disk space and reinvented the wheel instead of relying on compression algorithms). Every single land model has a different ordering of these tiles, and figuring out what your land model is requires a fair amount of expert knowledge. Worse, the choice of land model makes pretty much zero difference in a replay run, yet your model will crash unless you get this right. In moments of frustration, remember Robert the Bruce.
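
To see which restarts you actually need to supply, you can list only the *_internal_rst files in a directory. A minimal bash sketch follows; list_internal_restarts is a hypothetical helper. To tell upper air restarts from tile-based surface restarts, inspect the dimensions in each file's header with ncdump -h from the netCDF tools.

```bash
# List the *_internal_rst files in directory $1, i.e. the restarts a new run
# needs. *_import_rst files do not match the glob and are skipped.
list_internal_restarts() {
    local dir=$1 f
    for f in "$dir"/*_internal_rst; do
        [ -e "$f" ] || continue   # the glob matched nothing
        basename "$f"
    done
}

# Usage, inside a run directory:
# for f in $(list_internal_restarts .); do
#     echo "== $f"
#     ncdump -h "$f" | sed -n '/^dimensions:/,/^variables:/p'
# done
```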

Creating restarts using GEOS-provided scripts

The script to create restarts is called install/bin/remap_restarts.py. Do not run this on a compute node because it requires access to some filesystems that are not mounted on compute nodes. On a front-end node, run it as follows:

module purge
source @env/g5_modules.sh
install/bin/remap_restarts.py

This will present you with a series of questions, answer as follows.

  • Remap archived MERRA-2 restarts? Yes
  • Enter restart date/time: Enter YYYYMMDDHH, where HH is one of 03, 09, 15 or 21 for MERRA2
  • Enter output directory for new restarts: Make sure this is a unique folder that is not your run folder; you can copy the restarts over later
  • Remap to a stretched cubed-sphere grid? No
  • Enter atmospheric grid for new restarts: Enter the same atmospheric resolution you entered for gcm_setup
  • Select ocean model for new restarts: data
  • Select data ocean grid/resolution for new restarts: CS
  • Enter number of atmospheric levels for new restarts: Choose what you chose for gcm_setup
  • Select boundary conditions (BCs) version for new restarts: This depends on what you chose for the land boundary condition in gcm_setup. If you chose Icarus-NLv3 there, choose NL3 here.
  • Land BCs for input restarts: You will be presented a folder choice, accept it
  • Select BCs base directory for new restarts: Select what you are given
  • Land BCs for output restarts: Select what you are given
  • Remap upper air restarts? Yes
  • Remap agcm_import_rst (a.k.a. IAU) file needed for REPLAY runs? No
  • Remap surface restarts? Yes
  • Remap bkg files? No
  • Write lcv file? No
  • Enter value of WEMIN. No idea what this is, just choose what you are given.
  • Enter value of zoom parameter for surface restarts [1-8]? No idea what this is, just choose what you are given.
  • Enter experiment ID for new restarts: Fine to leave this blank.
  • Add labels for BCs version and atm/ocean resolutions to restart file names? No
  • SLURM or PBS quality-of-service (qos)? debug
  • Select/enter SLURM or PBS account: s1460
  • Enter SLURM or PBS partition (if desired; can leave blank): Leave blank.
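
Typos in the restart date are easy to make and only show up once the remapping job fails. This bash sketch checks the YYYYMMDDHH string first, assuming GNU date (as on discover) for the calendar check; valid_merra2_restart_time is a hypothetical name.

```bash
# Return 0 if $1 is YYYYMMDDHH with HH in {03, 09, 15, 21} (the MERRA-2
# restart hours) and YYYYMMDD is a real calendar date.
valid_merra2_restart_time() {
    local t=$1
    local re='^[0-9]{8}(03|09|15|21)$'
    [[ $t =~ $re ]] || return 1
    # GNU date rejects impossible dates such as 20230230.
    date -u -d "${t:0:8}" +%Y%m%d >/dev/null 2>&1
}

# Usage:
# valid_merra2_restart_time 2020010121 && echo ok
```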

After you answer all the questions, the script submits a job to the queue to regrid the restarts and makes you wait while it runs, i.e., the sbatch command won't exit. Don't close the terminal or quit at this point; hopefully the debug queue will be quick enough. Once the job is done, copy the *_rst.nc4 files from the output folder (above) to your run directory and remove the .nc4 extension.
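
The copy-and-rename step can be scripted. A bash sketch, assuming the remapped files are named <something>_rst.nc4 in the output folder as described above; install_restarts is a hypothetical helper.

```bash
# Copy every *_rst.nc4 file from $1 into $2, dropping the .nc4 extension
# (e.g. fvcore_internal_rst.nc4 -> fvcore_internal_rst).
# Fails if nothing was copied.
install_restarts() {
    local src=$1 dest=$2 f name n=0
    for f in "$src"/*_rst.nc4; do
        [ -e "$f" ] || continue
        name=$(basename "$f" .nc4)
        cp -p "$f" "$dest/$name"
        n=$((n + 1))
    done
    echo "copied $n restart file(s) to $dest"
    [ "$n" -gt 0 ]
}

# Usage (paths are examples):
# install_restarts /discover/nobackup/$USER/new_restarts /path/to/my/run
```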

Set up a minimal model to run just PCHEM

Set up emission files

Set up output history collection

Set up replay

The GEOS GCM is by default "free running", which means that it has no obligation to follow the real atmosphere. It is a dynamical model driven by an initial condition, the Navier-Stokes equations, incoming solar radiation, and a few other boundary conditions. If you want it to have the winds that were actually observed, you need to replay it to a meteorological reanalysis, which follows the real atmosphere because it assimilates weather observations.

Enable replay in AGCM.rc by uncommenting one of the REPLAY_MODE keys. The most typical replay configuration you will use is "Regular" replay to the MERRA-2 reanalysis. You can replay either to 6-hourly snapshots at 3z, 9z, 15z and 21z, or to 3-hourly averages spanning 0-3z, 3-6z, etc. To replay to 3-hourly averages, which is recommended, use the following settings in AGCM.rc:

ASSIMILATION_CYCLE: 10800
REPLAY_MODE: Regular
REPLAY_ANA_EXPID: MERRA-2
REPLAY_FILE: /discover/nobackup/projects/gmao/merra2/data/products/MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4
REPLAY_FILE_FREQUENCY: 10800
REPLAY_FILE_REFERENCE_TIME: 013000
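
Before submitting a replay run, it is worth checking that the analysis file for your start date exists. This sketch expands only the %y4, %m2 and %d2 tokens used in the REPLAY_FILE above (GEOS templates can contain other tokens that it does not handle); expand_replay_template is a hypothetical name.

```bash
# Expand the %y4, %m2 and %d2 tokens of a REPLAY_FILE template for a date
# given as YYYYMMDD. Other tokens are left untouched.
expand_replay_template() {
    local template=$1 ymd=$2
    local y=${ymd:0:4} m=${ymd:4:2} d=${ymd:6:2}
    printf '%s\n' "$template" | sed -e "s/%y4/$y/g" -e "s/%m2/$m/g" -e "s/%d2/$d/g"
}

# Usage:
# f=$(expand_replay_template "/discover/nobackup/projects/gmao/merra2/data/products/MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4" 20200101)
# [ -f "$f" ] && echo "found $f" || echo "missing $f"
```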

Disclaimer about the repository copy of gcm_run.j

The repository version of gcm_run.j as of October 16, 2024 will not work with this, because it expects two keys, REPLAY_ANA_LOCATION and REPLAY_FILE. The settings above would correspond to the pair

REPLAY_ANA_LOCATION: /discover/nobackup/projects/gmao/merra2/data/products
REPLAY_FILE: MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4

When you run the model, gcm_run.j assumes that the first path component of REPLAY_FILE is a folder, and creates a symlink of that name inside scratch pointing to the folder of the same name under REPLAY_ANA_LOCATION, i.e., scratch/MERRA2_all points to /discover/nobackup/projects/gmao/merra2/data/products/MERRA2_all. So when GEOS runs, it is really reading scratch/MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4 after substituting all the date and time tokens. However, this mechanism will clearly not work with the following pair

REPLAY_ANA_LOCATION: /discover/nobackup/projects/gmao/merra2/data/products/MERRA2_all
REPLAY_FILE: Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4

which, from the perspective of a normal human user used to filesystem logic, is equivalent to the key pair that works. Worse, GEOS doesn't actually need REPLAY_ANA_LOCATION: it only reads the key REPLAY_FILE, and is perfectly capable of handling long paths. Hence, in my gcm_run.j I have removed the entire mechanism for creating the aforementioned symlink (search for the conditional block if( $REPLAY_MODE == 'Exact' | $REPLAY_MODE == 'Regular' ) then and look at the lines commented out within it), and I have removed the key REPLAY_ANA_LOCATION from AGCM.rc.
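
The reason the second pair breaks can be seen by mimicking the first-component logic described above. This is a bash sketch of that logic, not the csh code in gcm_run.j; replay_link_component is a hypothetical name, and it assumes a first component containing a %-token is skipped.

```bash
# Return the folder name that would be linked into scratch for a given
# REPLAY_FILE value: its first path component, or nothing if that component
# contains a date token such as %y4.
replay_link_component() {
    local replay_file=$1 first
    first=${replay_file%%/*}
    case $first in
        *%*) echo "" ;;       # a date token, not a folder: nothing to link
        *)   echo "$first" ;;
    esac
}

# Usage:
# replay_link_component 'MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4'   # MERRA2_all
# replay_link_component 'Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4'              # empty
```

With the second pair there is no folder name to link, so scratch never contains the Y%y4 tree that REPLAY_FILE refers to.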

Modify gcm_run.j if required

Errors that will keep coming back like bad pennies

Incorrect number of tiles in water restarts

Every so often, when you try to run a fresh model setup, you'll get an error such as

Error! Found 339967 tiles in openwater. Expect to find 359523 tiles.
Your restarts are probably for a different ocean.

This is probably because the water restarts you are using come from a run with a different choice of Land Surface Boundary Conditions. Look in your gcm_run.j, specifically setenv BCSDIR. Set this to whatever is in the run you copied the water restarts from. There is a specific combination of BCSDIR in gcm_run.j and the water restarts that will work. Unfortunately, gcm_setup is not your friend here; it will not tell you which folder to copy the restarts from given your choice of land boundary conditions.
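
To compare the boundary conditions of two runs quickly, pull the BCSDIR value out of both gcm_run.j files. A bash sketch, assuming BCSDIR is set on an uncommented line of the form setenv BCSDIR <path>; compare_bcsdir is a hypothetical name.

```bash
# Print the BCSDIR set in two gcm_run.j files and return 0 only if they match.
compare_bcsdir() {
    local a=$1 b=$2 da db
    # Skip commented-out lines; take the third field of "setenv BCSDIR <path>".
    da=$(grep -E '^[[:space:]]*setenv[[:space:]]+BCSDIR' "$a" | awk '{print $3}')
    db=$(grep -E '^[[:space:]]*setenv[[:space:]]+BCSDIR' "$b" | awk '{print $3}')
    echo "$a: $da"
    echo "$b: $db"
    [ "$da" = "$db" ]
}

# Usage (paths are examples):
# compare_bcsdir /path/to/my/run/gcm_run.j /path/to/restart/source/run/gcm_run.j
```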

Premature end of file scratch/fraci.data

This has to do with the choice of ocean during gcm_setup and when making restart files. The Reynolds ocean data end sometime in 2022, so you need to have picked a cubed sphere ocean boundary condition. Again, gcm_setup is not your friend here, because in most cases the Reynolds ocean is the default choice. So if you have clicked through the default choices, you are toast. Set up two experiments, one with Reynolds ocean and another with CS ocean, and check the differences in gcm_run.j and linkbcs. Then try to make those same modifications in your actual experiment.
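
The comparison between the two test experiments can be done with diff. A bash sketch, assuming gcm_run.j and linkbcs sit at the top of each experiment directory; diff_ocean_setup is a hypothetical name.

```bash
# Show unified diffs of gcm_run.j and linkbcs between two experiment
# directories. Returns 0 only if both files are identical.
diff_ocean_setup() {
    local exp_a=$1 exp_b=$2 f status=0
    for f in gcm_run.j linkbcs; do
        echo "=== $f ==="
        diff -u "$exp_a/$f" "$exp_b/$f" || status=1
    done
    return "$status"
}

# Usage (paths are examples):
# diff_ocean_setup /path/to/test-reynolds /path/to/test-cs | less
```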