Building GEOS with latest modifications to GOCART for the Carbon group
Much of this is already documented here; only the essential steps are replicated for convenience.
First, you need to have the correct modules to check out the code. If you are an expert user, make sure you have git and mepo in your path. If you're not, execute the following to get them (as well as anything else you might need).
module purge
module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES15
module load GEOSenv
Then, decide where you want to set up the model code. While you can check it out in your home directory, we recommend checking it out on a scratch space because it can get pretty big, especially when you are testing multiple different model versions. On NCCS, you typically want to use /discover/nobackup/${USER}. Somewhere in that folder, check out the model with
git clone -b v11.5.2 git@github.com:GEOS-ESM/GEOSgcm.git GEOSgcm-v11.5.2
Note that 11.5.2 simply happens to be the latest released tag at the time this is being written. There is nothing sacred about that tag. If you want a later tag, you can find all release versions here.
The model consists of code in several sub-repositories, none of which are checked out by default. There is a file called components.yaml in the source tree you just checked out, which contains the tags for each repository that will be checked out. Save a copy of this file somewhere, say as components.yaml.orig. Then add the following block to check out RRG:
RRG:
  local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSphysics_GridComp/@GEOSchem_GridComp/@RRG
  remote: ../RRG.git
  branch: main
  develop: develop
In addition, both GEOSchem_GridComp and GOCART will need to be changed to branches that contain the latest GOCART code. In components.yaml, change the tag line following GOCART to branch: feature/sbasu1/gocart+11.5.2, and the tag line following GEOSchem_GridComp to branch: feature/sbasu1/gocart+11.5.2. Note that these are different repositories, although the branch names are identical.
Now issue mepo clone at the command line to check out all the repositories at the branches/tags in components.yaml.
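Before running mepo clone, it can be worth confirming that the edits to components.yaml landed under the right components. The helper below is a minimal sketch: component_ref is a hypothetical name (it is not part of mepo), and it assumes the usual layout of top-level component names with indented tag: or branch: lines underneath.

```shell
# Print the ref (tag or branch) recorded for one component in components.yaml.
# Hypothetical helper, not part of mepo.
# Usage: component_ref NAME [FILE]
component_ref() {
    awk -v name="$1" '
        # A top-level key starts in column one; remember whether it is ours.
        /^[^ \t#]/ { inblock = ($1 == (name ":")); next }
        # Inside our block, report the first tag or branch line and stop.
        inblock && ($1 == "tag:" || $1 == "branch:") { print $1, $2; exit }
    ' "${2:-components.yaml}"
}

# Example, run from the top of the source tree after editing:
#   component_ref GOCART
#   component_ref GEOSchem_GridComp
#   component_ref RRG
#   diff components.yaml.orig components.yaml
```

If the edits described above were applied, the first two calls should both report the feature/sbasu1/gocart+11.5.2 branch and the third should report branch: main.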
It is best to build the code on a compute node of the same architecture as the ones you will be running the model on. For this example we will be building and running on AMD Milan nodes, so get a terminal on such a compute node with
salloc --nodes=1 --constraint=mil -t 60 -A s1460 --qos debug
This gets you a terminal on a Milan node under the debug queue, which is pretty fast but has a wall clock limit of 1 hour. You could, alternatively, issue this command first thing in the morning with -t 480 and get a node for 8 hours. You will need to wait longer to get a node, but once you do, you're set for a day's worth of building and debugging.
Once you get on a compute node, go to the folder where you checked out the source tree, and just to be safe create a clean environment as follows:
module purge
cd @env
source g5_modules.sh
cd ..
Now you're ready to build. Since you're already on a compute node, no need to submit a parallel build job. Instead, issue the following commands in order:
mkdir build
cd build
cmake .. -DBASEDIR=$BASEDIR/Linux -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install
make -j install
This builds the model into ../install; specifically, the GEOS GCM executable is ../install/bin/GEOSgcm.x.
Important: When you run gcm_setup to set up a new run, this executable is copied over to the run directory. As a result, if you fix something in the code and recompile, the changes will not be seen in your run unless you copy over the executable again. Therefore, I often symlink install/bin/GEOSgcm.x from my run directory.
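The symlink trick can be sketched as a small function. The name link_gcm_executable is made up for this example, and it assumes the copied executable sits at the top level of the run directory under the name GEOSgcm.x; check your own run directory before relying on that.

```shell
# Replace the copied executable in a run directory with a symlink to the
# installed one, so that every rebuild is picked up automatically.
# Hypothetical helper. Usage: link_gcm_executable INSTALL_DIR RUN_DIR
link_gcm_executable() {
    # Resolve to an absolute path so the link does not depend on where you cd.
    install_dir=$(cd "$1" && pwd) || return 1
    exe="$install_dir/bin/GEOSgcm.x"
    [ -x "$exe" ] || { echo "no executable at $exe" >&2; return 1; }
    # -f replaces the copy made by gcm_setup; -n avoids following an old link.
    ln -sfn "$exe" "$2/GEOSgcm.x"
}

# Example (both paths are placeholders):
#   link_gcm_executable /path/to/GEOSgcm-v11.5.2/install /path/to/run/dir
```

After this, make -j install in the build directory is enough for the run to see the new code.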
Go into install/bin and execute ./gcm_setup.
- Experiment ID is any name you want to give the run. It's a good idea to include the model version and something about which tracers you are running in a short name. Mostly something that you will remember. If you call it (say) Apple it's pretty much guaranteed that you won't remember what it is for two years down the line. I'm calling mine GCM-11.5.2-methane-c180.
- Experiment Description is a short description to help you remember.
- CLONE is the ability of a model to copy over someone else's run folder. This is a very useful ability, but for now let's choose NO.
- Atmospheric Horizontal Resolution depends on what you want to run. I'm choosing c180. It's perfectly fine to choose c90 for model development.
- Default Vertical Resolution of 72 layers is fine
- Default Microphysics of BACM_1M is fine
- Default TRUE for Hydrostatic Atmosphere is fine
- Use IOSERVER if you're running c180 or higher
- Default processor type of mil is fine
- Default NO to COUPLED Ocean/Sea-Ice Model is fine
- Choose CS (cubed sphere) for Data_Ocean Horizontal Resolution
- Default choice Icarus-NLv3 for land surface boundary conditions is fine
- Default choice Catchment for land surface model is fine
- Accept default choice to run GOCART with Actual aerosols
- Choose to use OPS emission files for GOCART, because the AMIP emission files do not exist for recent years
- For c180, a HEARTBEAT_DT of 450 is fine
- Don't worry about the HISTORY template, you are going to change the history file anyway
- The HOME Directory is where the run folder will be created. Just make sure it's created somewhere inside /discover/nobackup/projects/gmao/geos_carb/${USER}
- In theory EXPERIMENT Directory can be different from HOME Directory, but no one has ever tried it. Either set it to be the same, or try at your own risk and don't expect any sympathy if you break something.
- The Build directory should already be correct
- Our GROUP ID is s1460
Every so often gcm_setup will fail with errors like
/tmp/tmp.sVmzWQGKy5: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.VXxsGkzxBY: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.kAg9MZYON8: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.Xced0YAEi2: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.igPE8a3dYw: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.OL9WF9FmLj: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.73LABTUPJy: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.rdwOgZdlbP: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.Qak33WKR9E: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.vDLxsZWrwJ: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
cat: /discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl.tmp: No such file or directory
For some unknown reason, /tmp on discover acts up with denied permissions. Probably because it's mounted with noexec. To solve, do
export TMPDIR=/discover/nobackup/$USER/tmp
mkdir -p $TMPDIR
before executing gcm_setup.
This is a dark art. Remembering Robert the Bruce before embarking on this endeavor would be well advised.
GEOS restart files are called *_rst, even though they're really netcdf files. Ours not to reason why, ours but to do and die. You will see some *_import_rst and some *_internal_rst. Ignore the first kind; you will only need to supply the second kind for a new run. There are two types of *_internal_rst restart files, upper air (3D) restarts and surface (2D) restarts. Upper air restarts are defined on the cube, contain variables with shape levels x N x 6N, and are fairly easily created by the provided scripts for creating/remapping restarts (more below). There are very few ways in which these can "go wrong". Surface restarts can also be created by the provided remapping scripts. However, these will very likely make you weep. Instead of being on grids, surface restarts are provided as a list of tiles (my theory is that whoever made that decision was trying to save disk space and reinvented the wheel instead of relying on compression algorithms). Every single land model has a different ordering of these tiles, and understanding what your land model is requires a fair amount of expert knowledge. Worse, the choice of a land model makes pretty much zero difference in a replay run, yet your model will crash unless you do this correctly. In moments of frustration, remember Robert the Bruce.
The script to create restarts is called install/bin/remap_restarts.py. Do not run this on a compute node because it requires access to some filesystems that are not mounted on compute nodes. On a front-end node, run it as follows:
module purge
source @env/g5_modules.sh
install/bin/remap_restarts.py
This will present you with a series of questions. Answer as follows.
- Remap archived MERRA-2 restarts? Yes
- Enter restart date/time: Enter YYYYMMDDHH, where HH is one of 03, 09, 15 or 21 for MERRA2
- Enter output directory for new restarts: Make sure this is a unique folder which is not your run folder, you can later copy them over
- Remap to a stretched cubed-sphere grid? No
- Enter atmospheric grid for new restarts: Enter the same atmospheric resolution you entered for gcm_setup
- Select ocean model for new restarts: data
- Select data ocean grid/resolution for new restarts: CS
- Enter number of atmospheric levels for new restarts: Choose what you chose for gcm_setup
- Select boundary conditions (BCs) version for new restarts: This depends on what you chose for the land boundary condition in gcm_setup. If you chose Icarus-NLv3 there, choose NL3 here.
- Land BCs for input restarts: You will be presented a folder choice, accept it
- Select BCs base directory for new restarts: Select what you are given
- Land BCs for output restarts: Select what you are given
- Remap upper air restarts? Yes
- Remap agcm_import_rst (a.k.a. IAU) file needed for REPLAY runs? No
- Remap surface restarts? Yes
- Remap bkg files? No
- Write lcv file? No
- Enter value of WEMIN. No idea what this is, just choose what you are given.
- Enter value of zoom parameter for surface restarts [1-8]? No idea what this is, just choose what you are given.
- Enter experiment ID for new restarts: Fine to leave this blank.
- Add labels for BCs version and atm/ocean resolutions to restart file names? No
- SLURM or PBS quality-of-service (qos)? debug
- ('Select/enter SLURM or PBS account:\n',) s1460
- ('Enter SLURM or PBS partition: (If desired; can leave blank.)\n',) Leave blank.
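Since a mistyped restart time only shows up after you have answered every other question, a quick sanity check of the YYYYMMDDHH string can save a round trip. This is a sketch; is_merra2_restart_time is a hypothetical helper that only encodes the rule stated in the list above (HH must be 03, 09, 15 or 21) and does not check that the date itself is valid or archived.

```shell
# Return success if the argument looks like YYYYMMDDHH with an HH at which
# MERRA-2 restarts exist (03, 09, 15 or 21). Hypothetical helper.
is_merra2_restart_time() {
    case "$1" in
        *[!0-9]*) return 1 ;;          # digits only
    esac
    [ "${#1}" -eq 10 ] || return 1     # exactly YYYYMMDDHH
    case "${1#????????}" in            # drop YYYYMMDD, keep HH
        03|09|15|21) return 0 ;;
        *)           return 1 ;;
    esac
}

# Example:
#   is_merra2_restart_time 2020010121 && echo "looks fine"
```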
After you answer all the questions, it will submit a job to the queue to regrid the restarts, and make you wait while it does, i.e., the sbatch command won't exit. Don't close the terminal or quit at this point, hopefully the debug queue will be quick enough. Once the job is done, you need to copy over the *_rst.nc4 files from the output folder (above) to your run directory and remove the extension .nc4.
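The copy-and-rename step can be done in one loop. The sketch below assumes the remapped files sit directly in the output folder and end in _rst.nc4, as described above; install_restarts is a made-up name, and both paths are yours to supply.

```shell
# Copy remapped restarts into a run directory, dropping the .nc4 suffix.
# Hypothetical helper. Usage: install_restarts REMAP_OUTPUT_DIR RUN_DIR
install_restarts() {
    for f in "$1"/*_rst.nc4; do
        # If the glob matched nothing, $f is the literal pattern.
        [ -e "$f" ] || { echo "no *_rst.nc4 files in $1" >&2; return 1; }
        name=$(basename "$f" .nc4)      # e.g. foo_internal_rst.nc4 -> foo_internal_rst
        cp -p "$f" "$2/$name" || return 1
    done
}

# Example (placeholder paths):
#   install_restarts /path/to/remap/output /path/to/run/dir
```

The originals are left in place, so you can rerun this if you wipe the run directory.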
The GEOS GCM is by default "free running", which means that it has no obligation to follow the real atmosphere. It is a dynamical model which will be driven by an initial condition, the Navier-Stokes equations, incoming solar radiation, and a few other boundary conditions. If you want it to have the winds that were actually observed, you will need to replay it to a meteorological reanalysis. The reanalysis knows about what happened in the past by virtue of weather data assimilation.
Enable replay in AGCM.rc by uncommenting one of the REPLAY_MODE keys. The most typical replay configuration you will use is "Regular" replay to the MERRA2 reanalysis. You have the choice of replaying to either 6-hourly snapshots at 3z, 9z, 15z and 21z, or 3-hourly averages spanning 0-3z, 3-6z, etc. To replay to 3-hourly averages, which is recommended, use the following settings in AGCM.rc:
ASSIMILATION_CYCLE: 10800
REPLAY_MODE: Regular
REPLAY_ANA_EXPID: MERRA-2
REPLAY_FILE: /discover/nobackup/projects/gmao/merra2/data/products/MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4
REPLAY_FILE_FREQUENCY: 10800
REPLAY_FILE_REFERENCE_TIME: 013000
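To check that the REPLAY_FILE template actually points at files that exist, you can expand its date tokens by hand for one day. GEOS does its own token substitution internally; the sketch below only mimics it for the three tokens used above (%y4, %m2, %d2), and expand_replay_file is a hypothetical helper name.

```shell
# Expand the %y4, %m2 and %d2 tokens of a REPLAY_FILE template for one date.
# Hypothetical helper. Usage: expand_replay_file TEMPLATE YYYYMMDD
expand_replay_file() {
    y=$(printf '%s' "$2" | cut -c1-4)
    m=$(printf '%s' "$2" | cut -c5-6)
    d=$(printf '%s' "$2" | cut -c7-8)
    printf '%s\n' "$1" | sed -e "s/%y4/$y/g" -e "s/%m2/$m/g" -e "s/%d2/$d/g"
}

# Example, using the template from AGCM.rc:
#   tmpl='/discover/nobackup/projects/gmao/merra2/data/products/MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4'
#   ls -l "$(expand_replay_file "$tmpl" 20200115)"
```

If ls cannot find the file for a date inside your run period, replay will not find it either.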
The repository version of gcm_run.j as of October 16 2024 will not work with this. That is because that gcm_run.j expects two keys, REPLAY_ANA_LOCATION and REPLAY_FILE. The above would correspond to the pair
REPLAY_ANA_LOCATION: /discover/nobackup/projects/gmao/merra2/data/products
REPLAY_FILE: MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4
When you run the model, gcm_run.j assumes that the first path component of REPLAY_FILE is a folder, and makes a symlink of that name inside scratch pointing to the folder of the same name under REPLAY_ANA_LOCATION, i.e., scratch/MERRA2_all points to /discover/nobackup/projects/gmao/merra2/data/products/MERRA2_all. So when GEOS runs, it is really reading scratch/MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4 after substituting all the date and time tokens. However, this mechanism will clearly not work with the following pair
REPLAY_ANA_LOCATION: /discover/nobackup/projects/gmao/merra2/data/products/MERRA2_all
REPLAY_FILE: Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4
which, from the perspective of a normal human user used to filesystem logic, is equivalent to the key pair that works. Worse, GEOS doesn't actually need REPLAY_ANA_LOCATION; it only reads the key REPLAY_FILE, and is perfectly capable of handling long paths. Hence, in my gcm_run.j I have removed the entire mechanism of creating the aforementioned symlink (search for the conditional block if( $REPLAY_MODE == 'Exact' | $REPLAY_MODE == 'Regular' ) then and look at the lines commented within), and removed the key REPLAY_ANA_LOCATION in AGCM.rc.
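To see why the second pair breaks, it helps to look at what the symlink would be called. The sketch below is based only on the description above, not on the actual gcm_run.j code (which is csh); replay_link_name is a hypothetical helper that extracts the first path component of REPLAY_FILE.

```shell
# Print the first path component of a REPLAY_FILE value, i.e. the name the
# symlink inside scratch would need to have. Hypothetical helper.
replay_link_name() {
    case "$1" in
        */*) printf '%s\n' "${1%%/*}" ;;   # everything before the first slash
        *)   return 1 ;;                   # no folder component at all
    esac
}

# Example:
#   replay_link_name 'MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4'
#   replay_link_name 'Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4'
```

The first call prints a real folder name. The second prints Y%y4, an unexpanded date token, so no single symlink can stand in for it across years.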
Every so often, when you try to run a fresh model setup, you'll get an error such as
Error! Found 339967 tiles in openwater. Expect to find 359523 tiles.
Your restarts are probably for a different ocean.
This is probably because the water restarts you are using come from a run with a different choice of Land Surface Boundary Conditions. Look in your gcm_run.j, specifically setenv BCSDIR. Set this to whatever is in the run you copied the water restarts from. There is a specific combination of BCSDIR in gcm_run.j and the water restarts that will work. Unfortunately, gcm_setup is not your friend here; it will not tell you which folder to copy the restarts from given your choice of land boundary conditions.
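A quick way to compare the BCSDIR of your run with that of the run the restarts came from is to pull the value out of each gcm_run.j. This is a minimal sketch; show_bcsdir is a hypothetical name, and it assumes the setting appears as an uncommented setenv BCSDIR line.

```shell
# Print the value of every uncommented "setenv BCSDIR ..." line in a gcm_run.j.
# Hypothetical helper. Usage: show_bcsdir path/to/gcm_run.j
show_bcsdir() {
    awk '$1 == "setenv" && $2 == "BCSDIR" { print $3 }' "$1"
}

# Example (placeholder paths):
#   show_bcsdir /path/to/my/run/gcm_run.j
#   show_bcsdir /path/to/donor/run/gcm_run.j
```

If the two values differ, that mismatch is the first thing to fix.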
This has to do with the choice of ocean during gcm_setup and making restart files. The Reynolds ocean ends some time in 2022, so you need to have picked a cubed sphere ocean boundary condition. Again, gcm_setup is not your friend here because in most cases the Reynolds ocean is the default choice. So if you have clicked through the default choices, you are toast. Set up two experiments, one with Reynolds ocean and another with CS ocean, and check the differences in gcm_run.j and linkbcs. Try to make those same modifications in your actual experiment.
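The comparison between the two throwaway experiments can be scripted. The sketch below assumes gcm_run.j and linkbcs both exist as plain files at the top of each experiment directory, which may not hold for every model version; compare_experiments is a hypothetical helper that skips any file missing on either side.

```shell
# Show how two experiment directories differ in gcm_run.j and linkbcs.
# Hypothetical helper. Returns non-zero if any compared file differs.
# Usage: compare_experiments EXP_A EXP_B
compare_experiments() {
    status=0
    for f in gcm_run.j linkbcs; do
        if [ -f "$1/$f" ] && [ -f "$2/$f" ]; then
            echo "=== $f ==="
            diff -u "$1/$f" "$2/$f" || status=1
        else
            echo "=== $f: missing in one of the experiments, skipped ===" >&2
        fi
    done
    return "$status"
}

# Example (placeholder paths):
#   compare_experiments /path/to/exp-reynolds /path/to/exp-cs
```

Expect some noise from experiment names and paths; the lines worth porting to your real experiment are the ones that mention the ocean resolution or boundary condition files.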