Merge branch 'ykim/frontier/srunargs' (PR #7038)
* Change lmod paths from /usr/share to /opt/cray/pe because the /usr/share software is not maintained and is not available on internal OLCF test computers.
* Update craygnu to the latest version of cpe available on Frontier, cpe/24.11.
* Add explicit versions to the other modules in craygnu. I previously assumed this was unnecessary, since module load cpe/24.11 should set all the appropriate defaults. I recently discovered that the defaults are only updated when module load cpe/24.11 runs as a separate command before the other module loads. CIME issues a single module load with a list of modules, which does not update the defaults. I selected the explicit versions from the cpe/24.11 defaults, including rocm/6.2.4 for rocm.
* Remove the libfabric module from craygnu. On Feb 18, the default libfabric changed from libfabric/1.20.1 to libfabric/1.22.0. HPE officially supports only the default version on a given computer, so craygnu no longer loads libfabric explicitly and will follow the default whenever it changes.
grnydawn committed Feb 21, 2025
2 parents 1e5b116 + 19e5c08 commit 21248da
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion cime_config/machines/config_machines.xml
@@ -1051,8 +1051,10 @@
 <executable>srun</executable>
 <arguments>
 <arg name="num_tasks"> -l -K -n {{ total_tasks }} -N {{ num_nodes }} </arg>
-<arg name="binding">--gpus-per-node=8 --gpu-bind=closest</arg>
 <arg name="thread_count">-c $ENV{OMP_NUM_THREADS}</arg>
+<arg name="gpus_per_node">$ENV{GPUS_PER_NODE}</arg>
+<arg name="ntasks_per_gpu">$ENV{NTASKS_PER_GPU}</arg>
+<arg name="gpu_bind">$ENV{GPU_BIND_ARGS}</arg>
 </arguments>
 </mpirun>

@@ -1138,10 +1140,16 @@
 <env name="MPICH_OFI_CXI_COUNTER_REPORT">2</env>
 <env name="LD_LIBRARY_PATH">$ENV{CRAY_LD_LIBRARY_PATH}:$ENV{LD_LIBRARY_PATH}</env>
 <env name="SKIP_BLAS">True</env> <!-- find_package(blas) doesn't work well with Cray LibSci-->
+<env name="GPUS_PER_NODE"> </env>
+<env name="NTASKS_PER_GPU"> </env>
+<env name="GPU_BIND_ARGS"> </env>
 </environment_variables>
 <environment_variables compiler=".*hipcc">
 <env name="MPICH_GPU_SUPPORT_ENABLED">1</env>
 <env name="MPICH_CXX">$SHELL{which hipcc}</env>
+<env name="GPUS_PER_NODE">--gpus-per-node=8</env>
+<env name="NTASKS_PER_GPU">--ntasks-per-gpu=$SHELL{echo "`./xmlquery --value MAX_MPITASKS_PER_NODE`/8"|bc}</env>
+<env name="GPU_BIND_ARGS">--gpu-bind=closest</env>
 </environment_variables>

 <environment_variables BUILD_THREADED="TRUE">
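To illustrate the change above, here is a minimal sketch of how the three new variables could expand into srun arguments for a hipcc compiler. It is not taken from CIME: the helper name gpu_srun_args is hypothetical, the task count of 64 is an assumed value standing in for the output of ./xmlquery --value MAX_MPITASKS_PER_NODE, and shell arithmetic replaces the bc call used in the XML (both truncate to an integer).

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not part of CIME).
# $1 stands in for `./xmlquery --value MAX_MPITASKS_PER_NODE`.
gpu_srun_args() {
    max_tasks_per_node=$1
    gpus=8   # matches the hard-coded 8 in the hipcc settings above

    gpus_per_node="--gpus-per-node=${gpus}"
    # Integer division, like `echo "N/8" | bc` with bc's default scale of 0.
    ntasks_per_gpu="--ntasks-per-gpu=$(( max_tasks_per_node / gpus ))"
    gpu_bind_args="--gpu-bind=closest"

    echo "${gpus_per_node} ${ntasks_per_gpu} ${gpu_bind_args}"
}

# Assumed example value: 64 MPI tasks per node.
gpu_srun_args 64
```

For non-hipcc compilers the three variables are set to a single space, so the corresponding srun arguments expand to nothing. Because the division truncates, a MAX_MPITASKS_PER_NODE that is not a multiple of 8 rounds down, and a value below 8 yields 0.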