-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce memory usage during GWDO stats estimation #1235
base: develop
Are you sure you want to change the base?
Reduce memory usage during GWDO stats estimation #1235
Conversation
Tested on Eris (GNU) with 24km grid, and Derecho (GNU) with 15km and 24km. Derecho NVHPC build and run is successful for 15km grid. |
Tested on 5km grid as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done some testing with a 120-km global mesh and all seems good. I'll do further testing with a variable-resolution regional mesh. Meanwhile, I've left a few comments that mostly concern style.
Thanks for the additional review @mgduda! I have addressed with the last two commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than an update to a routine name in a comment block (as noted), I think we're in good shape. After addressing the routine name issue, feel free to rework the commit history for this PR so that it has one or a few commits with a clear and distinct purpose.
Additionally -- though it can be put off to a future PR -- it could be worth looking for simple ways of optimizing the get_box
and/or get_tile_from_box_point
routines.
…on of GWDO statistics This commit introduces code changes in order to improve the memory footprint of the computation of Gravity Wave Drag Orography statistics in the init_atmosphere. The previous approach involved each MPI rank reading in the entirety of the topography and land use tiles, and this may prevent fully-subscribing to all cores in a node. The new approach only reads in one tile at a time, when it encounters a pixel whose data is not already available in a linked list. The new algorithm is able to fully subscribe to all ranks in a node, leading to better parallel performance. The get_box subroutine is modified to call get_tile_from_box_point to check if the current pixel in the box is present in the linked list, and if not, it appends this tile (after reading both topo and land use data) to the head of the list. This commit also changes box, box_landuse, dxm, nx and ny to be local variables, instead of module variables. This provides better readability, along with advantages of thread safety. This commit also removes the extraneous conversion of the cell latitude from radians to degrees and back to radians, prior to the estimation of zonal box size. The result is a numerically more correct code, but it results in marginal differences with the previous approach
…last matching tile This commit introduces an optimization for the lookup of tiles in subroutine get_tile_from_box_point by passing the most recent successful tile lookup to the next iteration of the search. These changes substantially improves the single-core performance of the compute_gwd_fields subroutine, and also improve the parallel performance of this computation.
650a60c
to
7ad8bf6
Compare
A copy of #1228 with branch renamed. This PR attempts to improve the memory footprint during the computation of Gravity Wave Drag Orography statistics in the pre-processing step.
The previous approach relied on each MPI rank reading all of the topography and land use tiles, and hence would run out of memory before we could fully subscribe to all cores in a node. In the new approach, we only read in one tile at a time and when we encounter a pixel whose data is not already available in a linked list. The
get_box
subroutine is modified to callget_tile_from_box_point
to check if the current pixel in the box is present in the linked list, and if not, it appends this tile (after reading both topo and land use data) to the head of the list.This PR also changes
box, box_landuse, dxm, nx and ny
to be local variables, instead of module variables. This provides a little better readability, along with advantages of thread safety, etc.