-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable GPU execution of atm_compute_moist_coefficients via OpenACC #1238
Enable GPU execution of atm_compute_moist_coefficients via OpenACC #1238
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may need some additional work. When I compared baseline and test runs of the regional testcase, I get differences in two fields in the history file. Restart files are reported as matching.
Output from compare_netcdf.py:
Comparing files: f1 - f2
f1=../2024Dec31_MPASAPR1223_OpenACC/baseline_acc_1735844488/history.2019-09-01_00.06.00.nc
f2=test_att1735694999/history.2019-09-01_00.06.00.nc
Variable Min Max
=========================================
initial_time is not a numeric field and will not be compared
xtime is not a numeric field and will not be compared
depv_dt_fric -32.448421 33.315609
depv_dt_fric_pv -32.448421 30.517973
mminlu is not a numeric field and will not be compared
Also a couple of suggested edits.
Modified the two parallel constructs to explicitly specify the level of parallelism. This also seems to lead to a performance increase for the limited area test by about 10% in the parallel clause on L1754 and about 51% on L1767. Also verified that results are still bit reproducible. Added timers around the exit data clause and made the other suggested formatting changes. The differences in the diagnostic fields in the history files are expected per our previous discussion. |
@abishekg7 The code changes themselves look good to me now. I'd only request that the commit history be cleaned up, and perhaps just a single commit would suffice. |
This commit enables the GPU execution of the atm_compute_moist_coefficients subroutine using OpenACC directives for the data movements and loops. A new timer, 'atm_compute_moist_coefficients [ACC_data_xfer]', has been added for data transfers in the atm_compute_moist_coefficients subroutine. Explicitly adding gang, worker and vector level parallelism indicators to parallel constructs in this subroutine results in noticeable performance improvements. This commit uses the default(present) in parallel clauses, which requires dereferencing the pointers to scalars used in the loops.
236e9a2
to
4add487
Compare
Rebased this PR and ran a quick check for bit reproducibility again. thanks for the review! |
Nice work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me!
This PR enables the GPU execution of atm_compute_moist_coefficients subroutine.
Tested with a real and idealized test cases.