-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable GPU execution of atm_compute_vert_imp_coefs via OpenACC #1241
Enable GPU execution of atm_compute_vert_imp_coefs via OpenACC #1241
Conversation
@abishekg7 Can you clean up the commits and add a description to this PR? |
bd7bbdd
to
da32982
Compare
Thanks. Done! |
@@ -1894,32 +1894,46 @@ subroutine atm_compute_vert_imp_coefs_work(nCells, moist_start, moist_end, dts, | |||
real (kind=RKIND) :: dtseps, c2, qtotal, rcv | |||
real (kind=RKIND), dimension( nVertLevels ) :: b_tri, c_tri | |||
|
|||
MPAS_ACC_TIMER_START('atm_compute_vert_imp_coefs_work [ACC_data_xfer]') | |||
!$acc enter data copyin(rdzw, zz, fzm, fzp, rdzu, cqw, p, t, qtot, rb, & | |||
!$acc rtb, rt, pb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the fields here are invariant (across timesteps) and should be handled in the mpas_atm_dynamics_{init,finalize}
routines instead.
Generally these invariant fields are those contained in the "invariant stream" (L421-L524 in the atm Registry.xml) or those accessed through the "mesh" mpas_pool
(L1222-L1516 in the atm Registry.xml)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. thanks for the catch!
|
||
! set coefficients | ||
dtseps = .5*dts*(1.+epssm) | ||
rcv = rgas/(cp-rgas) | ||
c2 = cp*rcv | ||
|
||
!$acc parallel | ||
!$acc loop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to be explicit with loop
directives (e.g. loop gang worker
here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I think I'll pause on this commit until I also am able to profile the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. No performance penalties for this change, with the limited area case at least.
|
||
!$acc exit data copyout(cofrz, cofwr, cofwz, coftz, cofwt, a_tri, b_tri, & | ||
!$acc c_tri, alpha_tri, gamma_tri) | ||
!$acc exit data delete(rdzw, zz, fzm, fzp, rdzu, cqw, p, t, qtot, rb, rtb, rt, pb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides handling fields like fzm
in mpas_atm_dynamics_finalize
instead, these directives should be surrounded by MPAS_ACC_TIMER_{START,STOP}
calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
! MGD bad to have all threads setting this variable? | ||
do k=1,nVertLevels | ||
cofrz(k) = dtseps*rdzw(k) | ||
end do | ||
|
||
!$acc end parallel | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empty line contains extra spaces that should be removed. Might be good to double check for any trailing whitespace in the lines you're adding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR may need a little more work. I'm getting differences in 2 fields in the history file when comparing the regional testcase. Restart files are reported as matching.
Output from compare_netcdf.py:
Comparing files: f1 - f2
f1=../2024Dec31_MPASAPR1223_OpenACC/baseline_acc_1735844488/history.2019-09-01_00.06.00.nc
f2=test_att1735695005/history.2019-09-01_00.06.00.nc
Variable Min Max
=========================================
initial_time is not a numeric field and will not be compared
xtime is not a numeric field and will not be compared
depv_dt_fric -32.448421 33.315609
depv_dt_fric_pv -32.448421 30.517973
mminlu is not a numeric field and will not be compared
Some suggested edits as well.
76a5681
to
63b1e58
Compare
@abishekg7 The code changes themselves look good to me now. I'd only request that the commit history be cleaned up. If you want to keep the handling of the time-invariant fields in a separate commit, that seems fine, but just a single commit for this PR seems sufficient. |
eeecb13
to
d2b6428
Compare
Rebased and squashed to a single commit, and results are still bit identical. |
@abishekg7, I think everything is looking good except this sentence in the commit message
This seems like an incomplete thought, like there should be more after "...sent)". You could drop the "a" or replace the sentence with something like:
With something about "to enforce data management" potentially being added too. Aside: As I understand OpenACC terminology, directives refer to the part of the statement right after |
Thanks @gdicker1! I've been using clauses, directives and constructs interchangeably and it's good to know. Will fix this. |
This commit enables the GPU execution of the atm_compute_vert_imp_coefs subroutine using OpenACC directives for the data movements and loops. A new timer, 'atm_compute_vert_imp_coefs [ACC_data_xfer]', has been added for host- GPU data transfers in this subroutine. This port follows these considerations: - Separating the computation of cofrz into a separate parallel clause for correctness. - Variables b_tri and c_tri are declared private in lieu of extending the storage to two dimensions, like a_tri. - The data transfers for the invariant fields have been moved to mpas_atm_dynamics_init and finalize. - Explicitly adding gang, worker and vector level parallelism to parallel constructs in atm_compute_vert_imp_coefs_work. - The parallel constructs use default(present) clauses to avoid implicit data movements by the compiler.
d2b6428
to
ba661fc
Compare
Fixed the default(present) part of the commit message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now!
(And I like what you added much more than my suggestion.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Initial OpenACC port of atm_compute_vert_imp_coefs.
cofrz
into a separate parallel clause is required for correctness.b_tri
andc_tri
are declared private in lieu of extending the storage to two dimensions, likea_tri
.alpha_tri
andgamma_tri
needs to be sequential