Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable GPU execution of atm_compute_vert_imp_coefs via OpenACC #1241

Merged

Conversation

abishekg7
Copy link
Collaborator

@abishekg7 abishekg7 commented Oct 23, 2024

Initial OpenACC port of atm_compute_vert_imp_coefs.

  • Separating the computation of cofrz into a separate parallel clause is required for correctness.
  • Variables b_tri and c_tri are declared private in lieu of extending the storage to two dimensions, like a_tri.
  • Back substitution/Computation of alpha_tri and gamma_tri needs to be sequential

@mgduda mgduda requested review from mgduda and gdicker1 November 1, 2024 20:41
@mgduda mgduda added Atmosphere OpenACC Work related to OpenACC acceleration of code labels Nov 1, 2024
@mgduda
Copy link
Contributor

mgduda commented Nov 1, 2024

@abishekg7 Can you clean up the commits and add a description to this PR?

@abishekg7 abishekg7 force-pushed the atmosphere/port_atm_compute_vert_imp_coefs branch from bd7bbdd to da32982 Compare November 6, 2024 22:51
@abishekg7
Copy link
Collaborator Author

@abishekg7 Can you clean up the commits and add a description to this PR?

Thanks. Done!

@@ -1894,32 +1894,46 @@ subroutine atm_compute_vert_imp_coefs_work(nCells, moist_start, moist_end, dts,
real (kind=RKIND) :: dtseps, c2, qtotal, rcv
real (kind=RKIND), dimension( nVertLevels ) :: b_tri, c_tri

MPAS_ACC_TIMER_START('atm_compute_vert_imp_coefs_work [ACC_data_xfer]')
!$acc enter data copyin(rdzw, zz, fzm, fzp, rdzu, cqw, p, t, qtot, rb, &
!$acc rtb, rt, pb)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the fields here are invariant (across timesteps) and should be handled in the mpas_atm_dynamics_{init,finalize} routines instead.

Generally these invariant fields are those contained in the "invariant stream" (L421-L524 in the atm Registry.xml) or those accessed through the "mesh" mpas_pool (L1222-L1516 in the atm Registry.xml)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. thanks for the catch!


! set coefficients
dtseps = .5*dts*(1.+epssm)
rcv = rgas/(cp-rgas)
c2 = cp*rcv

!$acc parallel
!$acc loop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to be explicit with loop directives (e.g. loop gang worker here).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think I'll pause on this commit until I also am able to profile the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. No performance penalties for this change, with the limited area case at least.


!$acc exit data copyout(cofrz, cofwr, cofwz, coftz, cofwt, a_tri, b_tri, &
!$acc c_tri, alpha_tri, gamma_tri)
!$acc exit data delete(rdzw, zz, fzm, fzp, rdzu, cqw, p, t, qtot, rb, rtb, rt, pb)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides handling fields like fzm in mpas_atm_dynamics_finalize instead, these directives should be surrounded by MPAS_ACC_TIMER_{START,STOP} calls.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

! MGD bad to have all threads setting this variable?
do k=1,nVertLevels
cofrz(k) = dtseps*rdzw(k)
end do

!$acc end parallel

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This empty line contains extra spaces that should be removed. Might be good to double check for any trailing whitespace in the lines you're adding.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. thanks!

Copy link
Collaborator

@gdicker1 gdicker1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR may need a little more work. I'm getting differences in 2 fields in the history file when comparing the regional testcase. Restart files are reported as matching.

Output from compare_netcdf.py:

Comparing files: f1 - f2                                                                                                                             
        f1=../2024Dec31_MPASAPR1223_OpenACC/baseline_acc_1735844488/history.2019-09-01_00.06.00.nc                                                   
        f2=test_att1735695005/history.2019-09-01_00.06.00.nc                                                                                         
                                                                                                                                                     
            Variable  Min       Max                                                                                                                  
=========================================
initial_time is not a numeric field and will not be compared
xtime is not a numeric field and will not be compared
        depv_dt_fric -32.448421  33.315609
     depv_dt_fric_pv -32.448421  30.517973
mminlu is not a numeric field and will not be compared

Some suggested edits as well.

@abishekg7 abishekg7 force-pushed the atmosphere/port_atm_compute_vert_imp_coefs branch from 76a5681 to 63b1e58 Compare January 15, 2025 16:06
@mgduda mgduda self-requested a review January 16, 2025 01:16
@mgduda
Copy link
Contributor

mgduda commented Jan 16, 2025

@abishekg7 The code changes themselves look good to me now. I'd only request that the commit history be cleaned up. If you want to keep the handling of the time-invariant fields in a separate commit, that seems fine, but just a single commit for this PR seems sufficient.

@abishekg7 abishekg7 force-pushed the atmosphere/port_atm_compute_vert_imp_coefs branch from eeecb13 to d2b6428 Compare January 16, 2025 16:27
@abishekg7
Copy link
Collaborator Author

Rebased and squashed to a single commit, and results are still bit identical.

@gdicker1
Copy link
Collaborator

gdicker1 commented Jan 16, 2025

@abishekg7, I think everything is looking good except this sentence in the commit message

  • The parallel clauses now specify a default(present)

This seems like an incomplete thought, like there should be more after "...sent)". You could drop the "a" or replace the sentence with something like:

  • The parallel constructs use default(present) clauses.

With something about "to enforce data management" potentially being added too.

Aside: As I understand OpenACC terminology, directives refer to the part of the statement right after acc. E.g. "parallel", "kernels", "host_data", "data", and "exit data" . If the directive applies to a region of code, the directive and the region together are a construct. Clauses are the other parts of a statement that come after the directive (like "collapse(n)" or "async").

@abishekg7
Copy link
Collaborator Author

@abishekg7, I think everything is looking good except this sentence in the commit message

  • The parallel clauses now specify a default(present)

This seems like an incomplete thought, like there should be more after "...sent)". You could drop the "a" or replace the sentence with something like:

  • The parallel constructs use default(present) clauses.

With something about "to enforce data management" potentially being added too.

Aside: As I understand OpenACC terminology, directives refer to the part of the statement right after acc. E.g. "parallel", "kernels", "host_data", "data", and "exit data" . If the directive applies to a region of code, the directive and the region together are a construct. Clauses are the other parts of a statement that come after the directive (like "collapse(n)" or "async").

Thanks @gdicker1! I've been using clauses, directives and constructs interchangeably and it's good to know. Will fix this.

This commit enables the GPU execution of the atm_compute_vert_imp_coefs
subroutine using OpenACC directives for the data movements and loops. A new
timer, 'atm_compute_vert_imp_coefs [ACC_data_xfer]', has been added for host-
GPU data transfers in this subroutine. This port follows these considerations:

- Separating the computation of cofrz into a separate parallel clause
  for correctness.
- Variables b_tri and c_tri are declared private in lieu of extending
  the storage to two dimensions, like a_tri.
- The data transfers for the invariant fields have been moved to
  mpas_atm_dynamics_init and finalize.
- Explicitly adding gang, worker and vector level parallelism to parallel
  constructs in atm_compute_vert_imp_coefs_work.
- The parallel constructs use default(present) clauses to avoid implicit
  data movements by the compiler.
@abishekg7 abishekg7 force-pushed the atmosphere/port_atm_compute_vert_imp_coefs branch from d2b6428 to ba661fc Compare January 16, 2025 17:32
@abishekg7
Copy link
Collaborator Author

Fixed the default(present) part of the commit message.

Copy link
Collaborator

@gdicker1 gdicker1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now!

(And I like what you added much more than my suggestion.)

Copy link
Contributor

@mgduda mgduda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@mgduda mgduda merged commit 589bbe6 into MPAS-Dev:develop Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Atmosphere OpenACC Work related to OpenACC acceleration of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants