-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing divide-by-zero runtime error when compiled with oneAPI #33
base: master
Are you sure you want to change the base?
Conversation
Fixes #34 |
I've tested this change with the ACCESS-OM2 However, I'd like to understand how this division by zero comes about at all? I've output the >>> import xarray as xr
>>>
>>> ds = xr.open_dataset(
... "/g/data/tm70/ds0092/model/config/om2_1deg_jra55_ryf/archive/output000/ocean/ocean_grid_check.nc"
... )
>>> kmt = ds["kmt"]
>>> ht = ds["ht"]
>>>
>>> # Mask unprocessed blocks
>>> kmt = kmt.where(ht < 1e36)
>>>
>>> ht.where(kmt > 0).min().item()
11.805749893188477 |
@dougiesquire and I just had a zoom meeting and we are both confused about what might be going on. To summarise:
The confusion is because for a divide-by-zero to occur, ! find shallowest water and provide caveat for overly shallow regions
imin= 0 ; jmin=0 ; kmin=0 ; min_depth = 1.e10
do j=jsc,jec
do i=isc,iec
if(Grid%kmt(i,j) > 0 .and. Grid%ht(i,j) < min_depth) then
imin = i
jmin = j
kmin = Grid%kmt(i,j)
min_depth = Grid%ht(i,j)
endif
enddo
enddo
fudge = 1 + 1.e-12*mpp_pe() ! to distinguish processors when min_depth is independent of processor
min_depth = min_depth*fudge
min_depth0 = min_depth
call mpp_min(min_depth)
if(min_depth0 == min_depth) then
write(writeunit,'(/a,f12.5)')' The shallowest wet ocean model grid cell has depth (meters) ', min_depth
write(writeunit,'(a,i3,a,i3,a,i3,a)')' and this occurs at (i,j,k) = (',&
imin+Domain%ioff,',',jmin+Domain%joff ,',',kmin,')'
write(writeunit,'(a,f10.4,a,f10.4,a,f12.5,a/)')' which has (long,lat,depth) = (' &
,Grid%xt(imin,jmin),',',Grid%yt(imin,jmin), ',', Grid%ht(imin,jmin),')'
if(min_depth < 50.0) then
write(writeunit,'(a)')' Beware that shallow regions (e.g., those shallower than 50m) may be subject'
write(writeunit,'(a)')' to numerical problems if strong surface forcing is not mixed vertically.'
write(writeunit,'(a)')' Such problems may occur especially in shallow regions with kmt==2.'
write(writeunit,'(a)')' Current speeds and/or tracer deviations may become large due to the deposition'
write(writeunit,'(a)')' of wind and/or buoyancy over just a small upper ocean region. Such problems'
write(writeunit,'(a)')' can be resolved by adding sufficient vertical mixing in these regions.'
write(writeunit,'(a)')' Such happens in Nature due to tides and breaking surface waves.'
endif
endif
|
Okay, this is pretty odd. As we discussed, I added some print statements to the offending code to explicitly check the value of do i=isd,ied
if(Grid%kmt(i,j) > 0) then
+ if(Grid%ht(i,j) < 0.1) then
+ write(writeunit,'(a,f10.4,a,f10.4,a)')' Grid%ht < 0.1 at (lon,lat)=(',Grid%xt(i,j),',',Grid%yt(i,j),')'
+ write(writeunit,'(a,i4,a,f10.4/)')' Grid%kmt = ',Grid%kmt(i,j),', Grid%ht = ',Grid%ht(i,j)
+ endif
Grid%htr(i,j) = 1.0/Grid%ht(i,j)
endif
enddo Nothing is printed by these print statements when the model is run, but the stacktrace for the divide by zero error changes when they are included: forrtl: error (73): floating divide by zero
Image PC Routine Line Source
libpthread-2.28.s 0000154D9CD28D10 Unknown Unknown Unknown
fms_ACCESS-OM.x 000000000072B1A4 thickness_initial 1362 ocean_thickness.F90
fms_ACCESS-OM.x 00000000007114A1 ocean_thickness_i 630 ocean_thickness.F90
fms_ACCESS-OM.x 000000000045DDB2 ocean_model_init 1269 ocean_model.F90
fms_ACCESS-OM.x 00000000004317FC main 366 ocean_solo.F90
fms_ACCESS-OM.x 0000000000412A0D Unknown Unknown Unknown
libc-2.28.so 0000154D9C97A7E5 __libc_start_main Unknown Unknown
fms_ACCESS-OM.x 000000000041292E Unknown Unknown Unknown The new stacktrace refers to another section of code where division by zero could occur. Again, I add print statements as follows: do i=isd,ied
if(Grid%kmt(i,j) > 1) then
+ if(Thickness%pbot0(i,j) < 0.1) then
+ write(unit,'(a,f10.4,a,f10.4,a)')' Thickness%pbot0 < 0.1 at (lon,lat)=(',Grid%xt(i,j),',',Grid%yt(i,j),')'
+ write(unit,'(a,i4,a,f10.4/)')' Grid%kmt = ',Grid%kmt(i,j),', Thickness%pbot0 = ',Thickness%pbot0(i,j)
+ endif
Thickness%pbot0r(i,j) = 1.0/Thickness%pbot0(i,j)
endif
enddo Again, nothing is printed by these print statements but the stacktrace changes yet again to refer to another section where division by zero could occur: forrtl: error (73): floating divide by zero
Image PC Routine Line Source
libpthread-2.28.s 00001465706B6D10 Unknown Unknown Unknown
fms_ACCESS-OM.x 00000000007358EF ocean_thickness_i 1527 ocean_thickness.F90
fms_ACCESS-OM.x 000000000045E1C4 ocean_model_init 1301 ocean_model.F90
fms_ACCESS-OM.x 0000000000422EDC main 366 ocean_solo.F90
fms_ACCESS-OM.x 0000000000412A0D Unknown Unknown Unknown
libc-2.28.so 00001465703087E5 __libc_start_main Unknown Unknown
fms_ACCESS-OM.x 000000000041292E Unknown Unknown Unknown Again, I added print statements to the offending code as follows: do i=isd,ied
if(Grid%kmt(i,j) > 1) then
+ if(Thickness%pbot0(i,j) < 0.1) then
+ write(unit,'(a,f10.4,a,f10.4,a)')' Thickness%pbot0 < 0.1 at (lon,lat)=(',Grid%xt(i,j),',',Grid%yt(i,j),')'
+ write(unit,'(a,i4,a,f10.4/)')' Grid%kmt = ',Grid%kmt(i,j),', Thickness%pbot0 = ',Thickness%pbot0(i,j)
+ endif
Thickness%pbot0r(i,j) = 1.0/Thickness%pbot0(i,j)
endif
endd And we now get a forrtl: error (65): floating invalid
Image PC Routine Line Source
libpthread-2.28.s 0000149F10860D10 Unknown Unknown Unknown
fms_ACCESS-OM.x 0000000000C4B4DE ocean_vert_tidal_ 817 ocean_vert_tidal.F90
fms_ACCESS-OM.x 000000000047772B ocean_vert_mix_in 889 ocean_vert_mix.F90
fms_ACCESS-OM.x 000000000045E3FB ocean_model_init 1322 ocean_model.F90
fms_ACCESS-OM.x 0000000000422EDC main 366 ocean_solo.F90
fms_ACCESS-OM.x 0000000000412A0D Unknown Unknown Unknown
libc-2.28.so 0000149F104B27E5 __libc_start_main Unknown Unknown
fms_ACCESS-OM.x 000000000041292E Unknown Unknown Unknown Are we dealing with a compiler bug? |
When (unmodified) MOM5 is compiled with
The offending line also divides by
I'm starting to suspect we may have some bad value in |
@dougiesquire NaN's? |
Yeah exactly |
When compiled with oneAPI, @dougiesquire found that there was a divide-by-zero - as noted in this comment
Numerically this is the appropriate fix, but someone with knowledge of the code should confirm/clarify why the other check was there in the first place (as in, why
%kmt > 0
check was being used instead of%ht > 0
when calculating the reciprocal of%ht
)