Report errors when apply `surface_flux` for 3D #3

huiyuxie · 2023-07-08T04:37:47Z

Kernel surface_flux_kernel!() worked for both 1D and 2D but failed for 3D.
1D version surface_flux_kernel!() function
https://github.com/huiyuxie/trixi_cuda/blob/a6e268a54203e725c67997f7f5a33bd82001fd60/cuda_dg_1d.jl#L173-L190
2D version surface_flux_kernel!() function
https://github.com/huiyuxie/trixi_cuda/blob/a6e268a54203e725c67997f7f5a33bd82001fd60/cuda_dg_2d.jl#L196-L215
But for 3D equations, we provided two ways for comparing the benchmark results but both failed
https://github.com/huiyuxie/trixi_cuda/blob/0ed812ce977b5ebc80f7a92c042864acb1625a46/cuda_dg_3d.jl#L205-L248
The errors are the same like

ERROR: InvalidIRError: compiling MethodInstance for surface_flux_kernel!(::CuDeviceArray{Float32, 5, 1}, ::CuDeviceArray{Float32, 5, 1}, ::CuDeviceVector{Int32, 1}, ::CompressibleEulerEquations3D{Float64}, ::FluxLaxFriedrichs{typeof(max_abs_speed_naive)}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:219
Reason: unsupported dynamic function invocation (call to convert)
Stacktrace:
 [1] setindex!
   @ ~/.julia/packages/CUDA/pCcGc/src/device/array.jl:166
 [2] setindex!
   @ ~/.julia/packages/CUDA/pCcGc/src/device/array.jl:179
 [3] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:219
Reason: unsupported dynamic function invocation (call to (::Type{SA})(x...) where SA<:StaticArray @ StaticArrays ~/.julia/packages/StaticArrays/O6dgq/src/convert.jl:160)
Stacktrace:
 [1] flux
   @ ~/trixi_cuda/trixi/src/equations/compressible_euler_3d.jl:399
 [2] flux_central
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:20
 [3] FluxPlusDissipation
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:41
 [4] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:215
Reason: unsupported dynamic function invocation (call to (::Type{SA})(x...) where SA<:StaticArray @ StaticArrays ~/.julia/packages/StaticArrays/O6dgq/src/convert.jl:160)
Stacktrace:
 [1] flux
   @ ~/trixi_cuda/trixi/src/equations/compressible_euler_3d.jl:399
 [2] flux_central
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:21
 [3] FluxPlusDissipation
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:41
 [4] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:215
Reason: unsupported dynamic function invocation (call to +)
Stacktrace:
 [1] flux_central
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:24
 [2] FluxPlusDissipation
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:41
 [3] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:215
Reason: unsupported dynamic function invocation (call to *)
Stacktrace:
 [1] flux_central
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:24
 [2] FluxPlusDissipation
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:41
 [3] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:215
Reason: unsupported dynamic function invocation (call to +)
Stacktrace:
 [1] FluxPlusDissipation
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:41
 [2] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:215
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/validation.jl:149
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:411 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:410 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/utils.jl:89
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/utils.jl:83 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:118
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:92 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:88
 [10] compile
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:79 [inlined]
 [11] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
    @ CUDA ~/.julia/packages/CUDA/pCcGc/src/compiler/compilation.jl:125
 [12] #1032
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/compilation.jl:120 [inlined]
 [13] LLVM.ThreadSafeContext(f::CUDA.var"#1032#1033"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
    @ LLVM ~/.julia/packages/LLVM/5aiiG/src/executionengine/ts_module.jl:14
 [14] JuliaContext
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:35 [inlined]
 [15] compile
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/compilation.jl:119 [inlined]
 [16] actual_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:125
 [17] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:103
 [18] macro expansion
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:318 [inlined]
 [19] macro expansion
    @ ./lock.jl:267 [inlined]
 [20] cufunction(f::typeof(surface_flux_kernel!), tt::Type{Tuple{CuDeviceArray{Float32, 5, 1}, CuDeviceArray{Float32, 5, 1}, CuDeviceVector{Int32, 1}, CompressibleEulerEquations3D{Float64}, FluxLaxFriedrichs{typeof(max_abs_speed_naive)}}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:313
 [21] cufunction(f::typeof(surface_flux_kernel!), tt::Type{Tuple{CuDeviceArray{Float32, 5, 1}, CuDeviceArray{Float32, 5, 1}, CuDeviceVector{Int32, 1}, CompressibleEulerEquations3D{Float64}, FluxLaxFriedrichs{typeof(max_abs_speed_naive)}}})
    @ CUDA ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:310
 [22] top-level scope
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:104

Both kernels for 3D worked for test/linear_scalar_advection_3d.jl but both failed for test/compressible_euler_3d.jl. Having checked the flux computing process for 1D, 2D, and 3D compressible euler equations but found nothing extremely different. So currently the issue cannot be solved.

The text was updated successfully, but these errors were encountered:

'

ranocha · 2023-07-10T03:56:44Z

Which versions of

Julia
Trixi.jl
StaticArrays.jl

are you using?

huiyuxie · 2023-07-10T20:24:48Z

Thanks for helping me solve the issue!

Julia v1.9.0
Trixi v0.5.28
StaticArrays v1.5.26

ranocha · 2023-07-11T08:45:04Z

That's an ugly error. Did you try to follow the

Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl

given in the stacktrace? It may be that we're running into some compiler heuristics that work fine for small tuples but not as well for slightly larger tuples - which is why it manifests only for 3D.

ranocha · 2023-07-11T11:22:34Z

Aha, I think I understood something. I am using Julia v1.9.2 but that shouldn't matter. First, I cloned this repo and ran

(trixi_cuda) pkg> instantiate

Then, I executed

julia> include("header.jl")

julia> include("cuda_dg_3d.jl")

and ran the code

# CUDA kernel for calculating surface fluxes 
function surface_flux_kernel1!(surface_flux_arr, interfaces_u, orientations,
    equations::AbstractEquations{3}, surface_flux::Any)
    j2 = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    j3 = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    k = (blockIdx().z - 1) * blockDim().z + threadIdx().z

    if (j2 <= size(surface_flux_arr, 3) && j3 <= size(surface_flux_arr, 4) && k <= size(surface_flux_arr, 5))
        u_ll, u_rr = get_surface_node_vars(interfaces_u, equations, j2, j3, k)
        orientation = orientations[k]
        surface_flux_node = surface_flux(u_ll, u_rr, orientation, equations)

        @inbounds begin
            for j1j1 in axes(surface_flux_arr, 2)
                surface_flux_arr[1, j1j1, j2, j3, k] = surface_flux_node[j1j1]
            end
        end
    end

    return nothing
end

du, u = copy_to_gpu!(du, u)

cuda_volume_integral!(
    du, u, mesh,
    have_nonconservative_terms(equations), equations,
    solver.volume_integral, solver)

cuda_prolong2interfaces!(u, mesh, cache)

surface_flux = solver.surface_integral.surface_flux
interfaces_u = CuArray{Float32}(cache.interfaces.u)
neighbor_ids = CuArray{Int32}(cache.interfaces.neighbor_ids)
orientations = CuArray{Int32}(cache.interfaces.orientations)
surface_flux_arr = CuArray{Float32}(undef, 1, size(interfaces_u)[2:end]...)

size_arr = CuArray{Float32}(undef, size(interfaces_u, 3), size(interfaces_u, 4), size(interfaces_u, 5))

surface_flux_kernel1 = @cuda launch = false surface_flux_kernel1!(
    surface_flux_arr, interfaces_u, orientations, equations, surface_flux)

This gave me the error you mentioned above. Next, I ran

julia> code_typed(err[1].exception)
1-element Vector{Any}:
 CodeInfo(
...
└──── %650 = Base.mul_float(%648, %649)::Float64
109 ┄ %651 = φ (#105 => %627, #107 => %640, #108 => %650)::Float64
│     %652 = φ (#105 => %623, #107 => %636, #108 => %646)::Union{Float32, Float64}
│     %653 = φ (#105 => %622, #107 => %635, #108 => %643)::Union{Float32, Float64}
│     %654 = φ (#105 => %621, #107 => %632, #108 => %642)::Union{Float32, Float64}
│     %655 = φ (#105 => %184, #107 => %233, #108 => %282)::Float32
│     %656 = invoke Main.SVector(%655::Float32, %654::Vararg{Any}, %653, %652, %651)::Any
...

This type instability can be observed more nicely via

julia> code_warntype(err[1].exception)
...
  surface_flux_node::Any
...
│   %64 = Base.getindex(surface_flux_node, j1j1)::Any
...

It basically boils down to

julia> begin
       u_ll = SVector(1.0f0, 0.1f0, 0.2f0, 0.3f0, 2.0f0)
       u_rr = u_ll
       equations = CompressibleEulerEquations3D(1.4f0)
       orientation = 1
       @code_warntype flux_lax_friedrichs(u_ll, u_rr, orientation, equations)
       end
MethodInstance for (::FluxLaxFriedrichs{typeof(max_abs_speed_naive)})(::SVector{5, Float32}, ::SVector{5, Float32}, ::Int64, ::CompressibleEulerEquations3D{Float32})
  from (numflux::FluxPlusDissipation)(u_ll, u_rr, orientation_or_normal_direction, equations) @ Main /tmp/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:38
Arguments
  numflux::Core.Const(FluxLaxFriedrichs(max_abs_speed_naive))
  u_ll::SVector{5, Float32}
  u_rr::SVector{5, Float32}
  orientation_or_normal_direction::Int64
  equations::CompressibleEulerEquations3D{Float32}
Locals
  dissipation::DissipationLocalLaxFriedrichs{typeof(max_abs_speed_naive)}
  numerical_flux::typeof(flux_central)
Body::Any
1 ─      nothing
│        (numerical_flux = Base.getproperty(numflux, :numerical_flux))
│        (dissipation = Base.getproperty(numflux, :dissipation))
│   %4 = (numerical_flux)(u_ll, u_rr, orientation_or_normal_direction, equations)::Any
│   %5 = (dissipation)(u_ll, u_rr, orientation_or_normal_direction, equations)::SVector{5, Float64}
│   %6 = (%4 + %5)::Any
└──      return %6

julia> begin
       u_ll = SVector(1.0f0, 0.1f0, 0.2f0, 0.3f0, 2.0f0)
       u_rr = u_ll
       equations = CompressibleEulerEquations3D(1.4f0)
       orientation = 1
       @code_warntype flux_central(u_ll, u_rr, orientation, equations)
       end
MethodInstance for flux_central(::SVector{5, Float32}, ::SVector{5, Float32}, ::Int64, ::CompressibleEulerEquations3D{Float32})
  from flux_central(u_ll, u_rr, orientation_or_normal_direction, equations::AbstractEquations) @ Main /tmp/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:17
Arguments
  #self#::Core.Const(flux_central)
  u_ll::SVector{5, Float32}
  u_rr::SVector{5, Float32}
  orientation_or_normal_direction::Int64
  equations::CompressibleEulerEquations3D{Float32}
Locals
  f_rr::Any
  f_ll::Any
Body::Any
1 ─      nothing
│        (f_ll = Main.flux(u_ll, orientation_or_normal_direction, equations))
│        (f_rr = Main.flux(u_rr, orientation_or_normal_direction, equations))
│   %4 = (f_ll + f_rr)::Any
│   %5 = (0.5 * %4)::Any
└──      return %5

julia> begin
       u_ll = SVector(1.0f0, 0.1f0, 0.2f0, 0.3f0, 2.0f0)
       u_rr = u_ll
       equations = CompressibleEulerEquations3D(1.4f0)
       orientation = 1
       @code_warntype flux(u_ll, orientation, equations)
       end
...
  f5::Float64
  f4::Union{Float32, Float64}
  f3::Union{Float32, Float64}
  f2::Union{Float32, Float64}
  f1::Float32
...
6 ┄ %60 = Main.SVector(f1, f2, f3, f4, f5)::Any
└──       return %60

The problem is that we have explicit constants like 0.5::Float64 in code like

https://github.com/trixi-framework/Trixi.jl/blob/766e3f94465f48608c92d1fe91cd46db4c31c362/src/equations/compressible_euler_3d.jl#L390

Could you please check whether it works if you use Float64 on the GPU instead of Float32?

ranocha · 2023-07-11T11:44:27Z

Another option is to rewrite the code to use 32 bit floating point literals for constants, see #4 for example.

ranocha · 2023-07-11T11:49:59Z

We have an open issue in Trixi.jl for this - trixi-framework/Trixi.jl#591 - which I just revived. We should discuss it with other Trixi.jl developers there.

huiyuxie · 2023-07-12T02:04:14Z

Thank you so much for providing the detailed debugging process and finally solving it for me! I have changed the float type from Float64 to Float32 for files numerical_fluxes.jl and compressible_euler_3d.jl via @changepricision, and also changed the header part and it worked. Pretty much thanks again as solving issues should have been my work. Thanks!

huiyuxie added a commit that referenced this issue Jul 8, 2023

update #3'

0ed812c

'

huiyuxie pinned this issue Jul 8, 2023

huiyuxie added a commit that referenced this issue Jul 8, 2023

run #3

082c700

huiyuxie unpinned this issue Jul 10, 2023

ranocha mentioned this issue Jul 11, 2023

fix Float32/Float64 issue for 3D surface_flux_kernel1! #4

Closed

ranocha mentioned this issue Jul 11, 2023

Add tests using other real types trixi-framework/Trixi.jl#591

Open

24 tasks

huiyuxie closed this as completed in bbf0ede Jul 12, 2023

huiyuxie added this to CUDA Support @trixi-framework/Trixi.jl May 13, 2024

huiyuxie moved this to Done in CUDA Support @trixi-framework/Trixi.jl May 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Report errors when apply `surface_flux` for 3D #3

Report errors when apply `surface_flux` for 3D #3

huiyuxie commented Jul 8, 2023 •

edited

Loading

ranocha commented Jul 10, 2023 •

edited

Loading

huiyuxie commented Jul 10, 2023

ranocha commented Jul 11, 2023 •

edited

Loading

ranocha commented Jul 11, 2023

ranocha commented Jul 11, 2023

ranocha commented Jul 11, 2023

huiyuxie commented Jul 12, 2023

Report errors when apply surface_flux for 3D #3

Report errors when apply surface_flux for 3D #3

Comments

huiyuxie commented Jul 8, 2023 • edited Loading

ranocha commented Jul 10, 2023 • edited Loading

huiyuxie commented Jul 10, 2023

ranocha commented Jul 11, 2023 • edited Loading

ranocha commented Jul 11, 2023

ranocha commented Jul 11, 2023

ranocha commented Jul 11, 2023

huiyuxie commented Jul 12, 2023

Report errors when apply `surface_flux` for 3D #3

Report errors when apply `surface_flux` for 3D #3

huiyuxie commented Jul 8, 2023 •

edited

Loading

ranocha commented Jul 10, 2023 •

edited

Loading

ranocha commented Jul 11, 2023 •

edited

Loading