Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report errors when apply surface_flux for 3D #3

Closed
huiyuxie opened this issue Jul 8, 2023 · 7 comments
Closed

Report errors when apply surface_flux for 3D #3

huiyuxie opened this issue Jul 8, 2023 · 7 comments

Comments

@huiyuxie
Copy link
Member

huiyuxie commented Jul 8, 2023

Kernel surface_flux_kernel!() worked for both 1D and 2D but failed for 3D.
1D version surface_flux_kernel!() function
https://github.com/huiyuxie/trixi_cuda/blob/a6e268a54203e725c67997f7f5a33bd82001fd60/cuda_dg_1d.jl#L173-L190
2D version surface_flux_kernel!() function
https://github.com/huiyuxie/trixi_cuda/blob/a6e268a54203e725c67997f7f5a33bd82001fd60/cuda_dg_2d.jl#L196-L215
But for 3D equations, we provided two ways for comparing the benchmark results but both failed
https://github.com/huiyuxie/trixi_cuda/blob/0ed812ce977b5ebc80f7a92c042864acb1625a46/cuda_dg_3d.jl#L205-L248
The errors are the same like

ERROR: InvalidIRError: compiling MethodInstance for surface_flux_kernel!(::CuDeviceArray{Float32, 5, 1}, ::CuDeviceArray{Float32, 5, 1}, ::CuDeviceVector{Int32, 1}, ::CompressibleEulerEquations3D{Float64}, ::FluxLaxFriedrichs{typeof(max_abs_speed_naive)}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:219
Reason: unsupported dynamic function invocation (call to convert)
Stacktrace:
 [1] setindex!
   @ ~/.julia/packages/CUDA/pCcGc/src/device/array.jl:166
 [2] setindex!
   @ ~/.julia/packages/CUDA/pCcGc/src/device/array.jl:179
 [3] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:219
Reason: unsupported dynamic function invocation (call to (::Type{SA})(x...) where SA<:StaticArray @ StaticArrays ~/.julia/packages/StaticArrays/O6dgq/src/convert.jl:160)
Stacktrace:
 [1] flux
   @ ~/trixi_cuda/trixi/src/equations/compressible_euler_3d.jl:399
 [2] flux_central
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:20
 [3] FluxPlusDissipation
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:41
 [4] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:215
Reason: unsupported dynamic function invocation (call to (::Type{SA})(x...) where SA<:StaticArray @ StaticArrays ~/.julia/packages/StaticArrays/O6dgq/src/convert.jl:160)
Stacktrace:
 [1] flux
   @ ~/trixi_cuda/trixi/src/equations/compressible_euler_3d.jl:399
 [2] flux_central
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:21
 [3] FluxPlusDissipation
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:41
 [4] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:215
Reason: unsupported dynamic function invocation (call to +)
Stacktrace:
 [1] flux_central
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:24
 [2] FluxPlusDissipation
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:41
 [3] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:215
Reason: unsupported dynamic function invocation (call to *)
Stacktrace:
 [1] flux_central
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:24
 [2] FluxPlusDissipation
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:41
 [3] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:215
Reason: unsupported dynamic function invocation (call to +)
Stacktrace:
 [1] FluxPlusDissipation
   @ ~/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:41
 [2] surface_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:215
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/validation.jl:149
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:411 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:410 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/utils.jl:89
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/utils.jl:83 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:118
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:92 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:88
 [10] compile
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:79 [inlined]
 [11] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
    @ CUDA ~/.julia/packages/CUDA/pCcGc/src/compiler/compilation.jl:125
 [12] #1032
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/compilation.jl:120 [inlined]
 [13] LLVM.ThreadSafeContext(f::CUDA.var"#1032#1033"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
    @ LLVM ~/.julia/packages/LLVM/5aiiG/src/executionengine/ts_module.jl:14
 [14] JuliaContext
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:35 [inlined]
 [15] compile
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/compilation.jl:119 [inlined]
 [16] actual_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:125
 [17] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:103
 [18] macro expansion
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:318 [inlined]
 [19] macro expansion
    @ ./lock.jl:267 [inlined]
 [20] cufunction(f::typeof(surface_flux_kernel!), tt::Type{Tuple{CuDeviceArray{Float32, 5, 1}, CuDeviceArray{Float32, 5, 1}, CuDeviceVector{Int32, 1}, CompressibleEulerEquations3D{Float64}, FluxLaxFriedrichs{typeof(max_abs_speed_naive)}}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:313
 [21] cufunction(f::typeof(surface_flux_kernel!), tt::Type{Tuple{CuDeviceArray{Float32, 5, 1}, CuDeviceArray{Float32, 5, 1}, CuDeviceVector{Int32, 1}, CompressibleEulerEquations3D{Float64}, FluxLaxFriedrichs{typeof(max_abs_speed_naive)}}})
    @ CUDA ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:310
 [22] top-level scope
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:104

Both kernels for 3D worked for test/linear_scalar_advection_3d.jl but both failed for test/compressible_euler_3d.jl. Having checked the flux computing process for 1D, 2D, and 3D compressible euler equations but found nothing extremely different. So currently the issue cannot be solved.

huiyuxie added a commit that referenced this issue Jul 8, 2023
@huiyuxie huiyuxie pinned this issue Jul 8, 2023
huiyuxie added a commit that referenced this issue Jul 8, 2023
@ranocha
Copy link
Collaborator

ranocha commented Jul 10, 2023

Which versions of

  • Julia
  • Trixi.jl
  • StaticArrays.jl

are you using?

@huiyuxie huiyuxie unpinned this issue Jul 10, 2023
@huiyuxie
Copy link
Member Author

Thanks for helping me solve the issue!

Julia v1.9.0
Trixi v0.5.28
StaticArrays v1.5.26

@ranocha
Copy link
Collaborator

ranocha commented Jul 11, 2023

That's an ugly error. Did you try to follow the

Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl

given in the stacktrace? It may be that we're running into some compiler heuristics that work fine for small tuples but not as well for slightly larger tuples - which is why it manifests only for 3D.

@ranocha
Copy link
Collaborator

ranocha commented Jul 11, 2023

Aha, I think I understood something. I am using Julia v1.9.2 but that shouldn't matter. First, I cloned this repo and ran

(trixi_cuda) pkg> instantiate

Then, I executed

julia> include("header.jl")

julia> include("cuda_dg_3d.jl")

and ran the code

# CUDA kernel for calculating surface fluxes 
function surface_flux_kernel1!(surface_flux_arr, interfaces_u, orientations,
    equations::AbstractEquations{3}, surface_flux::Any)
    j2 = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    j3 = (blockIdx().y - 1) * blockDim().y + threadIdx().y
    k = (blockIdx().z - 1) * blockDim().z + threadIdx().z

    if (j2 <= size(surface_flux_arr, 3) && j3 <= size(surface_flux_arr, 4) && k <= size(surface_flux_arr, 5))
        u_ll, u_rr = get_surface_node_vars(interfaces_u, equations, j2, j3, k)
        orientation = orientations[k]
        surface_flux_node = surface_flux(u_ll, u_rr, orientation, equations)

        @inbounds begin
            for j1j1 in axes(surface_flux_arr, 2)
                surface_flux_arr[1, j1j1, j2, j3, k] = surface_flux_node[j1j1]
            end
        end
    end

    return nothing
end

du, u = copy_to_gpu!(du, u)

cuda_volume_integral!(
    du, u, mesh,
    have_nonconservative_terms(equations), equations,
    solver.volume_integral, solver)

cuda_prolong2interfaces!(u, mesh, cache)

surface_flux = solver.surface_integral.surface_flux
interfaces_u = CuArray{Float32}(cache.interfaces.u)
neighbor_ids = CuArray{Int32}(cache.interfaces.neighbor_ids)
orientations = CuArray{Int32}(cache.interfaces.orientations)
surface_flux_arr = CuArray{Float32}(undef, 1, size(interfaces_u)[2:end]...)

size_arr = CuArray{Float32}(undef, size(interfaces_u, 3), size(interfaces_u, 4), size(interfaces_u, 5))

surface_flux_kernel1 = @cuda launch = false surface_flux_kernel1!(
    surface_flux_arr, interfaces_u, orientations, equations, surface_flux)

This gave me the error you mentioned above. Next, I ran

julia> code_typed(err[1].exception)
1-element Vector{Any}:
 CodeInfo(
...
└──── %650 = Base.mul_float(%648, %649)::Float64
109%651 = φ (#105 => %627, #107 => %640, #108 => %650)::Float64%652 = φ (#105 => %623, #107 => %636, #108 => %646)::Union{Float32, Float64}%653 = φ (#105 => %622, #107 => %635, #108 => %643)::Union{Float32, Float64}%654 = φ (#105 => %621, #107 => %632, #108 => %642)::Union{Float32, Float64}%655 = φ (#105 => %184, #107 => %233, #108 => %282)::Float32%656 = invoke Main.SVector(%655::Float32, %654::Vararg{Any}, %653, %652, %651)::Any
...

This type instability can be observed more nicely via

julia> code_warntype(err[1].exception)
...
  surface_flux_node::Any
...%64 = Base.getindex(surface_flux_node, j1j1)::Any
...

It basically boils down to

julia> begin
       u_ll = SVector(1.0f0, 0.1f0, 0.2f0, 0.3f0, 2.0f0)
       u_rr = u_ll
       equations = CompressibleEulerEquations3D(1.4f0)
       orientation = 1
       @code_warntype flux_lax_friedrichs(u_ll, u_rr, orientation, equations)
       end
MethodInstance for (::FluxLaxFriedrichs{typeof(max_abs_speed_naive)})(::SVector{5, Float32}, ::SVector{5, Float32}, ::Int64, ::CompressibleEulerEquations3D{Float32})
  from (numflux::FluxPlusDissipation)(u_ll, u_rr, orientation_or_normal_direction, equations) @ Main /tmp/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:38
Arguments
  numflux::Core.Const(FluxLaxFriedrichs(max_abs_speed_naive))
  u_ll::SVector{5, Float32}
  u_rr::SVector{5, Float32}
  orientation_or_normal_direction::Int64
  equations::CompressibleEulerEquations3D{Float32}
Locals
  dissipation::DissipationLocalLaxFriedrichs{typeof(max_abs_speed_naive)}
  numerical_flux::typeof(flux_central)
Body::Any
1nothing
│        (numerical_flux = Base.getproperty(numflux, :numerical_flux))
│        (dissipation = Base.getproperty(numflux, :dissipation))
│   %4 = (numerical_flux)(u_ll, u_rr, orientation_or_normal_direction, equations)::Any%5 = (dissipation)(u_ll, u_rr, orientation_or_normal_direction, equations)::SVector{5, Float64}%6 = (%4 + %5)::Any
└──      return %6

julia> begin
       u_ll = SVector(1.0f0, 0.1f0, 0.2f0, 0.3f0, 2.0f0)
       u_rr = u_ll
       equations = CompressibleEulerEquations3D(1.4f0)
       orientation = 1
       @code_warntype flux_central(u_ll, u_rr, orientation, equations)
       end
MethodInstance for flux_central(::SVector{5, Float32}, ::SVector{5, Float32}, ::Int64, ::CompressibleEulerEquations3D{Float32})
  from flux_central(u_ll, u_rr, orientation_or_normal_direction, equations::AbstractEquations) @ Main /tmp/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:17
Arguments
  #self#::Core.Const(flux_central)
  u_ll::SVector{5, Float32}
  u_rr::SVector{5, Float32}
  orientation_or_normal_direction::Int64
  equations::CompressibleEulerEquations3D{Float32}
Locals
  f_rr::Any
  f_ll::Any
Body::Any
1nothing
│        (f_ll = Main.flux(u_ll, orientation_or_normal_direction, equations))
│        (f_rr = Main.flux(u_rr, orientation_or_normal_direction, equations))
│   %4 = (f_ll + f_rr)::Any%5 = (0.5 * %4)::Any
└──      return %5

julia> begin
       u_ll = SVector(1.0f0, 0.1f0, 0.2f0, 0.3f0, 2.0f0)
       u_rr = u_ll
       equations = CompressibleEulerEquations3D(1.4f0)
       orientation = 1
       @code_warntype flux(u_ll, orientation, equations)
       end
...
  f5::Float64
  f4::Union{Float32, Float64}
  f3::Union{Float32, Float64}
  f2::Union{Float32, Float64}
  f1::Float32
...
6%60 = Main.SVector(f1, f2, f3, f4, f5)::Any
└──       return %60

The problem is that we have explicit constants like 0.5::Float64 in code like

https://github.com/trixi-framework/Trixi.jl/blob/766e3f94465f48608c92d1fe91cd46db4c31c362/src/equations/compressible_euler_3d.jl#L390

Could you please check whether it works if you use Float64 on the GPU instead of Float32?

@ranocha
Copy link
Collaborator

ranocha commented Jul 11, 2023

Another option is to rewrite the code to use 32 bit floating point literals for constants, see #4 for example.

@ranocha
Copy link
Collaborator

ranocha commented Jul 11, 2023

We have an open issue in Trixi.jl for this - trixi-framework/Trixi.jl#591 - which I just revived. We should discuss it with other Trixi.jl developers there.

@huiyuxie
Copy link
Member Author

Thank you so much for providing the detailed debugging process and finally solving it for me! I have changed the float type from Float64 to Float32 for files numerical_fluxes.jl and compressible_euler_3d.jl via @changepricision, and also changed the header part and it worked. Pretty much thanks again as solving issues should have been my work. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants