Optimize volume integral kernel for flux differencing (#105)
* Start

* Update README.md

* Fix typo

* Flux kernel

* Comments

* Complete 1D

* Complete 2D

* Complete 3D
huiyuxie authored Jan 3, 2025
1 parent 0654e9a commit 39853c2
Showing 9 changed files with 1,050 additions and 227 deletions.
12 changes: 5 additions & 7 deletions README.md
@@ -8,19 +8,17 @@ TrixiCUDA.jl offers CUDA acceleration for solving hyperbolic PDEs.
 
 ⚠️ **Warning:** Our package may not always keep pace with the latest updates or improvements in Trixi.jl. Forcing an update of Trixi.jl as a dependency for TrixiCUDA.jl beyond the version bounds specified in `Project.toml` may cause unexpected errors.
 
-*Update on Nov 21, 2024*:
+*Update on Dec 31, 2024*:
+- The initial round of kernel optimization starts with the volume integral kernels (see [TrixiCUDA.jl PR #102](https://github.com/trixi-gpu/TrixiCUDA.jl/pull/102)) and will later extend to all existing kernels used in the semidiscretization process. The primary approach includes improving global memory access patterns, using shared memory, and selecting appropriate launch sizes to reduce warp stalls and achieve better occupancy.
+
+*Update on Nov 21, 2024*:
 - Due to the [incompatibility issue](https://github.com/trixi-framework/Trixi.jl/issues/2108) from upstream with Trixi.jl and CUDA.jl in Julia v1.11, this package now supports only Julia v1.10. Using or developing this package with Julia v1.11 will result in precompilation errors. To fix this, downgrade to Julia v1.10. If you have any other problems, please file issues [here](https://github.com/trixi-gpu/TrixiCUDA.jl/issues).
 
 *Update on Oct 30, 2024*:
 - The general documentation is now available at https://trixi-gpu.github.io (in development).
 - Documentation specific to this package can be found at https://trixi-gpu.github.io/TrixiCUDA.jl/dev (in development).
 
-*Update on Oct 11, 2024*:
-
-- Development on Julia v1.11 is currently on hold due to an issue between the latest version of CUDA.jl and Trixi.jl. The fix is in progress.
-- Trace the root issue in [Trixi.jl Issue #1789](https://github.com/trixi-framework/Trixi.jl/issues/1789): SciMLBase.jl has dropped support for arrays of `SVector`.
-- [Trixi.jl PR #2150](https://github.com/trixi-framework/Trixi.jl/pull/2150) is opened to replace arrays of `SVector` using RecursiveArrayTools.jl.
-- It requires updating RecursiveArrayTools.jl, which is compatible with Julia >= v1.10. However, Trixi.jl has some legacy tests relying on Julia v1.8 and v1.9. See more discussions in [Trixi.jl PR #2194](https://github.com/trixi-framework/Trixi.jl/pull/2194).
+[Archived Update](https://trixi-gpu.github.io/update/)
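
To make the optimization approach in the *Update on Dec 31, 2024* note more concrete, here is a minimal, hypothetical CUDA.jl sketch of a shared-memory flux-differencing volume kernel for a single scalar 1D equation. This is not the TrixiCUDA.jl implementation: the kernel name, the `two_point_flux` placeholder, the array layouts, and the launch configuration are assumptions chosen purely for illustration. The point is only that staging each element's solution values and the derivative matrix in shared memory replaces repeated global-memory reads inside the two-point flux loop, and that one block per element with one thread per node is one possible choice of launch sizes.

```julia
using CUDA

# Toy two-point flux for a scalar equation (central flux); a stand-in for an
# entropy-conservative flux such as flux_ranocha in the real setup.
@inline two_point_flux(a, b) = (a + b) / 2

# One block per element, one thread per node. The element's solution values and
# the derivative matrix are staged in dynamic shared memory so that every
# two-point flux evaluation reads from shared rather than global memory.
function volume_flux_kernel!(du, u, derivative_split)
    k = blockIdx().x    # element index
    i = threadIdx().x   # node index within the element
    n = blockDim().x    # nodes per element

    T = eltype(du)
    u_elem = CuDynamicSharedArray(T, n)
    d_split = CuDynamicSharedArray(T, (n, n), n * sizeof(T))

    u_elem[i] = u[i, k]
    for j in 1:n
        d_split[i, j] = derivative_split[i, j]
    end
    sync_threads()

    # Flux-differencing sum: du[i, k] -= Σ_j D[i, j] * f(u_i, u_j)
    acc = zero(T)
    for j in 1:n
        acc += d_split[i, j] * two_point_flux(u_elem[i], u_elem[j])
    end
    du[i, k] -= acc
    return nothing
end

# Example launch: 4 nodes per element, 32 elements, Float32 data
n_nodes, n_elements = 4, 32
u  = CUDA.rand(Float32, n_nodes, n_elements)
du = CUDA.zeros(Float32, n_nodes, n_elements)
D  = CUDA.rand(Float32, n_nodes, n_nodes)   # placeholder derivative matrix
shmem = (n_nodes + n_nodes^2) * sizeof(Float32)
@cuda threads=n_nodes blocks=n_elements shmem=shmem volume_flux_kernel!(du, u, D)
```

The real kernels additionally handle systems of equations, several space dimensions, and nonconservative terms, none of which this sketch attempts to cover.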


# Package Installation
151 changes: 151 additions & 0 deletions benchmark/euler_ec_1d.jl
@@ -0,0 +1,151 @@
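# Benchmark script for the 1D entropy-conservative compressible Euler setup.
# The active block below benchmarks the GPU volume integral with flux differencing;
# the remaining stages of rhs! are kept commented out so they can be timed
# individually against their Trixi.jl CPU counterparts when needed.
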
using Trixi, TrixiCUDA
using CUDA
using BenchmarkTools

# Set the precision
RealT = Float32

# Set up the problem
equations = CompressibleEulerEquations1D(1.4f0)

initial_condition = initial_condition_weak_blast_wave

volume_flux = flux_ranocha
solver = DGSEM(polydeg = 3, surface_flux = flux_ranocha,
               volume_integral = VolumeIntegralFluxDifferencing(volume_flux),
               RealT = RealT)

coordinates_min = (-2.0f0,)
coordinates_max = (2.0f0,)
mesh = TreeMesh(coordinates_min, coordinates_max,
                initial_refinement_level = 5,
                n_cells_max = 10_000, RealT = RealT)

# Cache initialization
@info "Time for cache initialization on CPU"
@time semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver)
@info "Time for cache initialization on GPU"
CUDA.@time semi_gpu = SemidiscretizationHyperbolicGPU(mesh, equations, initial_condition, solver)

tspan = tspan_gpu = (0.0f0, 0.4f0)
t = t_gpu = 0.0f0

# Semi on CPU
(; mesh, equations, boundary_conditions, source_terms, solver, cache) = semi

# Semi on GPU
equations_gpu = semi_gpu.equations
mesh_gpu, solver_gpu, cache_gpu = semi_gpu.mesh, semi_gpu.solver, semi_gpu.cache
boundary_conditions_gpu = semi_gpu.boundary_conditions
source_terms_gpu = semi_gpu.source_terms

# ODE on CPU
ode = semidiscretize(semi, tspan)
u_ode = copy(ode.u0)
du_ode = similar(u_ode)
u = Trixi.wrap_array(u_ode, mesh, equations, solver, cache)
du = Trixi.wrap_array(du_ode, mesh, equations, solver, cache)

# ODE on GPU
ode_gpu = semidiscretizeGPU(semi_gpu, tspan_gpu)
u_gpu = copy(ode_gpu.u0)
du_gpu = similar(u_gpu)

# # Reset du
# @info "Time for reset_du! on GPU"
# CUDA.@time TrixiCUDA.reset_du!(du_gpu)
# @info "Time for reset_du! on CPU"
# @time Trixi.reset_du!(du, solver, cache)

# Reset du and volume integral
@info "Time for reset_du! and volume_integral! on GPU"
@benchmark CUDA.@sync TrixiCUDA.cuda_volume_integral!(du_gpu, u_gpu, mesh_gpu,
                                                      Trixi.have_nonconservative_terms(equations_gpu),
                                                      equations_gpu, solver_gpu.volume_integral, solver_gpu,
                                                      cache_gpu)
# @info "Time for reset_du! and volume_integral! on CPU"
# @time begin
# Trixi.reset_du!(du, solver, cache)
# Trixi.calc_volume_integral!(du, u, mesh, Trixi.have_nonconservative_terms(equations),
# equations, solver.volume_integral, solver, cache)
# end

# # Prolong to interfaces
# @info "Time for prolong2interfaces! on GPU"
# CUDA.@time TrixiCUDA.cuda_prolong2interfaces!(u_gpu, mesh_gpu, equations_gpu, cache_gpu)
# @info "Time for prolong2interfaces! on CPU"
# @time Trixi.prolong2interfaces!(cache, u, mesh, equations, solver.surface_integral, solver)

# # Interface flux
# @info "Time for interface_flux! on GPU"
# CUDA.@time TrixiCUDA.cuda_interface_flux!(mesh_gpu,
# Trixi.have_nonconservative_terms(equations_gpu),
# equations_gpu, solver_gpu, cache_gpu)
# @info "Time for interface_flux! on CPU"
# @time Trixi.calc_interface_flux!(cache.elements.surface_flux_values, mesh,
# Trixi.have_nonconservative_terms(equations), equations,
# solver.surface_integral, solver, cache)

# # Prolong to boundaries
# @info "Time for prolong2boundaries! on GPU"
# CUDA.@time TrixiCUDA.cuda_prolong2boundaries!(u_gpu, mesh_gpu, boundary_conditions_gpu,
# equations_gpu, cache_gpu)
# @info "Time for prolong2boundaries! on CPU"
# @time Trixi.prolong2boundaries!(cache, u, mesh, equations, solver.surface_integral, solver)

# # Boundary flux
# @info "Time for boundary_flux! on GPU"
# CUDA.@time TrixiCUDA.cuda_boundary_flux!(t_gpu, mesh_gpu, boundary_conditions_gpu,
# Trixi.have_nonconservative_terms(equations_gpu),
# equations_gpu, solver_gpu, cache_gpu)
# @info "Time for boundary_flux! on CPU"
# @time Trixi.calc_boundary_flux!(cache, t, boundary_conditions, mesh, equations,
# solver.surface_integral, solver)

# # Prolong to mortars
# @info "Time for prolong2mortars! on GPU"
# CUDA.@time TrixiCUDA.cuda_prolong2mortars!(u_gpu, mesh_gpu,
# TrixiCUDA.check_cache_mortars(cache_gpu),
# solver_gpu, cache_gpu)
# @info "Time for prolong2mortars! on CPU"
# @time Trixi.prolong2mortars!(cache, u, mesh, equations,
# solver.mortar, solver.surface_integral, solver)

# # Mortar flux
# @info "Time for mortar_flux! on GPU"
# CUDA.@time TrixiCUDA.cuda_mortar_flux!(mesh_gpu, TrixiCUDA.check_cache_mortars(cache_gpu),
# Trixi.have_nonconservative_terms(equations_gpu),
# equations_gpu, solver_gpu, cache_gpu)
# @info "Time for mortar_flux! on CPU"
# @time Trixi.calc_mortar_flux!(cache.elements.surface_flux_values, mesh,
# Trixi.have_nonconservative_terms(equations), equations,
# solver.mortar, solver.surface_integral, solver, cache)

# # Surface integral
# @info "Time for surface_integral! on GPU"
# CUDA.@time TrixiCUDA.cuda_surface_integral!(du_gpu, mesh_gpu, equations_gpu, solver_gpu, cache_gpu)
# @info "Time for surface_integral! on CPU"
# @time Trixi.calc_surface_integral!(du, u, mesh, equations, solver.surface_integral,
# solver, cache)

# # Jacobian
# @info "Time for jacobian! on GPU"
# CUDA.@time TrixiCUDA.cuda_jacobian!(du_gpu, mesh_gpu, equations_gpu, cache_gpu)
# @info "Time for jacobian! on CPU"
# @time Trixi.apply_jacobian!(du, mesh, equations, solver, cache)

# # Source terms
# @info "Time for sources! on GPU"
# CUDA.@time TrixiCUDA.cuda_sources!(du_gpu, u_gpu, t_gpu, source_terms_gpu,
# equations_gpu, cache_gpu)
# @info "Time for sources! on CPU"
# @time Trixi.calc_sources!(du, u, t, source_terms, equations, solver, cache)

# # Semidiscretization process
# @info "Time for rhs! on GPU"
# CUDA.@time TrixiCUDA.rhs_gpu!(du_gpu, u_gpu, t_gpu, mesh_gpu, equations_gpu,
# boundary_conditions_gpu, source_terms_gpu,
# solver_gpu, cache_gpu)
# @info "Time for rhs! on CPU"
# @time Trixi.rhs!(du, u, t, mesh, equations, boundary_conditions, source_terms,
# solver, cache)
154 changes: 154 additions & 0 deletions benchmark/euler_ec_2d.jl
@@ -0,0 +1,154 @@
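# Benchmark script for the 2D entropy-conservative compressible Euler setup on a
# periodic TreeMesh. The active block below benchmarks the GPU volume integral with
# flux differencing; the remaining stages of rhs! are kept commented out so they can
# be timed individually against their Trixi.jl CPU counterparts when needed.
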
using Trixi, TrixiCUDA
using CUDA
using BenchmarkTools

# Set the precision
RealT = Float32

# Set up the problem
equations = CompressibleEulerEquations2D(1.4f0)

initial_condition = initial_condition_weak_blast_wave

volume_flux = flux_ranocha
solver = DGSEM(polydeg = 3, surface_flux = flux_ranocha,
               volume_integral = VolumeIntegralFluxDifferencing(volume_flux),
               RealT = RealT)

coordinates_min = (-2.0f0, -2.0f0)
coordinates_max = (2.0f0, 2.0f0)
mesh = TreeMesh(coordinates_min, coordinates_max,
                initial_refinement_level = 5,
                n_cells_max = 10_000,
                periodicity = true, RealT = RealT)

# Cache initialization
@info "Time for cache initialization on CPU"
@time semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver,
                                          boundary_conditions = boundary_condition_periodic)
@info "Time for cache initialization on GPU"
CUDA.@time semi_gpu = SemidiscretizationHyperbolicGPU(mesh, equations, initial_condition, solver,
                                                      boundary_conditions = boundary_condition_periodic)

tspan = tspan_gpu = (0.0f0, 0.4f0)
t = t_gpu = 0.0f0

# Semi on CPU
(; mesh, equations, boundary_conditions, source_terms, solver, cache) = semi

# Semi on GPU
equations_gpu = semi_gpu.equations
mesh_gpu, solver_gpu, cache_gpu = semi_gpu.mesh, semi_gpu.solver, semi_gpu.cache
boundary_conditions_gpu = semi_gpu.boundary_conditions
source_terms_gpu = semi_gpu.source_terms

# ODE on CPU
ode = semidiscretize(semi, tspan)
u_ode = copy(ode.u0)
du_ode = similar(u_ode)
u = Trixi.wrap_array(u_ode, mesh, equations, solver, cache)
du = Trixi.wrap_array(du_ode, mesh, equations, solver, cache)

# ODE on GPU
ode_gpu = semidiscretizeGPU(semi_gpu, tspan_gpu)
u_gpu = copy(ode_gpu.u0)
du_gpu = similar(u_gpu)

# # Reset du
# @info "Time for reset_du! on GPU"
# CUDA.@time TrixiCUDA.reset_du!(du_gpu)
# @info "Time for reset_du! on CPU"
# @time Trixi.reset_du!(du, solver, cache)

# Reset du and volume integral
@info "Time for reset_du! and volume_integral! on GPU"
@benchmark CUDA.@sync TrixiCUDA.cuda_volume_integral!(du_gpu, u_gpu, mesh_gpu,
                                                      Trixi.have_nonconservative_terms(equations_gpu),
                                                      equations_gpu, solver_gpu.volume_integral, solver_gpu,
                                                      cache_gpu)
# @info "Time for reset_du! and volume_integral! on CPU"
# @time begin
# Trixi.reset_du!(du, solver, cache)
# Trixi.calc_volume_integral!(du, u, mesh, Trixi.have_nonconservative_terms(equations),
# equations, solver.volume_integral, solver, cache)
# end

# # Prolong to interfaces
# @info "Time for prolong2interfaces! on GPU"
# CUDA.@time TrixiCUDA.cuda_prolong2interfaces!(u_gpu, mesh_gpu, equations_gpu, cache_gpu)
# @info "Time for prolong2interfaces! on CPU"
# @time Trixi.prolong2interfaces!(cache, u, mesh, equations, solver.surface_integral, solver)

# # Interface flux
# @info "Time for interface_flux! on GPU"
# CUDA.@time TrixiCUDA.cuda_interface_flux!(mesh_gpu,
# Trixi.have_nonconservative_terms(equations_gpu),
# equations_gpu, solver_gpu, cache_gpu)
# @info "Time for interface_flux! on CPU"
# @time Trixi.calc_interface_flux!(cache.elements.surface_flux_values, mesh,
# Trixi.have_nonconservative_terms(equations), equations,
# solver.surface_integral, solver, cache)

# # Prolong to boundaries
# @info "Time for prolong2boundaries! on GPU"
# CUDA.@time TrixiCUDA.cuda_prolong2boundaries!(u_gpu, mesh_gpu, boundary_conditions_gpu,
# equations_gpu, cache_gpu)
# @info "Time for prolong2boundaries! on CPU"
# @time Trixi.prolong2boundaries!(cache, u, mesh, equations, solver.surface_integral, solver)

# # Boundary flux
# @info "Time for boundary_flux! on GPU"
# CUDA.@time TrixiCUDA.cuda_boundary_flux!(t_gpu, mesh_gpu, boundary_conditions_gpu,
# Trixi.have_nonconservative_terms(equations_gpu),
# equations_gpu, solver_gpu, cache_gpu)
# @info "Time for boundary_flux! on CPU"
# @time Trixi.calc_boundary_flux!(cache, t, boundary_conditions, mesh, equations,
# solver.surface_integral, solver)

# # Prolong to mortars
# @info "Time for prolong2mortars! on GPU"
# CUDA.@time TrixiCUDA.cuda_prolong2mortars!(u_gpu, mesh_gpu,
# TrixiCUDA.check_cache_mortars(cache_gpu),
# solver_gpu, cache_gpu)
# @info "Time for prolong2mortars! on CPU"
# @time Trixi.prolong2mortars!(cache, u, mesh, equations,
# solver.mortar, solver.surface_integral, solver)

# # Mortar flux
# @info "Time for mortar_flux! on GPU"
# CUDA.@time TrixiCUDA.cuda_mortar_flux!(mesh_gpu, TrixiCUDA.check_cache_mortars(cache_gpu),
# Trixi.have_nonconservative_terms(equations_gpu),
# equations_gpu, solver_gpu, cache_gpu)
# @info "Time for mortar_flux! on CPU"
# @time Trixi.calc_mortar_flux!(cache.elements.surface_flux_values, mesh,
# Trixi.have_nonconservative_terms(equations), equations,
# solver.mortar, solver.surface_integral, solver, cache)

# # Surface integral
# @info "Time for surface_integral! on GPU"
# CUDA.@time TrixiCUDA.cuda_surface_integral!(du_gpu, mesh_gpu, equations_gpu, solver_gpu, cache_gpu)
# @info "Time for surface_integral! on CPU"
# @time Trixi.calc_surface_integral!(du, u, mesh, equations, solver.surface_integral,
# solver, cache)

# # Jacobian
# @info "Time for jacobian! on GPU"
# CUDA.@time TrixiCUDA.cuda_jacobian!(du_gpu, mesh_gpu, equations_gpu, cache_gpu)
# @info "Time for jacobian! on CPU"
# @time Trixi.apply_jacobian!(du, mesh, equations, solver, cache)

# # Source terms
# @info "Time for sources! on GPU"
# CUDA.@time TrixiCUDA.cuda_sources!(du_gpu, u_gpu, t_gpu, source_terms_gpu,
# equations_gpu, cache_gpu)
# @info "Time for sources! on CPU"
# @time Trixi.calc_sources!(du, u, t, source_terms, equations, solver, cache)

# # Semidiscretization process
# @info "Time for rhs! on GPU"
# CUDA.@time TrixiCUDA.rhs_gpu!(du_gpu, u_gpu, t_gpu, mesh_gpu, equations_gpu,
# boundary_conditions_gpu, source_terms_gpu,
# solver_gpu, cache_gpu)
# @info "Time for rhs! on CPU"
# @time Trixi.rhs!(du, u, t, mesh, equations, boundary_conditions, source_terms,
# solver, cache)