Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite @threaded macro to work with GPUs #534

Merged
merged 25 commits into from
Jun 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
301059e
Fix wrapping for GPU arrays
efaulhaber May 22, 2024
917e0e5
Adapt `@threaded` macro to work with GPUs
efaulhaber May 22, 2024
19b7806
Use new `@threaded` macro
efaulhaber May 22, 2024
9644021
Fix NHS to run GPU code with CPU arrays
efaulhaber May 22, 2024
1ff91d3
Import `@index` from KernelAbstractions.jl
efaulhaber May 23, 2024
7514ac5
Add proper docs for `@threaded`
efaulhaber Jun 12, 2024
eb4903b
Fix `foreach_neighbor`
efaulhaber Jun 12, 2024
bafccab
Reformat
efaulhaber Jun 12, 2024
e7b6b6c
Add test for GPU code on the CPU
efaulhaber Jun 12, 2024
f10246f
Fix unit tests
efaulhaber Jun 12, 2024
c6103d9
Add unit tests for GPU interaction
efaulhaber Jun 12, 2024
b116731
Also add tests for GPU versions of `wrap_v` and `wrap_u`
efaulhaber Jun 12, 2024
6c14177
Add kwarg `neighborhood_search` to dam break example
efaulhaber Jun 12, 2024
4dab0bd
Fix tests
efaulhaber Jun 12, 2024
3a4a167
Merge branch 'threaded-macro-gpu' of github.com:efaulhaber/TrixiParti…
efaulhaber Jun 12, 2024
a4d8aef
Reformat
efaulhaber Jun 12, 2024
8e8efa8
Add function `wrap_array`
efaulhaber Jun 19, 2024
6f6e02d
Merge branch 'main' into threaded-macro-gpu
efaulhaber Jun 19, 2024
969cad2
Fix merge
efaulhaber Jun 20, 2024
0870d57
Merge branch 'main' into threaded-macro-gpu
efaulhaber Jun 20, 2024
b05f045
Fix open boundary code
efaulhaber Jun 20, 2024
613ec7e
Implement suggestions
efaulhaber Jun 20, 2024
9795536
Fix syntax error from merge
efaulhaber Jun 20, 2024
9313efc
Fix `wrap_u` and `wrap_v` for GPU array types
efaulhaber Jun 20, 2024
2140a67
Fix constructor of `BoundarySPHSystem`
efaulhaber Jun 20, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,9 @@ Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
DiffEqCallbacks = "459566f4-90b8-5000-8ac3-15dfb0a30def"
FastPow = "c0e83750-1142-43a8-81cf-6c956b72b4d1"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
GPUArrays = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MuladdMacro = "46d2c3a1-f734-5fdb-9937-b9b9aeba4221"
PointNeighbors = "1c4d5385-0a27-49de-8e2c-43b175c8985c"
Expand All @@ -35,6 +37,7 @@ FastPow = "0.1"
ForwardDiff = "0.10"
JSON = "0.21"
MuladdMacro = "0.2"
PointNeighbors = "0.2.3"
Polyester = "0.7.5"
RecipesBase = "1"
Reexport = "1"
Expand All @@ -43,6 +46,5 @@ StaticArrays = "1"
StrideArrays = "0.1"
TimerOutputs = "0.5"
TrixiBase = "0.1.3"
PointNeighbors = "0.2"
WriteVTK = "1"
julia = "1.9"
6 changes: 4 additions & 2 deletions examples/fluid/dam_break_2d.jl
Original file line number Diff line number Diff line change
Expand Up @@ -73,8 +73,10 @@ boundary_system = BoundarySPHSystem(tank.boundary, boundary_model, adhesion_coef

# ==========================================================================================
# ==== Simulation
semi = Semidiscretization(fluid_system, boundary_system, threaded_nhs_update=true)
ode = semidiscretize(semi, tspan)
semi = Semidiscretization(fluid_system, boundary_system,
neighborhood_search=GridNeighborhoodSearch,
threaded_nhs_update=true)
ode = semidiscretize(semi, tspan, data_type=nothing)

info_callback = InfoCallback(interval=100)

Expand Down
4 changes: 3 additions & 1 deletion examples/n_body/n_body_system.jl
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
using TrixiParticles
using LinearAlgebra

struct NBodySystem{NDIMS, ELTYPE <: Real} <: TrixiParticles.System{NDIMS, Nothing}
# The second type parameter of `System` can't be `Nothing`, or TrixiParticles will launch
# GPU kernel for `for_particle_neighbor` loops.
struct NBodySystem{NDIMS, ELTYPE <: Real} <: TrixiParticles.System{NDIMS, 0}
initial_condition :: InitialCondition{ELTYPE}
mass :: Array{ELTYPE, 1} # [particle]
G :: ELTYPE
Expand Down
6 changes: 5 additions & 1 deletion src/TrixiParticles.jl
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,9 @@ using DataFrames: DataFrame
using DiffEqCallbacks: PeriodicCallback, PeriodicCallbackAffect, PresetTimeCallback
using FastPow: @fastpow
using ForwardDiff: ForwardDiff
using GPUArrays: AbstractGPUArray
using JSON: JSON
using KernelAbstractions: KernelAbstractions, @kernel, @index
using LinearAlgebra: norm, dot, I, tr, inv, pinv, det
using MuladdMacro: @muladd
using Polyester: Polyester, @batch
Expand All @@ -27,7 +29,9 @@ using TrixiBase: trixi_include, @trixi_timeit, timer, timeit_debug_enabled,
using PointNeighbors: PointNeighbors, for_particle_neighbor
using WriteVTK: vtk_grid, MeshCell, VTKCellTypes, paraview_collection, vtk_save

# util needs to be first because of macro @trixi_timeit
# `util.jl` depends on the `GPUSystem` type defined in `system.jl`
include("general/system.jl")
# `util.jl` needs to be next because of the macros `@trixi_timeit` and `@threaded`
include("util.jl")
include("callbacks/callbacks.jl")
include("general/general.jl")
Expand Down
2 changes: 1 addition & 1 deletion src/general/corrections.jl
Original file line number Diff line number Diff line change
Expand Up @@ -440,7 +440,7 @@ function compute_gradient_correction_matrix!(corr_matrix::AbstractArray, system,
end

function correction_matrix_inversion_step!(corr_matrix, system)
@threaded for particle in eachparticle(system)
@threaded system for particle in eachparticle(system)
L = extract_smatrix(corr_matrix, system, particle)

# The matrix `L` only becomes singular when the particle and all neighbors
Expand Down
30 changes: 2 additions & 28 deletions src/general/general.jl
Original file line number Diff line number Diff line change
@@ -1,37 +1,11 @@
# Abstract supertype for all system types. We additionally store the type of the system's
# initial condition, which is `Nothing` when using KernelAbstractions.jl.
abstract type System{NDIMS, IC} end

# When using KernelAbstractions.jl, the initial condition has been replaced by `nothing`
GPUSystem = System{NDIMS, Nothing} where {NDIMS}

abstract type FluidSystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::FluidSystem) = "fluid"
vtkname(system::FluidSystem) = "fluid"

abstract type SolidSystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::SolidSystem) = "solid"
vtkname(system::SolidSystem) = "solid"

abstract type BoundarySystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::BoundarySystem) = "boundary"
vtkname(system::BoundarySystem) = "boundary"

@inline function set_zero!(du)
du .= zero(eltype(du))

return du
end

# Note that `semidiscretization.jl` depends on the system types and has to be
# included later.
# Note that `system.jl` has already been included.
# `semidiscretization.jl` depends on the system types and has to be included later.
# `density_calculators.jl` needs to be included before `corrections.jl`.
include("density_calculators.jl")
include("corrections.jl")
include("smoothing_kernels.jl")
include("initial_condition.jl")
include("buffer.jl")
include("system.jl")
include("interpolation.jl")
include("custom_quantities.jl")
include("neighborhood_search.jl")
8 changes: 8 additions & 0 deletions src/general/gpu.jl
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ Adapt.@adapt_structure DensityDiffusionAntuono
Adapt.@adapt_structure BoundarySPHSystem
Adapt.@adapt_structure BoundaryModelDummyParticles
Adapt.@adapt_structure BoundaryModelMonaghanKajtar
Adapt.@adapt_structure TotalLagrangianSPHSystem

# The initial conditions are only used for initialization, which happens before `adapt`ing
# the semidiscretization, so we don't need to store `InitialCondition`s on the GPU.
Expand All @@ -32,3 +33,10 @@ end
function Adapt.adapt_structure(to::typeof(Array), range::UnitRange)
return range
end

KernelAbstractions.get_backend(::PtrArray) = KernelAbstractions.CPU()
KernelAbstractions.get_backend(system::System) = KernelAbstractions.get_backend(system.mass)

function KernelAbstractions.get_backend(system::BoundarySPHSystem)
KernelAbstractions.get_backend(system.coordinates)
end
11 changes: 11 additions & 0 deletions src/general/neighborhood_search.jl
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,14 @@ function PointNeighbors.for_particle_neighbor(f, system, neighbor_system,
for_particle_neighbor(f, system_coords, neighbor_coords, neighborhood_search,
particles=particles, parallel=parallel)
end

function PointNeighbors.for_particle_neighbor(f, system::GPUSystem, neighbor_system,
system_coords, neighbor_coords,
neighborhood_search;
particles=each_moving_particle(system),
parallel=true)
@threaded system for particle in particles
PointNeighbors.foreach_neighbor(f, system_coords, neighbor_coords,
neighborhood_search, particle)
end
end
42 changes: 27 additions & 15 deletions src/general/semidiscretization.jl
Original file line number Diff line number Diff line change
Expand Up @@ -368,29 +368,41 @@ end

# We have to pass `system` here for type stability,
# since the type of `system` determines the return type.
@inline function wrap_v(v_ode, system, semi)
(; ranges_v) = semi

range = ranges_v[system_indices(system, semi)]

@boundscheck @assert length(range) == v_nvariables(system) * n_moving_particles(system)

return wrap_array(v_ode, range,
(StaticInt(v_nvariables(system)), n_moving_particles(system)))
end

@inline function wrap_u(u_ode, system, semi)
(; ranges_u) = semi

range = ranges_u[system_indices(system, semi)]

@boundscheck @assert length(range) == u_nvariables(system) * n_moving_particles(system)

# This is a non-allocating version of:
# return unsafe_wrap(Array{eltype(u_ode), 2}, pointer(view(u_ode, range)),
# (u_nvariables(system), n_moving_particles(system)))
return PtrArray(pointer(view(u_ode, range)),
(StaticInt(u_nvariables(system)), n_moving_particles(system)))
return wrap_array(u_ode, range,
(StaticInt(u_nvariables(system)), n_moving_particles(system)))
end

@inline function wrap_v(v_ode, system, semi)
(; ranges_v) = semi

range = ranges_v[system_indices(system, semi)]

@boundscheck @assert length(range) == v_nvariables(system) * n_moving_particles(system)
@inline function wrap_array(array::Array, range, size)
# This is a non-allocating version of:
# return unsafe_wrap(Array{eltype(array), 2}, pointer(view(array, range)), size)
return PtrArray(pointer(view(array, range)), size)
end

return PtrArray(pointer(view(v_ode, range)),
(StaticInt(v_nvariables(system)), n_moving_particles(system)))
@inline function wrap_array(array, range, size)
# For non-`Array`s (typically GPU arrays), just reshape. Calling the `PtrArray` code
# above for a `CuArray` yields another `CuArray` (instead of a `PtrArray`)
# and is 8 times slower with double the allocations.
#
# Note that `size` might contain `StaticInt`s, so convert to `Int` first.
return reshape(view(array, range), Int.(size))
end

function calculate_dt(v_ode, u_ode, cfl_number, semi::Semidiscretization)
Expand All @@ -409,7 +421,7 @@ function drift!(du_ode, v_ode, u_ode, semi, t)
du = wrap_u(du_ode, system, semi)
v = wrap_v(v_ode, system, semi)

@threaded for particle in each_moving_particle(system)
@threaded system for particle in each_moving_particle(system)
# This can be dispatched per system
add_velocity!(du, v, particle, system)
end
Expand Down Expand Up @@ -508,7 +520,7 @@ function add_source_terms!(dv_ode, v_ode, u_ode, semi)
v = wrap_v(v_ode, system, semi)
u = wrap_u(u_ode, system, semi)

@threaded for particle in each_moving_particle(system)
@threaded system for particle in each_moving_particle(system)
# Dispatch by system type to exclude boundary systems
add_acceleration!(dv, particle, system)
add_source_terms_inner!(dv, v, u, particle, system, source_terms(system))
Expand Down
25 changes: 25 additions & 0 deletions src/general/system.jl
Original file line number Diff line number Diff line change
@@ -1,3 +1,28 @@
# Abstract supertype for all system types. We additionally store the type of the system's
# initial condition, which is `Nothing` when using KernelAbstractions.jl.
abstract type System{NDIMS, IC} end

# When using KernelAbstractions.jl, the initial condition has been replaced by `nothing`
GPUSystem = System{NDIMS, Nothing} where {NDIMS}

abstract type FluidSystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::FluidSystem) = "fluid"
vtkname(system::FluidSystem) = "fluid"

abstract type SolidSystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::SolidSystem) = "solid"
vtkname(system::SolidSystem) = "solid"

abstract type BoundarySystem{NDIMS, IC} <: System{NDIMS, IC} end
timer_name(::BoundarySystem) = "boundary"
vtkname(system::BoundarySystem) = "boundary"

@inline function set_zero!(du)
du .= zero(eltype(du))

return du
end

initialize!(system, neighborhood_search) = system

@inline Base.ndims(::System{NDIMS}) where {NDIMS} = NDIMS
Expand Down
7 changes: 4 additions & 3 deletions src/schemes/boundary/dummy_particles/dummy_particles.jl
Original file line number Diff line number Diff line change
Expand Up @@ -291,7 +291,7 @@ function compute_pressure!(boundary_model, ::Union{SummationDensity, ContinuityD

# Limit pressure to be non-negative to avoid attractive forces between fluid and
# boundary particles at free surfaces (sticking artifacts).
@threaded for particle in eachparticle(system)
@threaded system for particle in eachparticle(system)
apply_state_equation!(boundary_model, particle_density(v, boundary_model,
particle), particle)
end
Expand Down Expand Up @@ -346,14 +346,15 @@ function compute_pressure!(boundary_model, ::AdamiPressureExtrapolation,
system_coords, neighbor_coords,
v_neighbor_system, nhs)
end
for particle in eachparticle(system)

@threaded system for particle in eachparticle(system)
# Limit pressure to be non-negative to avoid attractive forces between fluid and
# boundary particles at free surfaces (sticking artifacts).
pressure[particle] = max(pressure[particle], 0.0)
end
end

@trixi_timeit timer() "inverse state equation" @threaded for particle in eachparticle(system)
@trixi_timeit timer() "inverse state equation" @threaded system for particle in eachparticle(system)
compute_adami_density!(boundary_model, system, system_coords, particle)
end
end
Expand Down
4 changes: 2 additions & 2 deletions src/schemes/boundary/open_boundary/system.jl
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,7 @@ end
reference_velocity, reference_pressure, reference_density) = system

# Update quantities based on the characteristic variables
@threaded for particle in each_moving_particle(system)
@threaded system for particle in each_moving_particle(system)
particle_position = current_coords(u, system, particle)

J1 = characteristics[1, particle]
Expand Down Expand Up @@ -230,7 +230,7 @@ function evaluate_characteristics!(system, v, u, v_ode, u_ode, semi, t)
# Thus, we compute the characteristics for the particles that are outside the influence
# of fluid particles by using the average of the values of the previous time step.
# See eq. 27 in Negi (2020) https://doi.org/10.1016/j.cma.2020.113119
@threaded for particle in each_moving_particle(system)
@threaded system for particle in each_moving_particle(system)

# Particle is outside of the influence of fluid particles
if isapprox(volume[particle], 0.0)
Expand Down
8 changes: 4 additions & 4 deletions src/schemes/boundary/system.jl
Original file line number Diff line number Diff line change
Expand Up @@ -27,13 +27,13 @@ struct BoundarySPHSystem{BM, NDIMS, ELTYPE <: Real, IC, CO, M, IM,
# This constructor is necessary for Adapt.jl to work with this struct.
# See the comments in general/gpu.jl for more details.
function BoundarySPHSystem(initial_condition, coordinates, boundary_model, movement,
ismoving, adhesion_coefficient, cache)
ismoving, adhesion_coefficient, cache, buffer)
ELTYPE = eltype(coordinates)

new{typeof(boundary_model), size(coordinates, 1), ELTYPE, typeof(initial_condition),
typeof(coordinates), typeof(movement), typeof(ismoving),
typeof(cache)}(initial_condition, coordinates, boundary_model, movement,
ismoving, adhesion_coefficient, cache, nothing)
ismoving, adhesion_coefficient, cache, buffer)
end
end

Expand All @@ -54,7 +54,7 @@ function BoundarySPHSystem(initial_condition, model; movement=nothing,

# Because of dispatches boundary model needs to be first!
return BoundarySPHSystem(initial_condition, coordinates, model, movement,
ismoving, adhesion_coefficient, cache)
ismoving, adhesion_coefficient, cache, nothing)
end

"""
Expand Down Expand Up @@ -209,7 +209,7 @@ function (movement::BoundaryMovement)(system, t)

is_moving(t) || return system

@threaded for particle in moving_particles
@threaded system for particle in moving_particles
pos_new = initial_coords(system, particle) + movement_function(t)
vel = ForwardDiff.derivative(movement_function, t)
acc = ForwardDiff.derivative(t_ -> ForwardDiff.derivative(movement_function, t_), t)
Expand Down
2 changes: 1 addition & 1 deletion src/schemes/fluid/weakly_compressible_sph/system.jl
Original file line number Diff line number Diff line change
Expand Up @@ -311,7 +311,7 @@ function reinit_density!(system, v, u, v_ode, u_ode, semi)
end

function compute_pressure!(system, v)
@threaded for particle in eachparticle(system)
@threaded system for particle in eachparticle(system)
apply_state_equation!(system, particle_density(v, system, particle), particle)
end
end
Expand Down
Loading