support for active_cells_map in kernels #3920

Open: wants to merge 48 commits into main from ss/active-index-macro
279e4c4  support for MappedFunctions (simone-silvestri, Nov 12, 2024)
6182e34  remove double kernels everywhere (simone-silvestri, Nov 12, 2024)
5545605  make sure also GPU works (simone-silvestri, Nov 12, 2024)
e239dbe  make sure there is no return in the function (simone-silvestri, Nov 12, 2024)
32e0c52  return nothing (simone-silvestri, Nov 12, 2024)
0b56304  adapt for GPU usage (simone-silvestri, Nov 12, 2024)
052f56e  not sure about the GPU (simone-silvestri, Nov 12, 2024)
d0d56ac  hmmm (simone-silvestri, Nov 12, 2024)
7e41737  probably like this it will work on the GPU? (simone-silvestri, Nov 12, 2024)
42d4989  Merge branch 'ss/active-index-macro' of github.com:CliMA/Oceananigans… (simone-silvestri, Nov 12, 2024)
130a357  some cleanup plus testing a test? (simone-silvestri, Nov 12, 2024)
3be13dd  works on gpu, to fix the dynamic check (simone-silvestri, Nov 12, 2024)
64cad18  move lwargs (simone-silvestri, Nov 13, 2024)
333bfd7  kwarg management (simone-silvestri, Nov 13, 2024)
e069051  reduce time for tridiagonal solve (simone-silvestri, Nov 13, 2024)
f0a887a  see if tests pass (especially distributed where we have active_cells) (simone-silvestri, Nov 13, 2024)
7b33e7a  bugfix (simone-silvestri, Nov 13, 2024)
26194f3  grid from solver (simone-silvestri, Nov 13, 2024)
db84ef2  Merge branch 'main' into ss/active-index-macro (simone-silvestri, Nov 14, 2024)
0b70d31  this should work (simone-silvestri, Nov 28, 2024)
c620492  typo (simone-silvestri, Nov 28, 2024)
3d35878  extend surface map in halos (simone-silvestri, Dec 10, 2024)
4ff3f59  Merge remote-tracking branch 'origin/main' into ss/active-index-macro (simone-silvestri, Dec 10, 2024)
3609931  this should work? (simone-silvestri, Dec 10, 2024)
01b71c1  works maybe (simone-silvestri, Dec 10, 2024)
bfcf449  should speed up a lot (simone-silvestri, Dec 10, 2024)
e1ad82d  precompiles (simone-silvestri, Dec 10, 2024)
c8efdb1  fixes (simone-silvestri, Dec 10, 2024)
a1f09b4  more corrections (simone-silvestri, Dec 10, 2024)
1c043bc  probably it should work (simone-silvestri, Dec 10, 2024)
ee98cf2  more corrections (simone-silvestri, Dec 10, 2024)
1a15fc3  more changes (simone-silvestri, Dec 10, 2024)
c8469ba  try this for the moment (simone-silvestri, Dec 10, 2024)
4373ec1  remove all duplication (simone-silvestri, Dec 10, 2024)
9b3c161  remove all duplication (simone-silvestri, Dec 10, 2024)
1615c99  Merge remote-tracking branch 'origin/main' into ss/active-index-macro (simone-silvestri, Dec 10, 2024)
ae6f6e3  add more stuff (simone-silvestri, Dec 10, 2024)
ff80a8e  better (simone-silvestri, Dec 10, 2024)
e6c8c8c  Merge branch 'main' into ss/active-index-macro (simone-silvestri, Dec 11, 2024)
f9ae12e  try new test (simone-silvestri, Dec 11, 2024)
af143aa  This works, we optimize performance later (simone-silvestri, Dec 11, 2024)
e0b2b19  some housekeeping (simone-silvestri, Dec 11, 2024)
c9b19c3  good active cells map test (simone-silvestri, Dec 11, 2024)
ae0366d  fixed index launching (simone-silvestri, Dec 11, 2024)
b73ecbc  just test relevant tests (simone-silvestri, Dec 11, 2024)
d685345  change comment (simone-silvestri, Dec 11, 2024)
00eac79  better (simone-silvestri, Dec 11, 2024)
1ebb694  it works. Back to the complete test suite. (simone-silvestri, Dec 11, 2024)
91 changes: 56 additions & 35 deletions src/Utils/kernel_launching.jl
@@ -12,6 +12,7 @@ using KernelAbstractions: Kernel
 
 import Oceananigans
 import KernelAbstractions: get, expand
+import Base
 
 struct KernelParameters{S, O} end
 
@@ -82,23 +83,12 @@ end
 contiguousrange(range::NTuple{N, Int}, offset::NTuple{N, Int}) where N = Tuple(1+o:r+o for (r, o) in zip(range, offset))
 flatten_reduced_dimensions(worksize, dims) = Tuple(d ∈ dims ? 1 : worksize[d] for d = 1:3)
 
-####
-#### Internal utility to launch a function mapped on an index_map
-####
-
+# Internal utility to launch a function mapped on an index_map
 struct MappedFunction{F, M} <: Function
     func::F
     index_map::M
 end
 
-Adapt.adapt_structure(to, m::MappedFunction) =
-    MappedFunction(Adapt.adapt(to, m.func), Adapt.adapt(to, m.index_map))
-
-@inline function (m::MappedFunction)(_ctx_, args...)
-    m.func(_ctx_, args...)
-    return nothing
-end
-
 # Support for 1D
 heuristic_workgroup(Wx) = min(Wx, 256)
 
@@ -440,54 +430,85 @@ end
 ##### Utilities for Mapped kernels
 #####
 
-struct IndexMap{M}
-    index_map :: M
-end
-
-Adapt.adapt_structure(to, m::IndexMap) = IndexMap(Adapt.adapt(to, m.index_map))
+struct IndexMap end
 
-const MappedNDRange{N} = NDRange{N, <:StaticSize, <:StaticSize, <:Any, <:IndexMap} where N
+const MappedNDRange{N} = NDRange{N, <:StaticSize, <:StaticSize, <:IndexMap, <:AbstractArray} where N
 
 # NDRange has been modified to include an index_map in place of workitems.
 # Remember, dynamic offset kernels are not possible with this extension!!
 # Also, mapped kernels work only with a 1D kernel and a 1D map; it is not possible to launch an ND kernel.
 # TODO: maybe don't do this
-@inline function expand(ndrange::MappedNDRange, groupidx::CartesianIndex, idx::CartesianIndex)
-    Base.@_inline_meta
-    offsets = workitems(ndrange)
-    stride = size(offsets, 1)
-    gidx = groupidx.I[1]
-    tI = (gidx - 1) * stride + idx.I[1]
-    nI = ndrange.workitems.index_map[tI]
-    return CartesianIndex(nI)
-end
+@inline function expand(ndrange::MappedNDRange, groupidx::CartesianIndex{N}, idx::CartesianIndex{N}) where N
+    nI = ntuple(Val(N)) do I
+        Base.@_inline_meta
+        offsets = workitems(ndrange)
+        stride = size(offsets, I)
+        gidx = groupidx.I[I]
+        ndrange.workitems[(gidx - 1) * stride + idx.I[I]]
+    end
+    return CartesianIndex(nI...)
+end
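The index arithmetic in the mapped `expand` above can be illustrated outside KernelAbstractions. In this hedged sketch (plain Julia; `active_map` and `mapped_index` are illustrative names, not part of the PR), a workgroup index and a local workitem index are linearized exactly as `(gidx - 1) * stride + idx.I[I]`, then looked up in a precomputed map of active Cartesian indices:

```julia
# Plain-Julia sketch of the mapped-index lookup, assuming a 1D map of
# CartesianIndex entries (names here are illustrative, not from the PR).

# Active cells of a 4×4 grid, skipping the diagonal (column-major order):
active_map = [CartesianIndex(i, j) for i in 1:4, j in 1:4 if i != j]

# Same linearization as in `expand`: (groupindex - 1) * groupsize + local index
function mapped_index(map, gidx, stride, idx)
    tI = (gidx - 1) * stride + idx  # global linear workitem position
    return map[tI]                  # Cartesian index of the active cell
end

mapped_index(active_map, 2, 4, 3)   # workgroup 2, groupsize 4, local item 3
```

Workitems past `length(active_map)` in the padded last workgroup are exactly what the `__validindex`/dynamic-check machinery further down has to guard against.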

 const MappedKernel{D} = Kernel{D, <:Any, <:Any, <:MappedFunction} where D
 
-# Override the getproperty to make sure we get the correct properties
-@inline getproperty(k::MappedKernel, prop::Symbol) = get_mapped_property(k, Val(prop))
-
-@inline get_mapped_property(k, ::Val{:index_map}) = k.f.index_map
-@inline get_mapped_property(k, ::Val{:func}) = k.f.func
+# Override the getproperty to make sure we launch the correct function in the kernel
+@inline Base.getproperty(k::MappedKernel, prop::Symbol) = get_mapped_kernel_property(k, Val(prop))
+
+@inline get_mapped_kernel_property(k, ::Val{prop}) where prop = getfield(k, prop)
+@inline get_mapped_kernel_property(k, ::Val{:index_map}) = getfield(getfield(k, :f), :index_map)
+@inline get_mapped_kernel_property(k, ::Val{:f}) = getfield(getfield(k, :f), :func)
+
+Adapt.adapt_structure(to, cm::CompilerMetadata{N, C}) where {N, C} =
+    CompilerMetadata{N, C}(Adapt.adapt(to, cm.groupindex),
+                           Adapt.adapt(to, cm.ndrange),
+                           Adapt.adapt(to, cm.iterspace))
+
+Adapt.adapt_structure(to, ndrange::NDRange{N, B, W}) where {N, B, W} =
+    NDRange{N, B, W}(Adapt.adapt(to, ndrange.blocks), Adapt.adapt(to, ndrange.workitems))
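The `Base.getproperty` override above routes property access through `Val`-dispatched helpers, so `kernel.f` transparently unwraps the bare function inside the `MappedFunction`. A minimal self-contained sketch of the same pattern (the `Wrapped` type and `_prop` helper are hypothetical names, not from the PR):

```julia
# Val-dispatched getproperty, as used for MappedKernel above (hypothetical names).
struct Wrapped{F, M}
    f::F          # wrapped function
    index_map::M
end

Base.getproperty(w::Wrapped, prop::Symbol) = _prop(w, Val(prop))

_prop(w::Wrapped, ::Val{prop}) where prop = getfield(w, prop)  # generic fallback
_prop(w::Wrapped, ::Val{:name}) = string(getfield(w, :f))      # customized property

w = Wrapped(sin, [1, 2, 3])
w.index_map   # falls through to getfield
w.name        # dispatches to the customized method
```

Because each `Val{prop}` is a distinct type, the compiler resolves the property lookup statically, so the override costs nothing inside a kernel.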

 # Extending the partition function to include offsets in NDRange: note that in this case the
 # offsets take the place of the DynamicWorkitems, which we assume are not needed in static kernels
 function partition(kernel::MappedKernel, inrange, ingroupsize)
     static_workgroupsize = workgroupsize(kernel)
 
     # Calculate the static NDRange and WorkgroupSize
-    index_map = getproperty(kernel, :index_map)
+    index_map = kernel.index_map
     range = length(index_map)
     arch = Oceananigans.Architectures.architecture(index_map)
     groupsize = get(static_workgroupsize)
 
     blocks, groupsize, dynamic = NDIteration.partition(range, groupsize)
 
     static_blocks = StaticSize{blocks}
     static_workgroupsize = StaticSize{groupsize} # we might have padded workgroupsize
 
     index_map = Oceananigans.Architectures.convert_args(arch, index_map)
-    iterspace = NDRange{length(range), static_blocks, static_workgroupsize}(blocks, IndexMap(index_map))
+    iterspace = NDRange{length(range), static_blocks, static_workgroupsize}(IndexMap(), index_map)
 
     return iterspace, dynamic
 end
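The `partition` method above delegates the actual split to `NDIteration.partition`; the underlying arithmetic is just ceiling division of the number of active indices by the (possibly padded) workgroup size. A hedged standalone sketch of that 1D arithmetic (`partition_1d` is an illustrative helper, not a function from the PR or from KernelAbstractions):

```julia
# Standalone sketch of 1D workgroup partitioning (hypothetical helper):
# `nactive` active cells are covered by `blocks` workgroups of `groupsize`
# workitems each; the final workgroup may be only partially used.
function partition_1d(nactive::Int, groupsize::Int)
    blocks = cld(nactive, groupsize)   # ceiling division
    padded = blocks * groupsize        # total workitems actually launched
    return (; blocks, groupsize, padded)
end

partition_1d(1000, 256)   # 4 blocks of 256 cover 1000 active cells
```

The `padded - nactive` trailing workitems are the ones that must not index into the map, which is why the dynamic bounds check below matters.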

+using KernelAbstractions: CompilerMetadata, NoDynamicCheck
+using CUDA: CUDABackend
+
+import KernelAbstractions: mkcontext, __dynamic_checkbounds, __validindex
+
+# Very dangerous override of mkcontext, which will not work if we are not
+# careful about making sure that indices are correct when launching a `MappedKernel`.
+# TODO: definitely change this with the options below
+function mkcontext(kernel::Kernel{CUDABackend}, _ndrange, iterspace::MappedNDRange)
+    return CompilerMetadata{ndrange(kernel), NoDynamicCheck}(_ndrange, iterspace)
+end

+# Alternative to the above to fix:
+# const MappedCompilerMetadata = CompilerMetadata{<:Any, <:Any, <:Any, <:Any, <:MappedNDRange}
+
+# @inline __ndrange(cm::MappedCompilerMetadata) = cm.iterspace
+
+# @inline function __validindex(ctx::MappedCompilerMetadata, idx::CartesianIndex)
+#     # Turns this into a no-op for code where we can turn off checkbounds
+#     if __dynamic_checkbounds(ctx)
+#         I = @inbounds expand(__iterspace(ctx), __groupindex(ctx), idx)
+#         return I in __ndrange(ctx)
+#     else
+#         return true
+#     end
+# end
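For context, the `active_cells_map` this PR threads through the kernel machinery is, conceptually, just the list of Cartesian indices of active (e.g. wet) cells in a masked grid, so that kernels skip inactive cells entirely. A hedged sketch of building such a map with `findall` (the mask data is made up for illustration):

```julia
# Building a 1D map of active cells from a boolean mask (illustrative data).
mask = Bool[1 0 1;
            1 1 0;
            0 1 1]                  # true = active (e.g. wet) cell

active_cells_map = findall(mask)    # Vector{CartesianIndex{2}} of active cells

# A kernel launched over the map touches exactly this many workitems,
# instead of length(mask) for a dense launch:
length(active_cells_map)
```

When most of a grid is masked (immersed boundaries, distributed halos), launching over the map instead of the full `NDRange` is where the speedup in this PR comes from.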