Skip to content

Commit

Permalink
Merge branch 'main' into NaNMath
Browse files Browse the repository at this point in the history
  • Loading branch information
ranocha committed Feb 6, 2024
2 parents 74c72cb + 14151e6 commit 65d1b6d
Show file tree
Hide file tree
Showing 8 changed files with 62 additions and 45 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/SpellCheck.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,4 +10,4 @@ jobs:
- name: Checkout Actions Repository
uses: actions/checkout@v4
- name: Check spelling
uses: crate-ci/typos@v1.16.26
uses: crate-ci/typos@v1.18.0
2 changes: 1 addition & 1 deletion .github/workflows/benchmark.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ jobs:
version: ${{ matrix.version }}
arch: ${{ matrix.arch }}
show-versioninfo: true
- uses: actions/cache@v3
- uses: actions/cache@v4
env:
cache-name: cache-artifacts
with:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,7 @@ jobs:
- uses: julia-actions/julia-processcoverage@v1
with:
directories: src,examples,ext
- uses: codecov/codecov-action@v3
- uses: codecov/codecov-action@v4
with:
file: ./lcov.info
flags: unittests
Expand Down
12 changes: 9 additions & 3 deletions docs/literate/src/files/index.jl
Original file line number Diff line number Diff line change
Expand Up @@ -108,20 +108,26 @@
# software in the Trixi.jl ecosystem, and then run a simulation using Trixi.jl on said mesh.
# In the end, the tutorial briefly explains how to simulate an example using AMR via `P4estMesh`.

# ### [15 Explicit time stepping](@ref time_stepping)
# ### [15 P4est mesh from gmsh](@ref p4est_from_gmsh)
#-
# This tutorial describes how to obtain a [`P4estMesh`](@ref) from an existing mesh generated
# by [`gmsh`](https://gmsh.info/) or any other meshing software that can export to the Abaqus
# input `.inp` format. The tutorial demonstrates how edges/faces can be associated with boundary conditions based on the physical nodesets.

# ### [16 Explicit time stepping](@ref time_stepping)
#-
# This tutorial is about time integration using [OrdinaryDiffEq.jl](https://github.com/SciML/OrdinaryDiffEq.jl).
# It explains how to use their algorithms and presents two types of time step choices - with error-based
# and CFL-based adaptive step size control.

# ### [16 Differentiable programming](@ref differentiable_programming)
# ### [17 Differentiable programming](@ref differentiable_programming)
#-
# This part deals with some basic differentiable programming topics. For example, a Jacobian, its
# eigenvalues and a curve of total energy (through the simulation) are calculated and plotted for
# a few semidiscretizations. Moreover, we calculate an example for propagating errors with Measurement.jl
# at the end.

# ### [17 Custom semidiscretization](@ref custom_semidiscretization)
# ### [18 Custom semidiscretization](@ref custom_semidiscretization)
#-
# This tutorial describes the [semidiscretiations](@ref overview-semidiscretizations) of Trixi.jl
# and explains how to extend them for custom tasks.
Expand Down
11 changes: 11 additions & 0 deletions docs/src/performance.md
Original file line number Diff line number Diff line change
Expand Up @@ -267,3 +267,14 @@ requires. It can thus be seen as a proxy for "energy used" and, as an extension,
timing result, you need to set the analysis interval such that the
`AnalysisCallback` is invoked at least once during the course of the simulation and
discard the first PID value.

## Performance issues with multi-threaded reductions
[False sharing](https://en.wikipedia.org/wiki/False_sharing) is a known performance issue
for systems with distributed caches. It also occurred for the implementation of a thread
parallel bounds checking routine for the subcell IDP limiting
in [PR #1736](https://github.com/trixi-framework/Trixi.jl/pull/1736).
After some [testing and discussion](https://github.com/trixi-framework/Trixi.jl/pull/1736#discussion_r1423881895),
it turned out that initializing a vector of length `n * Threads.nthreads()` and only using every
n-th entry instead of a vector of length `Threads.nthreads()` fixes the problem.
Since there are no processors with caches over 128B, we use `n = 128B / size(uEltype)`.
Now, the bounds checking routine of the IDP limiting scales as hoped.
38 changes: 19 additions & 19 deletions src/meshes/t8code_mesh.jl
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
"""
T8codeMesh{NDIMS} <: AbstractMesh{NDIMS}
An unstructured curved mesh based on trees that uses the C library
An unstructured curved mesh based on trees that uses the C library
['t8code'](https://github.com/DLR-AMR/t8code)
to manage trees and mesh refinement.
"""
Expand Down Expand Up @@ -485,7 +485,7 @@ end
# form a family and we decide whether this family should be coarsened
# or only the first element should be refined.
# Otherwise `is_family` must equal zero and we consider the first entry
# of the element array for refinement.
# of the element array for refinement.
# Entries of the element array beyond the first `num_elements` are undefined.
# \param [in] forest the forest to which the new elements belong
# \param [in] forest_from the forest that is adapted.
Expand Down Expand Up @@ -542,8 +542,8 @@ Adapt a `T8codeMesh` according to a user-defined `adapt_callback`.
0 : Stay unchanged.
1 : Refine element.
- `kwargs`:
- `recursive = true`: Adapt the forest recursively. If true the caller must ensure that the callback
- `kwargs`:
- `recursive = true`: Adapt the forest recursively. If true the caller must ensure that the callback
returns 0 for every analyzed element at some point to stop the recursion.
- `balance = true`: Make sure the adapted forest is 2^(NDIMS-1):1 balanced.
- `partition = true`: Partition the forest to redistribute elements evenly among MPI ranks.
Expand Down Expand Up @@ -695,7 +695,7 @@ function count_interfaces(mesh::T8codeMesh)

for iface in 0:(num_faces - 1)
pelement_indices_ref = Ref{Ptr{t8_locidx_t}}()
pneighbor_leafs_ref = Ref{Ptr{Ptr{t8_element}}}()
pneighbor_leaves_ref = Ref{Ptr{Ptr{t8_element}}}()
pneigh_scheme_ref = Ref{Ptr{t8_eclass_scheme}}()

dual_faces_ref = Ref{Ptr{Cint}}()
Expand All @@ -704,7 +704,7 @@ function count_interfaces(mesh::T8codeMesh)
forest_is_balanced = Cint(1)

t8_forest_leaf_face_neighbors(mesh.forest, itree, element,
pneighbor_leafs_ref, iface, dual_faces_ref,
pneighbor_leaves_ref, iface, dual_faces_ref,
num_neighbors_ref,
pelement_indices_ref, pneigh_scheme_ref,
forest_is_balanced)
Expand All @@ -713,13 +713,13 @@ function count_interfaces(mesh::T8codeMesh)
dual_faces = unsafe_wrap(Array, dual_faces_ref[], num_neighbors)
neighbor_ielements = unsafe_wrap(Array, pelement_indices_ref[],
num_neighbors)
neighbor_leafs = unsafe_wrap(Array, pneighbor_leafs_ref[], num_neighbors)
neighbor_leaves = unsafe_wrap(Array, pneighbor_leaves_ref[], num_neighbors)
neighbor_scheme = pneigh_scheme_ref[]

if num_neighbors == 0
local_num_boundary += 1
else
neighbor_level = t8_element_level(neighbor_scheme, neighbor_leafs[1])
neighbor_level = t8_element_level(neighbor_scheme, neighbor_leaves[1])

if all(neighbor_ielements .< num_local_elements)
# Conforming interface: The second condition ensures we
Expand All @@ -745,7 +745,7 @@ function count_interfaces(mesh::T8codeMesh)
neighbor_linear_id = neighbor_global_ghost_itree *
max_tree_num_elements +
t8_element_get_linear_id(neighbor_scheme,
neighbor_leafs[1],
neighbor_leaves[1],
max_level)
global_mortar_id = 2 * ndims(mesh) * neighbor_linear_id +
dual_faces[1]
Expand All @@ -759,7 +759,7 @@ function count_interfaces(mesh::T8codeMesh)
end

t8_free(dual_faces_ref[])
t8_free(pneighbor_leafs_ref[])
t8_free(pneighbor_leaves_ref[])
t8_free(pelement_indices_ref[])
end # for

Expand Down Expand Up @@ -875,7 +875,7 @@ function fill_mesh_info!(mesh::T8codeMesh, interfaces, mortars, boundaries,
end

pelement_indices_ref = Ref{Ptr{t8_locidx_t}}()
pneighbor_leafs_ref = Ref{Ptr{Ptr{t8_element}}}()
pneighbor_leaves_ref = Ref{Ptr{Ptr{t8_element}}}()
pneigh_scheme_ref = Ref{Ptr{t8_eclass_scheme}}()

dual_faces_ref = Ref{Ptr{Cint}}()
Expand All @@ -885,7 +885,7 @@ function fill_mesh_info!(mesh::T8codeMesh, interfaces, mortars, boundaries,

# Query neighbor information from t8code.
t8_forest_leaf_face_neighbors(mesh.forest, itree, element,
pneighbor_leafs_ref, iface, dual_faces_ref,
pneighbor_leaves_ref, iface, dual_faces_ref,
num_neighbors_ref,
pelement_indices_ref, pneigh_scheme_ref,
forest_is_balanced)
Expand All @@ -894,7 +894,7 @@ function fill_mesh_info!(mesh::T8codeMesh, interfaces, mortars, boundaries,
dual_faces = unsafe_wrap(Array, dual_faces_ref[], num_neighbors)
neighbor_ielements = unsafe_wrap(Array, pelement_indices_ref[],
num_neighbors)
neighbor_leafs = unsafe_wrap(Array, pneighbor_leafs_ref[], num_neighbors)
neighbor_leaves = unsafe_wrap(Array, pneighbor_leaves_ref[], num_neighbors)
neighbor_scheme = pneigh_scheme_ref[]

# Now we check for the different cases. The nested if-structure is as follows:
Expand All @@ -913,7 +913,7 @@ function fill_mesh_info!(mesh::T8codeMesh, interfaces, mortars, boundaries,
# else: // `local mortar from smaller elements point of view`
# <skip> // We only count local mortars once.
#
# else: // It must be either a MPI interface or a MPI mortar.
# else: // It must be either a MPI interface or a MPI mortar.
#
# if `MPI interface`:
# <fill MPI interface info>
Expand All @@ -938,7 +938,7 @@ function fill_mesh_info!(mesh::T8codeMesh, interfaces, mortars, boundaries,

# Interface or mortar.
else
neighbor_level = t8_element_level(neighbor_scheme, neighbor_leafs[1])
neighbor_level = t8_element_level(neighbor_scheme, neighbor_leaves[1])

# Local interface or mortar.
if all(neighbor_ielements .< num_local_elements)
Expand Down Expand Up @@ -985,7 +985,7 @@ function fill_mesh_info!(mesh::T8codeMesh, interfaces, mortars, boundaries,
neighbor_linear_id = neighbor_global_ghost_itree *
max_tree_num_elements +
t8_element_get_linear_id(neighbor_scheme,
neighbor_leafs[1],
neighbor_leaves[1],
max_level)

if current_linear_id < neighbor_linear_id
Expand Down Expand Up @@ -1029,7 +1029,7 @@ function fill_mesh_info!(mesh::T8codeMesh, interfaces, mortars, boundaries,
num_local_elements)
local_neighbor_ids = [neighbor_ids[i]
for i in local_neighbor_positions]
local_neighbor_positions = [map_iface_to_ichild_to_position[dual_faces[1] + 1][t8_element_child_id(neighbor_scheme, neighbor_leafs[i]) + 1]
local_neighbor_positions = [map_iface_to_ichild_to_position[dual_faces[1] + 1][t8_element_child_id(neighbor_scheme, neighbor_leaves[i]) + 1]
for i in local_neighbor_positions]

# Last entry is the large element.
Expand Down Expand Up @@ -1059,7 +1059,7 @@ function fill_mesh_info!(mesh::T8codeMesh, interfaces, mortars, boundaries,
neighbor_linear_id = neighbor_global_ghost_itree *
max_tree_num_elements +
t8_element_get_linear_id(neighbor_scheme,
neighbor_leafs[1],
neighbor_leaves[1],
max_level)
global_mortar_id = 2 * ndims(mesh) * neighbor_linear_id +
dual_faces[1]
Expand Down Expand Up @@ -1100,7 +1100,7 @@ function fill_mesh_info!(mesh::T8codeMesh, interfaces, mortars, boundaries,
end

t8_free(dual_faces_ref[])
t8_free(pneighbor_leafs_ref[])
t8_free(pneighbor_leaves_ref[])
t8_free(pelement_indices_ref[])
end # for iface

Expand Down
12 changes: 6 additions & 6 deletions test/test_parabolic_2d.jl
Original file line number Diff line number Diff line change
Expand Up @@ -218,9 +218,9 @@ end
"elixir_advection_diffusion.jl"),
tspan=(0.0, 0.0))
LLID = Trixi.local_leaf_cells(mesh.tree)
num_leafs = length(LLID)
@assert num_leafs % 8 == 0
Trixi.refine!(mesh.tree, LLID[1:Int(num_leafs / 8)])
num_leaves = length(LLID)
@assert num_leaves % 8 == 0
Trixi.refine!(mesh.tree, LLID[1:Int(num_leaves / 8)])
tspan = (0.0, 1.5)
semi = SemidiscretizationHyperbolicParabolic(mesh,
(equations, equations_parabolic),
Expand Down Expand Up @@ -414,9 +414,9 @@ end
"elixir_navierstokes_convergence.jl"),
tspan=(0.0, 0.0), initial_refinement_level=3)
LLID = Trixi.local_leaf_cells(mesh.tree)
num_leafs = length(LLID)
@assert num_leafs % 4 == 0
Trixi.refine!(mesh.tree, LLID[1:Int(num_leafs / 4)])
num_leaves = length(LLID)
@assert num_leaves % 4 == 0
Trixi.refine!(mesh.tree, LLID[1:Int(num_leaves / 4)])
tspan = (0.0, 0.5)
semi = SemidiscretizationHyperbolicParabolic(mesh, (equations, equations_parabolic),
initial_condition, solver;
Expand Down
28 changes: 14 additions & 14 deletions test/test_parabolic_3d.jl
Original file line number Diff line number Diff line change
Expand Up @@ -252,9 +252,9 @@ end
"elixir_navierstokes_convergence.jl"),
tspan=(0.0, 0.0))
LLID = Trixi.local_leaf_cells(mesh.tree)
num_leafs = length(LLID)
@assert num_leafs % 16 == 0
Trixi.refine!(mesh.tree, LLID[1:Int(num_leafs / 16)])
num_leaves = length(LLID)
@assert num_leaves % 16 == 0
Trixi.refine!(mesh.tree, LLID[1:Int(num_leaves / 16)])
tspan = (0.0, 0.25)
semi = SemidiscretizationHyperbolicParabolic(mesh, (equations, equations_parabolic),
initial_condition, solver;
Expand Down Expand Up @@ -325,9 +325,9 @@ end
"elixir_navierstokes_taylor_green_vortex.jl"),
tspan=(0.0, 0.0))
LLID = Trixi.local_leaf_cells(mesh.tree)
num_leafs = length(LLID)
@assert num_leafs % 32 == 0
Trixi.refine!(mesh.tree, LLID[1:Int(num_leafs / 32)])
num_leaves = length(LLID)
@assert num_leaves % 32 == 0
Trixi.refine!(mesh.tree, LLID[1:Int(num_leaves / 32)])
tspan = (0.0, 0.1)
semi = SemidiscretizationHyperbolicParabolic(mesh, (equations, equations_parabolic),
initial_condition, solver)
Expand Down Expand Up @@ -429,8 +429,8 @@ end
"elixir_advection_diffusion_amr.jl"),
l2=[0.000355780485397024],
linf=[0.0010810770271614256])
# Ensure that we do not have excessive memory allocations
# (e.g., from type instabilities)
# Ensure that we do not have excessive memory allocations
# (e.g., from type instabilities)
let
t = sol.t[end]
u_ode = sol.u[end]
Expand All @@ -444,8 +444,8 @@ end
"elixir_advection_diffusion_nonperiodic.jl"),
l2=[0.0009808996243280868],
linf=[0.01732621559135459])
# Ensure that we do not have excessive memory allocations
# (e.g., from type instabilities)
# Ensure that we do not have excessive memory allocations
# (e.g., from type instabilities)
let
t = sol.t[end]
u_ode = sol.u[end]
Expand All @@ -472,8 +472,8 @@ end
0.12129218723807476,
0.8433893297612087,
])
# Ensure that we do not have excessive memory allocations
# (e.g., from type instabilities)
# Ensure that we do not have excessive memory allocations
# (e.g., from type instabilities)
let
t = sol.t[end]
u_ode = sol.u[end]
Expand All @@ -495,8 +495,8 @@ end
0.6782397526873181, 0.17663702154066238,
0.17663702154066266, 0.17663702154066238, 1.7327849844825238,
])
# Ensure that we do not have excessive memory allocations
# (e.g., from type instabilities)
# Ensure that we do not have excessive memory allocations
# (e.g., from type instabilities)
let
t = sol.t[end]
u_ode = sol.u[end]
Expand Down

0 comments on commit 65d1b6d

Please sign in to comment.