
TreeMesh 2D simulation with MPI crashes when a rank has no boundaries #1870

Merged: 2 commits merged into main on Mar 15, 2024

Conversation

@benegee (Contributor) commented on Mar 12, 2024

Example:

mpiexecjl -n 5 julia -e 'using Trixi; trixi_include("../examples/tree_2d_dgsem/elixir_advection_amr_nonperiodic.jl")'

results in

#timesteps:     15 │ Δt: 6.9444e-02 │ sim. time: 1.0417e+00 (20.833%)  │ run time: 1.4942e+00 s
ERROR: LoadError: BoundsError: attempt to access 0-element Vector{Int64} at index [1]
Stacktrace:
  [1] getindex
    @ ./essentials.jl:13 [inlined]
  [2] macro expansion
    @ ~/trixi/Trixi.jl/src/solvers/dgsem_tree/dg_2d.jl:732 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/Polyester/HaBfT/src/closure.jl:443 [inlined]
  [4] macro expansion
    @ ~/trixi/Trixi.jl/src/auxiliary/auxiliary.jl:246 [inlined]
  [5] calc_boundary_flux_by_direction!(surface_flux_values::Array{Float64, 4}, ...
    @ Trixi ~/trixi/Trixi.jl/src/solvers/dgsem_tree/dg_2d.jl:730
...

Reason:

  • Because of an AMR step, one of the ranks ends up with only internal mesh cells (see the two attached screenshots of the mesh partitioning).
  • reinitialize_containers! resizes the boundaries cache to size 0 on this rank:
    resize!(boundaries, count_required_boundaries(mesh, leaf_cell_ids))
  • init_boundaries! is then called but returns early,
    # Exit early if there are no boundaries to initialize
    if nboundaries(boundaries) == 0
        return nothing
    end
    leaving cache.boundaries.n_boundaries_per_direction in a stale (faulty) state.

Here is a straightforward fix, though I am not sure whether this is the best place for it.

(The other changes only serve to output the MPI rank of each cell. I can delete them or move them to a separate PR.)
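For illustration, here is a minimal sketch of the kind of fix described above; the function signature and the reset line are simplified and may differ from the change actually merged in this PR. The idea is to reset the per-direction boundary counts before the early return, so a rank that owns zero boundaries after AMR does not keep counts from before the AMR step.

```julia
# Sketch only (not necessarily the merged change): a simplified init_boundaries!
# that clears the per-direction boundary counts before exiting early.
function init_boundaries!(boundaries, elements, mesh)
    # Reset the counters first; after resize!(boundaries, 0) they would
    # otherwise still hold the values from before the AMR step.
    boundaries.n_boundaries_per_direction .= 0

    # Exit early if there are no boundaries to initialize
    if nboundaries(boundaries) == 0
        return nothing
    end

    # ... count boundaries per direction and fill the cache as before ...
    return nothing
end
```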


Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@DanielDoehring added the labels "bug (Something isn't working)" and "parallelization (Related to MPI, threading, tasks etc.)" on Mar 12, 2024

codecov bot commented Mar 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.30%. Comparing base (a528083) to head (4a94af4).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1870   +/-   ##
=======================================
  Coverage   96.30%   96.30%           
=======================================
  Files         439      439           
  Lines       35744    35745    +1     
=======================================
+ Hits        34423    34424    +1     
  Misses       1321     1321           
Flag Coverage Δ
unittests 96.30% <100.00%> (+<0.01%) ⬆️


@ranocha (Member) left a comment:

Thanks for debugging this!

@@ -32,6 +32,7 @@ mutable struct SerialTree{NDIMS} <: AbstractTree{NDIMS}
     levels::Vector{Int}
     coordinates::Matrix{Float64}
     original_cell_ids::Vector{Int}
+    mpi_ranks::Vector{Int}
A Member commented on the added mpi_ranks field:

I don't like introducing MPI-parallel data structures into the plain serial code...

@benegee (Contributor, Author) replied:

That seemed strange to me as well. It was necessary because ParallelTree is used during the simulation, whereas SerialTree is used when converting the output via Trixi2Vtk. So in order to get the MPI ranks into ParaView, I had to introduce this field here as well.

In general I find the mpi_ranks output very useful. Do you have an idea how to implement it better? Anyway, I will revert these changes in this PR!

The reviewer (Member) replied:
You could specialize only the MPI-parallel case without changing the serial infrastructure?
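For illustration, a small self-contained sketch of what such a specialization via dispatch could look like (this is not Trixi.jl code; all type and function names below are hypothetical), keeping rank information out of the serial tree type:

```julia
# Hypothetical sketch, not part of Trixi.jl: specialize the rank output on the
# tree type so the serial container does not need an mpi_ranks field.
abstract type AbstractToyTree end

struct ToySerialTree <: AbstractToyTree
    n_cells::Int
end

struct ToyParallelTree <: AbstractToyTree
    mpi_ranks::Vector{Int}  # owning MPI rank of each cell
end

# Serial case: no rank data is stored; every cell is owned by "rank 0".
cell_mpi_ranks(tree::ToySerialTree) = zeros(Int, tree.n_cells)

# Parallel case: the ranks are already stored in the container.
cell_mpi_ranks(tree::ToyParallelTree) = tree.mpi_ranks

# Usage: an output routine calls cell_mpi_ranks(tree) and dispatch selects the
# right method, so the serial code path stays untouched.
```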

Resolved review threads on src/solvers/dgsem_tree/containers_2d.jl and src/meshes/mesh_io.jl (the latter marked outdated).
@benegee force-pushed the bg/fix-init-boundaries-for-0-boundaries branch from ca2169d to 738de75 on March 15, 2024 07:41
@ranocha (Member) left a comment:

Thanks!

@ranocha merged commit aa9ea20 into main on Mar 15, 2024
38 checks passed
@ranocha deleted the bg/fix-init-boundaries-for-0-boundaries branch on March 15, 2024 09:20