Fix allocs #1695

Merged: 7 commits, Oct 31, 2023

DanielDoehring
Contributor

This fixes #1678 .

The comment in the code suggests that loops were deliberately avoided because of type instabilities (and the resulting allocations), and that a recursive procedure was used instead. It seems that things have changed.

# "lispy tuple programming" instead of for loop for type stability
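
For readers unfamiliar with the pattern, here is a minimal sketch of "lispy tuple programming" next to the plain loop that replaces it. The types `DirichletBC` and `DoNothingBC` and the function `apply_bc` are hypothetical stand-ins, not Trixi.jl's actual boundary condition code.

```julia
# Hypothetical stand-ins for boundary conditions of different concrete types.
struct DirichletBC end
struct DoNothingBC end

# Hypothetical per-boundary kernel; each method is selected by dispatch.
apply_bc(::DirichletBC, x) = 2.0 * x
apply_bc(::DoNothingBC, x) = x

# Recursive ("lispy") traversal: handle the first entry, then recurse on the tail.
sum_bcs(bcs::Tuple{}, x) = 0.0  # base case: empty tuple
function sum_bcs(bcs::Tuple, x)
    return apply_bc(first(bcs), x) + sum_bcs(Base.tail(bcs), x)
end

# Plain loop over the same tuple, for comparison.
function sum_bcs_loop(bcs::Tuple, x)
    acc = 0.0
    for bc in bcs
        acc += apply_bc(bc, x)
    end
    return acc
end

bcs = (DirichletBC(), DoNothingBC(), DirichletBC())
result_recursive = sum_bcs(bcs, 1.5)
result_loop = sum_bcs_loop(bcs, 1.5)
```

In the recursive version every call sees a tuple whose type is one element shorter, so the type of `first(bcs)` is known at each level. Whether the loop version is equally well inferred depends on the Julia version and the tuple contents, which is what this PR is probing.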

Compare the output of the summary callback:

 ─────────────────────────────────────────────────────────────────────────────────
            Trixi.jl                     Time                    Allocations      
                                ───────────────────────   ────────────────────────
        Tot / % measured:            1.34s /  13.5%            102MiB /   3.8%    

 Section                ncalls     time    %tot     avg     alloc    %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────
 rhs!                    1.90k   97.9ms   54.4%  51.5μs   5.88KiB    0.1%    3.17B
   volume integral       1.90k   84.7ms   47.1%  44.6μs     0.00B    0.0%    0.00B
   source terms          1.90k   5.13ms    2.9%  2.70μs     0.00B    0.0%    0.00B
   interface flux        1.90k   3.06ms    1.7%  1.61μs     0.00B    0.0%    0.00B
   boundary flux         1.90k   2.40ms    1.3%  1.26μs     0.00B    0.0%    0.00B
   ~rhs!~                1.90k    977μs    0.5%   514ns   5.88KiB    0.1%    3.17B
   Jacobian              1.90k    811μs    0.5%   427ns     0.00B    0.0%    0.00B
   prolong2interfaces    1.90k    310μs    0.2%   163ns     0.00B    0.0%    0.00B
   surface integral      1.90k    292μs    0.2%   153ns     0.00B    0.0%    0.00B
   reset ∂u/∂t           1.90k    196μs    0.1%   103ns     0.00B    0.0%    0.00B
 analyze solution            4   82.0ms   45.6%  20.5ms   3.88MiB   99.9%  0.97MiB
 ─────────────────────────────────────────────────────────────────────────────────

to

https://github.com/trixi-framework/Trixi.jl/actions/runs/6574426931/job/17859451194#step:7:8803

@DanielDoehring DanielDoehring requested a review from jlchan October 29, 2023 15:21
@github-actions
Contributor

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@DanielDoehring DanielDoehring added the performance We are greedy label Oct 29, 2023
@codecov

codecov bot commented Oct 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (61c33b0) 94.25% compared to head (caac09d) 87.02%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1695      +/-   ##
==========================================
- Coverage   94.25%   87.02%   -7.23%     
==========================================
  Files         431      431              
  Lines       34690    34639      -51     
==========================================
- Hits        32694    30143    -2551     
- Misses       1996     4496    +2500     
Flag Coverage Δ
unittests 87.02% <100.00%> (-7.23%) ⬇️


Files Coverage Δ
src/solvers/dgmulti/dg.jl 75.76% <100.00%> (-18.56%) ⬇️

... and 78 files with indirect coverage changes


@ranocha
Member

ranocha commented Oct 30, 2023

Thanks for tackling this! Before merging it, I would like to understand what's going on here.

  • What's the type of the boundary_conditions?
  • What are the results on Julia v1.8 or something like that?
  • Can we also make it work with something like foreach, which should be implemented in a type-stable way?

@DanielDoehring
Contributor Author

Type is

 NamedTuple{(:Slant, :Bezier, :Right, :Bottom, :Top), NTuple{5, BoundaryConditionDirichlet{typeof(initial_condition_convergence_test)}}} 

For Julia 1.8.5 the results are the same: the proposed version does not allocate, while the existing one does.

In terms of using foreach,

foreach(boundary_conditions) do bc
   foo()
end   

loops only over the NTuple, i.e., gives bcs of type BoundaryConditionDirichlet{typeof(initial_condition_convergence_test)}.

But we also need the key, and I am not sure whether that can be done efficiently with foreach.
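
A small sketch of the behavior described here, using a toy NamedTuple with `sin` and `cos` as placeholder entries (an assumption for illustration, not actual boundary conditions):

```julia
# Toy NamedTuple; `sin` and `cos` are placeholders for boundary conditions.
boundary_conditions = (Slant = sin, Bottom = cos, Top = sin)

# Iterating a NamedTuple visits only the values, not the names.
visited = Any[]
foreach(bc -> push!(visited, bc), boundary_conditions)

# The names are available separately, in the same order as the values.
bc_names = keys(boundary_conditions)

# A value can also be looked up by its name.
bottom_bc = boundary_conditions[:Bottom]
```

Since `keys` returns the names in the same order as the values, any scheme that walks both in lockstep recovers the name for each boundary condition.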

@ranocha
Member

ranocha commented Oct 30, 2023

I think you are lucky since this uses a homogeneous tuple under the hood. Could you please check a heterogeneous tuple, too (different types of BCs)? I expect your for loop will be type-unstable in this case.
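
To make the homogeneous/heterogeneous distinction concrete, the following sketch inspects the value types of two toy NamedTuples. `DirichletBC` and `DoNothingBC` are hypothetical empty structs used only for illustration.

```julia
# Hypothetical stand-ins for two different boundary condition types.
struct DirichletBC end
struct DoNothingBC end

homogeneous = (Slant = DirichletBC(), Bottom = DirichletBC())
heterogeneous = (Slant = DirichletBC(), Bottom = DoNothingBC())

# All values share one type, so the underlying tuple is an NTuple.
homogeneous_type = typeof(values(homogeneous))

# Mixed value types give a general Tuple with one parameter per entry.
heterogeneous_type = typeof(values(heterogeneous))

# A loop variable over the values has a concrete element type only in
# the homogeneous case.
homogeneous_concrete = isconcretetype(eltype(values(homogeneous)))
heterogeneous_concrete = isconcretetype(eltype(values(heterogeneous)))
```

A non-concrete element type is the usual reason a loop over a heterogeneous tuple is expected to be type-unstable, unless the compiler unrolls or union-splits it.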

@ranocha
Member

ranocha commented Oct 30, 2023

What about

foreach(boundary_conditions, keys(boundary_conditions)) do bc, name

?

Contributor

@jlchan jlchan left a comment

Thanks for checking this! I believe the original idea for "lispy tuple programming" came from Tim Holy (see for example https://stackoverflow.com/questions/55840333/type-stability-for-lists-of-closures/55849398#55849398) as a workaround for type stability when iterating over lists of functions. If iteration over NamedTuples is now type stable that would be great.

Is the original MWE in the StackExchange post still type unstable on Julia 1.8.5 (or 1.9.3)?

@DanielDoehring
Contributor Author

I think you are lucky since this uses a homogeneous tuple under the hood. Could you please check a heterogeneous tuple, too (different types of BCs)? I expect your for loop will be type-unstable in this case.

So for

boundary_conditions = (; :Slant  => boundary_condition_convergence_test,
                         :Bezier => boundary_condition_convergence_test,
                         :Right  => boundary_condition_convergence_test,
                         #:Bottom => boundary_condition_convergence_test,
                         :Bottom => boundary_condition_do_nothing,
                         :Top    => boundary_condition_convergence_test )

with typeof(boundary_conditions)

NamedTuple{(:Slant, :Bezier, :Right, :Bottom, :Top), Tuple{BoundaryConditionDirichlet{typeof(initial_condition_convergence_test)}, 
BoundaryConditionDirichlet{typeof(initial_condition_convergence_test)}, 
BoundaryConditionDirichlet{typeof(initial_condition_convergence_test)}, 
Trixi.BoundaryConditionDoNothing,
 BoundaryConditionDirichlet{typeof(initial_condition_convergence_test)}}}

this still does not allocate (I had to reduce the runtime to keep the simulation from diverging):

 ─────────────────────────────────────────────────────────────────────────────────
            Trixi.jl                     Time                    Allocations      
                                ───────────────────────   ────────────────────────
        Tot / % measured:           88.3ms /  95.0%            160KiB /  51.4%    

 Section                ncalls     time    %tot     avg     alloc    %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────
 rhs!                    1.49k   83.1ms   99.1%  55.9μs   5.88KiB    7.1%    4.05B
   volume integral       1.49k   72.3ms   86.2%  48.7μs     0.00B    0.0%    0.00B
   source terms          1.49k   4.27ms    5.1%  2.87μs     0.00B    0.0%    0.00B
   interface flux        1.49k   2.45ms    2.9%  1.65μs     0.00B    0.0%    0.00B
   boundary flux         1.49k   1.83ms    2.2%  1.23μs     0.00B    0.0%    0.00B
   ~rhs!~                1.49k    897μs    1.1%   603ns   5.88KiB    7.1%    4.05B
   Jacobian              1.49k    698μs    0.8%   470ns     0.00B    0.0%    0.00B
   surface integral      1.49k    281μs    0.3%   189ns     0.00B    0.0%    0.00B
   prolong2interfaces    1.49k    246μs    0.3%   165ns     0.00B    0.0%    0.00B
   reset ∂u/∂t           1.49k    165μs    0.2%   111ns     0.00B    0.0%    0.00B
 analyze solution            3    758μs    0.9%   253μs   76.5KiB   92.9%  25.5KiB
 ─────────────────────────────────────────────────────────────────────────────────

@DanielDoehring
Contributor Author

What about

foreach(boundary_conditions, keys(boundary_conditions)) do bc, name

?

That works 👍

@DanielDoehring
Contributor Author

Is the original MWE in the StackExchange post still type unstable on Julia 1.8.5 (or 1.9.3)?

Not sure:

Body::Vector{Float64}
1 ─ %1  = Main.eltype(u)::Core.Const(Float64)
│   %2  = Main.length(u)::Int64
│         (ret = Main.zeros(%1, %2))
│   %4  = Main.functions::Any
│         (@_3 = Base.iterate(%4))
│   %6  = (@_3 === nothing)::Bool
│   %7  = Base.not_int(%6)::Bool
└──       goto #4 if not %7
2 ┄ %9  = @_3::Any
│         (func = Core.getfield(%9, 1))
│   %11 = Core.getfield(%9, 2)::Any
│   %12 = ret::Vector{Float64}
│   %13 = Main.:+::Core.Const(+)
│   %14 = ret::Vector{Float64}
│   %15 = (func)(u)::Any
│   %16 = Base.broadcasted(%13, %14, %15)::Any
│         Base.materialize!(%12, %16)
│         (@_3 = Base.iterate(%4, %11))
│   %19 = (@_3 === nothing)::Bool
│   %20 = Base.not_int(%19)::Bool
└──       goto #4 if not %20
3 ─       goto #2
4 ┄       return ret

@jlchan
Contributor

jlchan commented Oct 30, 2023

That still looks type unstable: note the variables annotated as Any, which means that Julia cannot infer their types. However, if you are not observing allocations, maybe things have been partially fixed for our example?

@jlchan
Contributor

jlchan commented Oct 30, 2023

Looking more closely, the StackExchange MWE isn't really the same as our boundary condition setting. They are looping over functions and directly applying them. Trixi is passing functions as arguments to another function, so there might be a function barrier helping us?
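
As a rough illustration of the function barrier idea (a sketch under assumptions, not Trixi.jl's actual `calc_boundary_flux!` code), the loop below only forwards each function to a kernel. The names `accumulate!` and `apply_all` are hypothetical.

```julia
# Hypothetical kernel: Julia compiles a specialized version for each
# concrete type of `f`, so the inner loop is type-stable.
function accumulate!(ret::Vector{Float64}, f, u::Vector{Float64})
    for i in eachindex(ret, u)
        ret[i] += f(u[i])
    end
    return ret
end

# The loop variable `f` may not be inferred as a single concrete type here,
# but the expensive work happens behind the call to `accumulate!`.
function apply_all(functions, u::Vector{Float64})
    ret = zeros(eltype(u), length(u))
    for f in functions
        accumulate!(ret, f, u)  # function barrier
    end
    return ret
end

u = [0.0, 1.0, 2.0]
result = apply_all((x -> 2x, x -> x + 1), u)
```

In the worst case this costs one dynamic dispatch per outer iteration rather than one per element. Whether that dispatch allocates in a given setting is best checked by measurement, as done in this PR.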

If you are not seeing allocations, and if this PR passes the allocation tests, I am ok with it.

@DanielDoehring
Contributor Author

What about

foreach(boundary_conditions, keys(boundary_conditions)) do bc, name

?

Is this expected to be more robust than for (key, value) in zip(keys(boundary_conditions), boundary_conditions) @ranocha ? In that case I will adjust the PR accordingly.
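
For reference, the two variants under discussion visit the same (key, value) pairs. The sketch below checks this on a toy NamedTuple; the helper names `visit_with_zip` and `visit_with_foreach` are hypothetical, and the sketch says nothing about which variant is inferred better.

```julia
# Variant 1: explicit loop over zipped keys and values.
function visit_with_zip(bcs::NamedTuple)
    visited = Tuple{Symbol, Any}[]
    for (key, value) in zip(keys(bcs), bcs)
        push!(visited, (key, value))
    end
    return visited
end

# Variant 2: foreach over values and keys in lockstep.
function visit_with_foreach(bcs::NamedTuple)
    visited = Tuple{Symbol, Any}[]
    foreach(bcs, keys(bcs)) do value, key
        push!(visited, (key, value))
    end
    return visited
end

# Toy NamedTuple with values of different types.
boundary_conditions = (Slant = 1.0, Bottom = 2, Top = "top")
pairs_zip = visit_with_zip(boundary_conditions)
pairs_foreach = visit_with_foreach(boundary_conditions)
```

Both helpers collect into a `Vector{Tuple{Symbol, Any}}` purely to make the comparison easy; performance-critical code would avoid such a container.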

@ranocha
Member

ranocha commented Oct 31, 2023

If it works, let's use your approach. I would just like to understand why it didn't work before 😅

@ranocha ranocha merged commit 28500ea into trixi-framework:main Oct 31, 2023
30 of 31 checks passed
@DanielDoehring DanielDoehring deleted the FixAllocs_BC_DGMulti branch October 31, 2023 15:05
bennibolm added a commit to bennibolm/Trixi.jl that referenced this pull request Nov 6, 2023
Development

Successfully merging this pull request may close these issues.

calc_boundary_flux! is allocating for HOHQMesh and DGMulti
3 participants