WIP: Improve performance of subcell limiting for non-conservative systems #6

amrueda · 2023-10-16T16:10:58Z

This is an ugly hack to speed up the computation of the subcell limiting formula.

We avoid many multiplications and additions with 0. As a result, the computation of the DG staggered fluxes goes from 2.30μs to 1.76μs, about 23% less (time/DOF/rhs! goes from 2.61561070e-07 s to 2.37202532e-07 s, about 9% less). However, the non-conservative flux functions become less readable.
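To illustrate the idea only, here is a minimal sketch in Python (not the Julia code changed in this PR). It assumes a hypothetical non-conservative term with five components, of which only two are nonzero. The names `accumulate_dense`, `accumulate_sparse`, and `nonzero_terms` are made up for this example.

```python
def accumulate_dense(fhat, weight, noncons_flux):
    """Add weight * noncons_flux to fhat, visiting every component,
    including those that are known to be zero."""
    for v in range(len(fhat)):
        fhat[v] += weight * noncons_flux[v]


def accumulate_sparse(fhat, weight, nonzero_terms):
    """Same update, restricted to the components known to be nonzero.

    nonzero_terms is a list of (component index, value) pairs.
    """
    for v, value in nonzero_terms:
        fhat[v] += weight * value


# Hypothetical data: only components 1 and 3 of the term are nonzero.
dense_flux = [0.0, 2.0, 0.0, -1.5, 0.0]
nonzero_terms = [(1, 2.0), (3, -1.5)]

fhat_dense = [1.0] * 5
fhat_sparse = [1.0] * 5
accumulate_dense(fhat_dense, 0.5, dense_flux)      # 5 multiply-adds
accumulate_sparse(fhat_sparse, 0.5, nonzero_terms)  # 2 multiply-adds
```

Both variants give the same result, but the sparse one does less arithmetic. The cost is that knowledge about which components vanish leaks into the calling code, which is the readability trade-off mentioned above.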

This PR requires that the non-conservative flux functions called by VolumeIntegralSubcellLimiting modify their arguments in place. So far, I have not followed the "exclamation mark convention" for these functions, so that multiple dispatch on volume_flux still works.
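For readers unfamiliar with the pattern: in Julia, functions that modify their arguments are conventionally given a name ending in `!`. The following Python sketch shows the general shape of such a mutating flux function. It is an assumption-laden illustration, not the PR's implementation. The function name, the argument layout, and the formula for the single nonzero component are all hypothetical.

```python
def add_noncons_contribution(fhat, weight, u_ll, u_rr):
    """Hypothetical mutating flux function.

    Instead of returning a new flux vector, it adds its contribution
    directly to the caller's buffer `fhat`. Only component 1 is
    assumed to be nonzero, so only that entry is touched.
    """
    # Made-up two-point term: arithmetic mean of component 1.
    fhat[1] += weight * 0.5 * (u_ll[1] + u_rr[1])
    # Nothing is returned on purpose: the result lives in `fhat`.


# Hypothetical left and right states with three variables each.
u_ll = [1.0, 2.0, 3.0]
u_rr = [1.0, 4.0, 5.0]

fhat = [0.0, 0.0, 0.0]
result = add_noncons_contribution(fhat, 0.5, u_ll, u_rr)
```

The caller has to provide a mutable buffer and must not rely on a return value. This is the kind of side effect that a trailing `!` would normally signal in Julia.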

This is the new performance summary of the test referenced in trixi-framework#1670:

 ─────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                       Time                    Allocations      
                                    ───────────────────────   ────────────────────────
          Tot / % measured:              367ms /  98.2%           19.6MiB /  99.2%    

 Section                    ncalls     time    %tot     avg     alloc    %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────────
 rhs!                          306    296ms   82.0%   966μs   10.1KiB    0.1%    33.7B
   volume integral             306    250ms   69.3%   818μs      752B    0.0%    2.46B
     calcflux_fhat!          78.3k    138ms   38.3%  1.76μs     0.00B    0.0%    0.00B
     ~volume integral~         306    112ms   31.1%   366μs      752B    0.0%    2.46B
   interface flux              306   34.6ms    9.6%   113μs     0.00B    0.0%    0.00B
   surface integral            306   4.04ms    1.1%  13.2μs     0.00B    0.0%    0.00B
   prolong2interfaces          306   3.81ms    1.1%  12.5μs     0.00B    0.0%    0.00B
   Jacobian                    306   1.64ms    0.5%  5.36μs     0.00B    0.0%    0.00B
   reset ∂u/∂t                 306   1.04ms    0.3%  3.41μs     0.00B    0.0%    0.00B
   ~rhs!~                      306    326μs    0.1%  1.06μs   9.33KiB    0.0%    31.2B
   prolong2boundaries          306   18.5μs    0.0%  60.4ns     0.00B    0.0%    0.00B
   prolong2mortars             306   13.7μs    0.0%  44.8ns     0.00B    0.0%    0.00B
   mortar flux                 306   9.62μs    0.0%  31.4ns     0.00B    0.0%    0.00B
   boundary flux               306   6.73μs    0.0%  22.0ns     0.00B    0.0%    0.00B
   source terms                306   6.56μs    0.0%  21.4ns     0.00B    0.0%    0.00B
 I/O                            13   31.9ms    8.8%  2.45ms   10.5MiB   54.0%   828KiB
   save solution                12   30.8ms    8.5%  2.56ms   10.4MiB   53.2%   885KiB
   ~I/O~                        13    791μs    0.2%  60.8μs   40.8KiB    0.2%  3.14KiB
   get element variables        12    283μs    0.1%  23.6μs    107KiB    0.5%  8.89KiB
   get node variables           12   3.53μs    0.0%   294ns     0.00B    0.0%    0.00B
   save mesh                    12    510ns    0.0%  42.5ns     0.00B    0.0%    0.00B
 a posteriori limiter          306   17.0ms    4.7%  55.5μs   34.9KiB    0.2%     117B
   blending factors            306   10.9ms    3.0%  35.5μs   34.2KiB    0.2%     114B
     positivity                306   9.31ms    2.6%  30.4μs   33.5KiB    0.2%     112B
     ~blending factors~        306   1.57ms    0.4%  5.13μs      752B    0.0%    2.46B
   ~a posteriori limiter~      306   6.12ms    1.7%  20.0μs      752B    0.0%    2.46B
 analyze solution                3   8.22ms    2.3%  2.74ms   8.93MiB   45.8%  2.98MiB
 calculate dt                  103   7.96ms    2.2%  77.3μs     0.00B    0.0%    0.00B
 ─────────────────────────────────────────────────────────────────────────────────────


github-actions · 2023-10-16T16:11:15Z

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

The PR has a single goal that is clear from the PR title and/or description.
All code changes represent a single set of modifications that logically belong together.
No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

The code can be understood easily.
Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
There are no redundancies that can be removed by simple modularization/refactoring.
There are no leftover debug statements or commented code sections.
The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

New functions and types are documented with a docstring or top-level comment.
Relevant publications are referenced in docstrings (see example for formatting).
Inline comments are used to document longer or unusual code sections.
Comments describe intent ("why?") and not just functionality ("what?").
If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

The PR passes all tests.
New or modified lines of code are covered by tests.
New or modified tests run in less than 10 seconds.

Performance

There are no type instabilities or memory allocations in performance-critical parts.
If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

The correctness of the code was verified using appropriate tests.
If new equations/methods are added, a convergence test has been run and the results are posted in the PR.

Created with ❤️ by the Trixi.jl community.


Ugly hacks to improve performance in the computation of the subcell l…

8adc47d

…imiting formula (we avoid to multiply and sum 0)

Moved variable assignment out of loop

5a09237

amrueda changed the title ~~Improve performance of subcell limiting for non-conservative systems~~ WIP: Improve performance of subcell limiting for non-conservative systems Oct 17, 2023

amrueda added 3 commits October 23, 2023 10:56

Merge branch 'subcell_limiting_noncons' into subcell_limiting_noncons…

1656799

…_performance

Added debugging elixirs and timer

ec6a36d

format

bff41df

amrueda mentioned this pull request Oct 23, 2023

Implement subcell limiting for non-conservative systems trixi-framework/Trixi.jl#1670

Merged
