WIP: Improve performance of subcell limiting for non-conservative systems #6

Open: @amrueda wants to merge 5 commits into base branch subcell_limiting_noncons

@amrueda (Owner) commented Oct 16, 2023

This is an ugly hack to improve the performance in the computation of the subcell limiting formula.

We avoid many multiplications and additions with 0. As a result, the average time per call for the computation of the DG staggered fluxes (calcflux_fhat!) drops from 2.30 μs to 1.76 μs, about 23% less, and time/DOF/rhs! drops from 2.61561070e-07 s to 2.37202532e-07 s, about 9% less. However, the non-conservative fluxes become less readable...
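To illustrate the idea of skipping operations on zeros, here is a minimal sketch. It is not the code of this PR: the function names, the `nonzero` index tuple, and the sample values are all hypothetical, and the sketch only assumes that some components of the non-conservative term are known in advance to be zero.

```julia
# Hypothetical illustration; none of these names are Trixi.jl API.

# Generic version: treats every component the same way, including the
# ones that are known to be zero.
function accumulate_dense!(acc, weight, local_part, symmetric_part)
    for v in eachindex(acc)
        acc[v] += weight * local_part[v] * symmetric_part[v]
    end
    return acc
end

# Specialized version: `nonzero` lists the components that can be
# nonzero, so the zero products are never computed or added.
function accumulate_sparse!(acc, weight, local_part, symmetric_part, nonzero)
    for v in nonzero
        acc[v] += weight * local_part[v] * symmetric_part[v]
    end
    return acc
end

# Assumed sample data: only components 2 and 5 contribute.
local_part = [0.0, 2.0, 0.0, 0.0, 3.0]
symmetric_part = [0.0, 0.5, 0.0, 0.0, 4.0]
nonzero = (2, 5)

dense = accumulate_dense!(zeros(5), 0.25, local_part, symmetric_part)
sparse = accumulate_sparse!(zeros(5), 0.25, local_part, symmetric_part, nonzero)
```

Both versions produce the same result, but the specialized one hard-codes knowledge about the structure of the equations, which is the readability cost mentioned above.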

This PR requires that the non-conservative flux functions called by VolumeIntegralSubcellLimiting modify their arguments. So far, these functions do not follow Julia's "exclamation mark convention" (a trailing ! in the name of a function that modifies its arguments), so that multiple dispatch on volume_flux remains possible.
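One possible reading of this design choice, sketched under assumptions: if the in-place method keeps the same name as the allocating one, a caller that receives `volume_flux` as an argument can select the in-place method by argument count alone. The names `toy_flux` and `fill_flux!` below are hypothetical and do not exist in Trixi.jl.

```julia
# Hypothetical flux function; not part of Trixi.jl.

# Allocating method: returns a new array.
toy_flux(u_ll, u_rr) = 0.5 .* (u_ll .+ u_rr)

# In-place method under the same name (no trailing `!`): writes into `out`.
function toy_flux(out, u_ll, u_rr)
    @. out = 0.5 * (u_ll + u_rr)
    return nothing
end

# The caller is generic over `volume_flux`. Because both methods share one
# name, the same function object works for both calling styles.
function fill_flux!(out, volume_flux, u_ll, u_rr)
    volume_flux(out, u_ll, u_rr)
    return out
end

out = fill_flux!(zeros(2), toy_flux, [1.0, 2.0], [3.0, 4.0])
```

Had the in-place method been named `toy_flux!`, the caller would need to be handed a different function object, which is the trade-off against the naming convention.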

This is the new performance summary of the test referenced in trixi-framework#1670:

 ─────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                       Time                    Allocations      
                                    ───────────────────────   ────────────────────────
          Tot / % measured:              367ms /  98.2%           19.6MiB /  99.2%    

 Section                    ncalls     time    %tot     avg     alloc    %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────────
 rhs!                          306    296ms   82.0%   966μs   10.1KiB    0.1%    33.7B
   volume integral             306    250ms   69.3%   818μs      752B    0.0%    2.46B
     calcflux_fhat!          78.3k    138ms   38.3%  1.76μs     0.00B    0.0%    0.00B
     ~volume integral~         306    112ms   31.1%   366μs      752B    0.0%    2.46B
   interface flux              306   34.6ms    9.6%   113μs     0.00B    0.0%    0.00B
   surface integral            306   4.04ms    1.1%  13.2μs     0.00B    0.0%    0.00B
   prolong2interfaces          306   3.81ms    1.1%  12.5μs     0.00B    0.0%    0.00B
   Jacobian                    306   1.64ms    0.5%  5.36μs     0.00B    0.0%    0.00B
   reset ∂u/∂t                 306   1.04ms    0.3%  3.41μs     0.00B    0.0%    0.00B
   ~rhs!~                      306    326μs    0.1%  1.06μs   9.33KiB    0.0%    31.2B
   prolong2boundaries          306   18.5μs    0.0%  60.4ns     0.00B    0.0%    0.00B
   prolong2mortars             306   13.7μs    0.0%  44.8ns     0.00B    0.0%    0.00B
   mortar flux                 306   9.62μs    0.0%  31.4ns     0.00B    0.0%    0.00B
   boundary flux               306   6.73μs    0.0%  22.0ns     0.00B    0.0%    0.00B
   source terms                306   6.56μs    0.0%  21.4ns     0.00B    0.0%    0.00B
 I/O                            13   31.9ms    8.8%  2.45ms   10.5MiB   54.0%   828KiB
   save solution                12   30.8ms    8.5%  2.56ms   10.4MiB   53.2%   885KiB
   ~I/O~                        13    791μs    0.2%  60.8μs   40.8KiB    0.2%  3.14KiB
   get element variables        12    283μs    0.1%  23.6μs    107KiB    0.5%  8.89KiB
   get node variables           12   3.53μs    0.0%   294ns     0.00B    0.0%    0.00B
   save mesh                    12    510ns    0.0%  42.5ns     0.00B    0.0%    0.00B
 a posteriori limiter          306   17.0ms    4.7%  55.5μs   34.9KiB    0.2%     117B
   blending factors            306   10.9ms    3.0%  35.5μs   34.2KiB    0.2%     114B
     positivity                306   9.31ms    2.6%  30.4μs   33.5KiB    0.2%     112B
     ~blending factors~        306   1.57ms    0.4%  5.13μs      752B    0.0%    2.46B
   ~a posteriori limiter~      306   6.12ms    1.7%  20.0μs      752B    0.0%    2.46B
 analyze solution                3   8.22ms    2.3%  2.74ms   8.93MiB   45.8%  2.98MiB
 calculate dt                  103   7.96ms    2.2%  77.3μs     0.00B    0.0%    0.00B
 ─────────────────────────────────────────────────────────────────────────────────────

…imiting formula (we avoid to multiply and sum 0)
@github-actions

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@amrueda changed the title from "Improve performance of subcell limiting for non-conservative systems" to "WIP: Improve performance of subcell limiting for non-conservative systems" on Oct 17, 2023