Increase type stab, avoid allocs #1642

Merged

DanielDoehring

Similar to #1635 and the corresponding fix #1636, I noticed allocations in calc_boundary_flux! in recent test runs (e.g. https://github.com/trixi-framework/Trixi.jl/actions/runs/6219235479/job/16876951926?pr=1629#step:7:4212).

For https://github.com/trixi-framework/Trixi.jl/blob/main/examples/unstructured_2d_dgsem/elixir_acoustics_gauss_wall.jl, the current version on main gives

 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              9.93s /  77.7%           3.68GiB /  99.5%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       13.2k    7.28s   94.4%   552μs   3.61GiB   98.7%   287KiB
   volume integral          13.2k    2.88s   37.4%   219μs     0.00B    0.0%    0.00B
   interface flux           13.2k    1.14s   14.8%  86.3μs     0.00B    0.0%    0.00B
   boundary flux            13.2k    1.11s   14.4%  84.3μs   3.61GiB   98.7%   287KiB
   prolong2interfaces       13.2k    758ms    9.8%  57.5μs     0.00B    0.0%    0.00B
   surface integral         13.2k    716ms    9.3%  54.3μs     0.00B    0.0%    0.00B
   Jacobian                 13.2k    369ms    4.8%  28.0μs     0.00B    0.0%    0.00B
   reset ∂u/∂t              13.2k    227ms    2.9%  17.2μs     0.00B    0.0%    0.00B
   prolong2boundaries       13.2k   53.5ms    0.7%  4.06μs     0.00B    0.0%    0.00B
   ~rhs!~                   13.2k   18.5ms    0.2%  1.40μs   6.61KiB    0.0%    0.51B
   source terms             13.2k    204μs    0.0%  15.4ns     0.00B    0.0%    0.00B
 I/O                           32    298ms    3.9%  9.31ms   37.8MiB    1.0%  1.18MiB
   save solution               31    191ms    2.5%  6.17ms   37.6MiB    1.0%  1.21MiB
   ~I/O~                       32    107ms    1.4%  3.33ms    249KiB    0.0%  7.77KiB
   get element variables       31   41.0μs    0.0%  1.32μs   6.78KiB    0.0%     224B
   save mesh                   31   10.8μs    0.0%   347ns     0.00B    0.0%    0.00B
 analyze solution              16    135ms    1.7%  8.42ms   10.9MiB    0.3%   695KiB
 ────────────────────────────────────────────────────────────────────────────────────

With the proposed changes

 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              8.93s /  76.7%           61.1MiB /  72.1%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                       13.2k    6.42s   93.6%   486μs   6.61KiB    0.0%    0.51B
   volume integral          13.2k    2.91s   42.5%   221μs     0.00B    0.0%    0.00B
   interface flux           13.2k    1.15s   16.8%  87.3μs     0.00B    0.0%    0.00B
   prolong2interfaces       13.2k    770ms   11.2%  58.4μs     0.00B    0.0%    0.00B
   surface integral         13.2k    708ms   10.3%  53.7μs     0.00B    0.0%    0.00B
   Jacobian                 13.2k    356ms    5.2%  27.0μs     0.00B    0.0%    0.00B
   reset ∂u/∂t              13.2k    228ms    3.3%  17.3μs     0.00B    0.0%    0.00B
   boundary flux            13.2k    217ms    3.2%  16.5μs     0.00B    0.0%    0.00B
   prolong2boundaries       13.2k   55.3ms    0.8%  4.20μs     0.00B    0.0%    0.00B
   ~rhs!~                   13.2k   18.6ms    0.3%  1.41μs   6.61KiB    0.0%    0.51B
   source terms             13.2k    282μs    0.0%  21.4ns     0.00B    0.0%    0.00B
 I/O                           32    294ms    4.3%  9.18ms   37.8MiB   85.8%  1.18MiB
   save solution               31    178ms    2.6%  5.76ms   37.6MiB   85.3%  1.21MiB
   ~I/O~                       32    115ms    1.7%  3.60ms    249KiB    0.6%  7.77KiB
   get element variables       31   40.3μs    0.0%  1.30μs   6.78KiB    0.0%     224B
   save mesh                   31   11.9μs    0.0%   385ns     0.00B    0.0%    0.00B
 analyze solution              16    145ms    2.1%  9.06ms   6.24MiB   14.2%   400KiB
 ────────────────────────────────────────────────────────────────────────────────────
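
For readers unfamiliar with how a type instability turns into allocations in a hot routine such as a boundary-flux loop, the following is a minimal generic sketch. It is not the change made in this PR and does not use Trixi.jl code; `bc_wall`, `bc_outflow`, `flux_sum_unstable`, `flux_sum_stable`, and `allocated_bytes` are hypothetical names chosen for illustration only.

```julia
using Test

# Hypothetical boundary conditions (assumptions for this sketch, not Trixi.jl functions).
bc_wall(u) = -u
bc_outflow(u) = 2.0 * u

# Type-unstable variant: a `Vector{Function}` has an abstract element type,
# so each call `bc(u)` is dispatched at run time and its result is inferred as `Any`.
function flux_sum_unstable(bcs::Vector{Function}, u::Float64)
    total = 0.0
    for bc in bcs
        total += bc(u)
    end
    return total
end

# Type-stable variant: a tuple carries the concrete type of every entry,
# and the recursion peels off one entry per call.
flux_sum_stable(::Tuple{}, u::Float64) = 0.0
function flux_sum_stable(bcs::Tuple, u::Float64)
    return first(bcs)(u) + flux_sum_stable(Base.tail(bcs), u)
end

# Allocated bytes of one call, measured after a warm-up call to exclude compilation.
function allocated_bytes(f, bcs, u)
    f(bcs, u)
    return @allocated f(bcs, u)
end

bcs_vector = Function[bc_wall, bc_outflow]
bcs_tuple = (bc_wall, bc_outflow)

# Both variants compute the same value.
@test flux_sum_unstable(bcs_vector, 1.5) == flux_sum_stable(bcs_tuple, 1.5)

# `@inferred` throws if the return type cannot be inferred as a concrete type.
@inferred flux_sum_stable(bcs_tuple, 1.5)

println("unstable: ", allocated_bytes(flux_sum_unstable, bcs_vector, 1.5), " bytes")
println("stable:   ", allocated_bytes(flux_sum_stable, bcs_tuple, 1.5), " bytes")
```

Calling `@inferred flux_sum_unstable(bcs_vector, 1.5)` would raise an error, which is one way to locate such an instability; `@code_warntype` is another. The exact byte counts printed depend on the Julia version.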

@github-actions

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

DanielDoehring commented Sep 18, 2023

Furthermore, it looks like

@test_broken (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 5000

could actually be changed to

@test (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 5000

thus rendering the if-clause

if (Threads.nthreads() < 2) || (VERSION < v"1.9")
    @test (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 5000
else
    @test_broken (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 5000
end

obsolete.
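
As a self-contained sketch of how such an allocation test behaves, the snippet below applies the same `@test (@allocated ...) < 5000` pattern to a toy function. `toy_rhs!` and `allocated_bytes_after_warmup` are hypothetical stand-ins, not Trixi.jl code; the warm-up call is an assumption of this sketch so that compilation is not counted in the measurement.

```julia
using Test

# Hypothetical in-place right-hand side standing in for `Trixi.rhs!`.
function toy_rhs!(du, u, t)
    @. du = -(1 + t) * u   # fused in-place broadcast, writes into `du`
    return nothing
end

# Run once to compile, then measure the bytes allocated by a second call.
function allocated_bytes_after_warmup(u, t)
    du = similar(u)
    toy_rhs!(du, u, t)                      # warm-up call
    return @allocated toy_rhs!(du, u, t)    # measured call
end

# Same threshold style as the test discussed above.
@test allocated_bytes_after_warmup(rand(1000), 0.5) < 5000
```

The difference between the two macros is that `@test_broken` records a test that is expected to fail and reports an error once it unexpectedly passes, so a fix like the one in this PR makes switching to plain `@test` necessary rather than optional.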

codecov bot commented Sep 18, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (73384ac) 96.11% compared to head (2e626cd) 96.11%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1642   +/-   ##
=======================================
  Coverage   96.11%   96.11%           
=======================================
  Files         418      418           
  Lines       34247    34247           
=======================================
  Hits        32915    32915           
  Misses       1332     1332           
Flag Coverage Δ
unittests 96.11% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Files Changed Coverage Δ
src/solvers/dgsem_unstructured/dg_2d.jl 96.71% <100.00%> (ø)

ranocha commented Sep 18, 2023

> Furthermore, it looks like
>
> @test_broken (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 5000
>
> could actually be changed to
>
> @test (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 5000
>
> thus rendering the if-clause
>
> if (Threads.nthreads() < 2) || (VERSION < v"1.9")
>     @test (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 5000
> else
>     @test_broken (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 5000
> end
>
> obsolete.

Could you please verify this locally (on Julia v1.8 and v1.9, serial and with multiple threads) and update the tests accordingly?

@DanielDoehring DanielDoehring added the performance We are greedy label Sep 18, 2023
@DanielDoehring

> Could you please verify this locally (on Julia v1.8 and v1.9, serial and with multiple threads) and update the tests accordingly?

On Ubuntu 23.04, this indeed works for all four setups mentioned.

@ranocha ranocha left a comment

Thanks!

@ranocha ranocha merged commit 7e22898 into trixi-framework:main Sep 20, 2023
31 checks passed
@DanielDoehring DanielDoehring deleted the TypeStabilityBCsUnstructured branch September 20, 2023 06:41