Thread-parallelize src term addition Euler Gravity #2102

Open
wants to merge 5 commits into
base: main

DanielDoehring
Contributor

It ain't much, but it is something.
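
For readers who want to see what "thread-parallelizing the source term addition" can look like, here is a minimal sketch. It is not the code of this PR: the array layout `(variable, i, j, element)`, the variable ordering, and all function names are assumptions made for illustration. It uses `Base.Threads.@threads` to stay self-contained, whereas Trixi.jl code would normally go through its own threading macro. The point it illustrates is that each element is written by exactly one thread, so the loop over elements can be split across threads without synchronization.

```julia
using Base.Threads: @threads

# Hypothetical layout: arrays are indexed as (variable, i, j, element).
# Euler variables:   1 = rho, 2 = rho_v1, 3 = rho_v2, 4 = rho_e
# Gravity variables: 1 = phi, 2 = q1,     3 = q2   (q = gradient of phi)

# Add the gravitational source terms for all nodes of a single element.
function add_gravity_sources_element!(du_euler, u_euler, u_gravity, element)
    for j in axes(du_euler, 3), i in axes(du_euler, 2)
        rho    = u_euler[1, i, j, element]
        rho_v1 = u_euler[2, i, j, element]
        rho_v2 = u_euler[3, i, j, element]
        q1     = u_gravity[2, i, j, element]
        q2     = u_gravity[3, i, j, element]

        # Momentum: -rho * grad(phi); energy: -(rho * v) . grad(phi)
        du_euler[2, i, j, element] -= rho * q1
        du_euler[3, i, j, element] -= rho * q2
        du_euler[4, i, j, element] -= rho_v1 * q1 + rho_v2 * q2
    end
    return nothing
end

# Serial reference: one loop over all elements.
function add_gravity_sources_serial!(du_euler, u_euler, u_gravity)
    for element in axes(du_euler, 4)
        add_gravity_sources_element!(du_euler, u_euler, u_gravity, element)
    end
    return nothing
end

# Threaded variant: elements are distributed across threads. No two
# iterations write to the same entries of `du_euler`, so there is no race.
function add_gravity_sources_threaded!(du_euler, u_euler, u_gravity)
    @threads for element in axes(du_euler, 4)
        add_gravity_sources_element!(du_euler, u_euler, u_gravity, element)
    end
    return nothing
end
```

Started with, e.g., `julia --threads=2`, both variants should produce identical results, because every entry is updated by the same floating-point operations in either case.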

Running eulergravity_sedov_blast_wave.jl with 2 threads, present implementation:

 ────────────────────────────────────────────────────────────────────────────────────────────────────
 Simulation running 'CompressibleEulerEquations2D' with DGSEM(polydeg=3)
────────────────────────────────────────────────────────────────────────────────────────────────────
 #timesteps:                586                run time:       2.94421216e+01 s
 Δt:             2.36742707e-03                └── GC time:    5.35440100e-02 s (0.182%)
 sim. time:      1.00000000e+00 (100.000%)     time/DOF/rhs!:  1.34115176e-07 s
                                               PID:            2.35086078e-07 s
 #DOFs per field:         59008                alloc'd memory:        427.173 MiB
 #elements:                3688
 ├── level 8:              2800
 ├── level 7:               580
 ├── level 6:               144
 ├── level 5:                92
 ├── level 4:                44
 ├── level 3:                24
 └── level 2:                 4

 Variable:       rho              rho_v1           rho_v2           rho_e         
 L2 error:       2.82838970e-01   9.56723845e-02   9.56723846e-02   5.07100358e-01
 Linf error:     3.88847701e+00   1.75078666e+00   1.75078668e+00   2.01765025e+01
 ∑∂S/∂U  Uₜ :  -6.27383203e-02
 ∑e_total    :   1.57946744e-02
 ∑e_kinetic  :   3.27940639e-03
 ∑e_internal :   1.25152680e-02
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 1.0  Time steps: 586 (accepted), 586 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ───────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                        Time                    Allocations      
                                      ───────────────────────   ────────────────────────
           Tot / % measured:               29.7s /  93.3%           4.46GiB /  99.6%    

 Section                      ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────
 gravity solver                2.93k    10.6s   38.3%  3.62ms   65.3MiB    1.4%  22.8KiB
   rhs!                        14.7k    7.99s   28.8%   545μs   61.3MiB    1.3%  4.28KiB
     volume integral           14.7k    3.10s   11.2%   212μs   4.25MiB    0.1%     304B
     mortar flux               14.7k    1.35s    4.9%  91.8μs   6.26MiB    0.1%     448B
     surface integral          14.7k    810ms    2.9%  55.3μs   4.03MiB    0.1%     288B
     interface flux            14.7k    803ms    2.9%  54.8μs   4.70MiB    0.1%     336B
     source terms              14.7k    612ms    2.2%  41.8μs   4.03MiB    0.1%     288B
     prolong2interfaces        14.7k    394ms    1.4%  26.9μs   4.25MiB    0.1%     304B
     prolong2mortars           14.7k    315ms    1.1%  21.5μs   4.25MiB    0.1%     304B
     Jacobian                  14.7k    272ms    1.0%  18.5μs   3.35MiB    0.1%     240B
     reset ∂u/∂t               14.7k    231ms    0.8%  15.8μs     0.00B    0.0%    0.00B
     boundary flux             14.7k   71.2ms    0.3%  4.86μs   22.4MiB    0.5%  1.56KiB
     ~rhs!~                    14.7k   21.0ms    0.1%  1.43μs   9.33KiB    0.0%    0.65B
     prolong2boundaries        14.7k   12.5ms    0.0%   854ns   3.80MiB    0.1%     272B
   ~gravity solver~            2.93k    1.46s    5.3%   498μs   3.98MiB    0.1%  1.39KiB
   Runge-Kutta step            14.7k    1.15s    4.2%  78.5μs     0.00B    0.0%    0.00B
   calculate dt                2.93k   3.12ms    0.0%  1.06μs     0.00B    0.0%    0.00B
 AMR                             585    8.75s   31.6%  15.0ms   4.33GiB   97.4%  7.57MiB
   coarsen                       585    4.41s   15.9%  7.54ms   2.56GiB   57.7%  4.48MiB
     mesh                        585    2.98s   10.7%  5.09ms   5.00MiB    0.1%  8.76KiB
     solver                      585    668ms    2.4%  1.14ms   0.98GiB   22.1%  1.72MiB
     passive solver              585    538ms    1.9%   920μs    775MiB   17.0%  1.33MiB
     ~coarsen~                   585    224ms    0.8%   383μs    836MiB   18.4%  1.43MiB
   refine                        585    4.24s   15.3%  7.25ms   1.70GiB   38.2%  2.97MiB
     mesh                        507    3.08s   11.1%  6.07ms   7.95MiB    0.2%  16.1KiB
       refine_unbalanced!        507    2.99s   10.8%  5.89ms    331KiB    0.0%     668B
       rebalance!                629   80.6ms    0.3%   128μs   1.95MiB    0.0%  3.18KiB
       ~mesh~                    507   9.18ms    0.0%  18.1μs   5.68MiB    0.1%  11.5KiB
     solver                      507    620ms    2.2%  1.22ms   0.95GiB   21.4%  1.92MiB
     passive solver              507    539ms    1.9%  1.06ms    752MiB   16.5%  1.48MiB
     ~refine~                    585   3.11ms    0.0%  5.32μs    782KiB    0.0%  1.34KiB
   indicator                     585   78.7ms    0.3%   134μs   31.0MiB    0.7%  54.3KiB
   ~AMR~                         585   26.7ms    0.1%  45.6μs   40.3MiB    0.9%  70.5KiB
 Euler solver                  2.93k    7.96s   28.7%  2.72ms   17.0MiB    0.4%  5.93KiB
   rhs!                        2.93k    7.96s   28.7%  2.72ms   17.0MiB    0.4%  5.92KiB
     volume integral           2.93k    6.72s   24.3%  2.29ms   4.20MiB    0.1%  1.47KiB
       blended DG-FV           2.93k    5.38s   19.4%  1.84ms   1.25MiB    0.0%     448B
       pure DG                 2.93k    919ms    3.3%   314μs   1.25MiB    0.0%     448B
       blending factors        2.93k    361ms    1.3%   123μs   1.65MiB    0.0%     591B
       ~volume integral~       2.93k   62.5ms    0.2%  21.3μs   48.6KiB    0.0%    17.0B
     interface flux            2.93k    375ms    1.4%   128μs   1.12MiB    0.0%     400B
     mortar flux               2.93k    254ms    0.9%  86.5μs   1.61MiB    0.0%     576B
     surface integral          2.93k    197ms    0.7%  67.3μs    962KiB    0.0%     336B
     prolong2interfaces        2.93k    133ms    0.5%  45.5μs   1.03MiB    0.0%     368B
     Jacobian                  2.93k   87.6ms    0.3%  29.9μs    870KiB    0.0%     304B
     reset ∂u/∂t               2.93k   82.5ms    0.3%  28.1μs     0.00B    0.0%    0.00B
     prolong2mortars           2.93k   82.0ms    0.3%  28.0μs   1.25MiB    0.0%     448B
     boundary flux             2.93k   11.6ms    0.0%  3.94μs   5.01MiB    0.1%  1.75KiB
     ~rhs!~                    2.93k   6.89ms    0.0%  2.35μs   9.33KiB    0.0%    3.26B
     prolong2boundaries        2.93k   4.08ms    0.0%  1.39μs    962KiB    0.0%     336B
     source terms              2.93k   77.5μs    0.0%  26.5ns     0.00B    0.0%    0.00B
   ~Euler solver~              2.93k   1.19ms    0.0%   405ns      752B    0.0%    0.26B
 analyze solution                  7    250ms    0.9%  35.6ms   14.8MiB    0.3%  2.11MiB
 calculate dt                    587    120ms    0.4%   205μs     0.00B    0.0%    0.00B
 initial condition AMR             1   14.7ms    0.1%  14.7ms   21.4MiB    0.5%  21.4MiB
   AMR                             7   14.1ms    0.1%  2.02ms   21.4MiB    0.5%  3.06MiB
     refine                        7   13.3ms    0.0%  1.90ms   16.4MiB    0.4%  2.35MiB
       mesh                        6   10.0ms    0.0%  1.67ms    126KiB    0.0%  21.0KiB
         refine_unbalanced!        6   6.67ms    0.0%  1.11ms   4.73KiB    0.0%     808B
         rebalance!               14   3.29ms    0.0%   235μs   54.8KiB    0.0%  3.91KiB
         ~mesh~                    6   42.5μs    0.0%  7.08μs   66.5KiB    0.0%  11.1KiB
       solver                      6   1.65ms    0.0%   276μs   7.94MiB    0.2%  1.32MiB
       passive solver              6   1.62ms    0.0%   270μs   8.37MiB    0.2%  1.39MiB
       ~refine~                    7   16.1μs    0.0%  2.30μs   14.7KiB    0.0%  2.10KiB
     ~AMR~                         7    501μs    0.0%  71.6μs   4.86MiB    0.1%   710KiB
     indicator                     7    323μs    0.0%  46.1μs    113KiB    0.0%  16.2KiB
     coarsen                       7   1.03μs    0.0%   147ns      448B    0.0%    64.0B
   ~initial condition AMR~         1    551μs    0.0%   551μs   4.77KiB    0.0%  4.77KiB
 ───────────────────────────────────────────────────────────────────────────────────────

With thread-parallelization of source term addition:

────────────────────────────────────────────────────────────────────────────────────────────────────
 Simulation running 'CompressibleEulerEquations2D' with DGSEM(polydeg=3)
────────────────────────────────────────────────────────────────────────────────────────────────────
 #timesteps:                586                run time:       2.83292507e+01 s
 Δt:             2.36742707e-03                └── GC time:    5.74641740e-02 s (0.203%)
 sim. time:      1.00000000e+00 (100.000%)     time/DOF/rhs!:  1.24045952e-07 s
                                               PID:            2.26951851e-07 s
 #DOFs per field:         59008                alloc'd memory:        514.209 MiB
 #elements:                3688
 ├── level 8:              2800
 ├── level 7:               580
 ├── level 6:               144
 ├── level 5:                92
 ├── level 4:                44
 ├── level 3:                24
 └── level 2:                 4

 Variable:       rho              rho_v1           rho_v2           rho_e         
 L2 error:       2.82838970e-01   9.56723845e-02   9.56723846e-02   5.07100358e-01
 Linf error:     3.88847701e+00   1.75078666e+00   1.75078668e+00   2.01765025e+01
 ∑∂S/∂U  Uₜ :  -6.27383203e-02
 ∑e_total    :   1.57946744e-02
 ∑e_kinetic  :   3.27940639e-03
 ∑e_internal :   1.25152680e-02
────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi.jl simulation finished.  Final time: 1.0  Time steps: 586 (accepted), 586 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ───────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                        Time                    Allocations      
                                      ───────────────────────   ────────────────────────
           Tot / % measured:               28.7s /  94.5%           4.47GiB /  99.4%    

 Section                      ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────
 gravity solver                2.93k    9.82s   36.3%  3.35ms   65.3MiB    1.4%  22.8KiB
   rhs!                        14.7k    7.69s   28.4%   525μs   61.3MiB    1.3%  4.28KiB
     volume integral           14.7k    3.12s   11.5%   213μs   4.25MiB    0.1%     304B
     mortar flux               14.7k    1.11s    4.1%  75.6μs   6.26MiB    0.1%     448B
     interface flux            14.7k    806ms    3.0%  55.0μs   4.70MiB    0.1%     336B
     surface integral          14.7k    751ms    2.8%  51.3μs   4.03MiB    0.1%     288B
     source terms              14.7k    623ms    2.3%  42.5μs   4.03MiB    0.1%     288B
     prolong2interfaces        14.7k    397ms    1.5%  27.1μs   4.25MiB    0.1%     304B
     prolong2mortars           14.7k    321ms    1.2%  21.9μs   4.25MiB    0.1%     304B
     Jacobian                  14.7k    269ms    1.0%  18.3μs   3.35MiB    0.1%     240B
     reset ∂u/∂t               14.7k    232ms    0.9%  15.8μs     0.00B    0.0%    0.00B
     boundary flux             14.7k   30.5ms    0.1%  2.08μs   22.4MiB    0.5%  1.56KiB
     ~rhs!~                    14.7k   22.5ms    0.1%  1.53μs   9.33KiB    0.0%    0.65B
     prolong2boundaries        14.7k   12.8ms    0.0%   870ns   3.80MiB    0.1%     272B
   Runge-Kutta step            14.7k    1.18s    4.4%  80.4μs     0.00B    0.0%    0.00B
   ~gravity solver~            2.93k    946ms    3.5%   323μs   3.98MiB    0.1%  1.39KiB
   calculate dt                2.93k   3.07ms    0.0%  1.05μs     0.00B    0.0%    0.00B
 AMR                             585    8.79s   32.4%  15.0ms   4.33GiB   97.4%  7.57MiB
   coarsen                       585    4.43s   16.4%  7.58ms   2.56GiB   57.7%  4.48MiB
     mesh                        585    3.01s   11.1%  5.15ms   5.00MiB    0.1%  8.76KiB
     solver                      585    611ms    2.3%  1.04ms   0.98GiB   22.1%  1.72MiB
     passive solver              585    606ms    2.2%  1.04ms    775MiB   17.0%  1.33MiB
     ~coarsen~                   585    201ms    0.7%   343μs    836MiB   18.4%  1.43MiB
   refine                        585    4.24s   15.6%  7.25ms   1.70GiB   38.2%  2.97MiB
     mesh                        507    3.15s   11.6%  6.22ms   7.95MiB    0.2%  16.1KiB
       refine_unbalanced!        507    3.06s   11.3%  6.04ms    331KiB    0.0%     668B
       rebalance!                629   82.1ms    0.3%   131μs   1.95MiB    0.0%  3.18KiB
       ~mesh~                    507   9.46ms    0.0%  18.7μs   5.68MiB    0.1%  11.5KiB
     passive solver              507    548ms    2.0%  1.08ms    752MiB   16.5%  1.48MiB
     solver                      507    534ms    2.0%  1.05ms   0.95GiB   21.4%  1.92MiB
     ~refine~                    585   3.20ms    0.0%  5.47μs    782KiB    0.0%  1.34KiB
   indicator                     585   91.4ms    0.3%   156μs   31.0MiB    0.7%  54.3KiB
   ~AMR~                         585   27.1ms    0.1%  46.3μs   40.3MiB    0.9%  70.5KiB
 Euler solver                  2.93k    8.01s   29.6%  2.73ms   17.0MiB    0.4%  5.93KiB
   rhs!                        2.93k    8.01s   29.6%  2.73ms   17.0MiB    0.4%  5.92KiB
     volume integral           2.93k    6.78s   25.0%  2.31ms   4.20MiB    0.1%  1.47KiB
       blended DG-FV           2.93k    5.39s   19.9%  1.84ms   1.25MiB    0.0%     448B
       pure DG                 2.93k    965ms    3.6%   329μs   1.25MiB    0.0%     448B
       blending factors        2.93k    360ms    1.3%   123μs   1.65MiB    0.0%     591B
       ~volume integral~       2.93k   65.3ms    0.2%  22.3μs   48.6KiB    0.0%    17.0B
     interface flux            2.93k    376ms    1.4%   128μs   1.12MiB    0.0%     400B
     mortar flux               2.93k    251ms    0.9%  85.8μs   1.61MiB    0.0%     576B
     surface integral          2.93k    198ms    0.7%  67.5μs    962KiB    0.0%     336B
     prolong2interfaces        2.93k    136ms    0.5%  46.3μs   1.03MiB    0.0%     368B
     Jacobian                  2.93k   86.5ms    0.3%  29.5μs    870KiB    0.0%     304B
     prolong2mortars           2.93k   81.6ms    0.3%  27.9μs   1.25MiB    0.0%     448B
     reset ∂u/∂t               2.93k   80.0ms    0.3%  27.3μs     0.00B    0.0%    0.00B
     boundary flux             2.93k   11.4ms    0.0%  3.90μs   5.01MiB    0.1%  1.75KiB
     ~rhs!~                    2.93k   6.90ms    0.0%  2.35μs   9.33KiB    0.0%    3.26B
     prolong2boundaries        2.93k   4.04ms    0.0%  1.38μs    962KiB    0.0%     336B
     source terms              2.93k   92.5μs    0.0%  31.6ns     0.00B    0.0%    0.00B
   ~Euler solver~              2.93k   1.27ms    0.0%   433ns      752B    0.0%    0.26B
 analyze solution                  7    263ms    1.0%  37.6ms   14.8MiB    0.3%  2.11MiB
 calculate dt                    587    121ms    0.4%   206μs     0.00B    0.0%    0.00B
 initial condition AMR             1   84.6ms    0.3%  84.6ms   21.4MiB    0.5%  21.4MiB
   AMR                             7   84.0ms    0.3%  12.0ms   21.4MiB    0.5%  3.06MiB
     refine                        7   83.3ms    0.3%  11.9ms   16.4MiB    0.4%  2.35MiB
       solver                      6   71.8ms    0.3%  12.0ms   7.94MiB    0.2%  1.32MiB
       mesh                        6   9.88ms    0.0%  1.65ms    126KiB    0.0%  21.0KiB
         refine_unbalanced!        6   6.52ms    0.0%  1.09ms   4.73KiB    0.0%     808B
         rebalance!               14   3.32ms    0.0%   237μs   54.8KiB    0.0%  3.91KiB
         ~mesh~                    6   48.2μs    0.0%  8.03μs   66.5KiB    0.0%  11.1KiB
       passive solver              6   1.61ms    0.0%   268μs   8.37MiB    0.2%  1.39MiB
       ~refine~                    7   27.0μs    0.0%  3.86μs   14.7KiB    0.0%  2.10KiB
     ~AMR~                         7    474μs    0.0%  67.7μs   4.86MiB    0.1%   710KiB
     indicator                     7    235μs    0.0%  33.5μs    113KiB    0.0%  16.2KiB
     coarsen                       7   1.56μs    0.0%   222ns      448B    0.0%    64.0B
   ~initial condition AMR~         1    598μs    0.0%   598μs   4.77KiB    0.0%  4.77KiB
 ───────────────────────────────────────────────────────────────────────────────────────
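
To put the two reports side by side, the relative savings can be computed directly from the numbers printed above. The snippet below is only arithmetic on those values: the variable names are made up for illustration, and since each configuration was run once, small differences may be within run-to-run noise.

```julia
# Values copied from the two timer reports above (2 threads each).
runtime_before = 2.94421216e+01   # s, "run time", present implementation
runtime_after  = 2.83292507e+01   # s, "run time", threaded source term addition

gravity_before = 10.6             # s, "gravity solver" section
gravity_after  = 9.82             # s

# "~gravity solver~": time in the gravity solver not attributed to a sub-section
remainder_before = 1.46           # s
remainder_after  = 0.946          # s (946ms)

# Fraction of the original time that was saved.
relative_saving(before, after) = (before - after) / before

for (name, before, after) in (("total run time", runtime_before, runtime_after),
                              ("gravity solver", gravity_before, gravity_after),
                              ("~gravity solver~", remainder_before, remainder_after))
    percent = round(100 * relative_saving(before, after); digits = 1)
    println(rpad(name, 18), percent, " % less time")
end
```

The largest relative change is in the `~gravity solver~` remainder, while the Euler solver and AMR sections are essentially unchanged between the two runs.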

@DanielDoehring DanielDoehring added the "performance" (We are greedy) label Oct 8, 2024
@DanielDoehring DanielDoehring requested a review from ranocha October 8, 2024 09:58
@DanielDoehring DanielDoehring changed the title from "Thread-parallelize src term computation Euler Gravity" to "Thread-parallelize src term addtition Euler Gravity" Oct 8, 2024
Contributor

github-actions bot commented Oct 8, 2024

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@DanielDoehring DanielDoehring changed the title from "Thread-parallelize src term addtition Euler Gravity" to "Thread-parallelize src term addition Euler Gravity" Oct 8, 2024

codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 79.16667% with 5 lines in your changes missing coverage. Please review.

Project coverage is 96.44%. Comparing base (a151e74) to head (ab5b74e).

Files with missing lines Patch % Lines
...discretization/semidiscretization_euler_gravity.jl 79.17% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2102      +/-   ##
==========================================
+ Coverage   96.39%   96.44%   +0.05%     
==========================================
  Files         483      483              
  Lines       38349    38336      -13     
==========================================
+ Hits        36964    36972       +8     
+ Misses       1385     1364      -21     
Flag Coverage Δ
unittests 96.44% <79.17%> (+0.05%) ⬆️
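
The percentages in this report follow from the line counts in the diff table. As a quick consistency check (plain arithmetic on the numbers above; the variable names are illustrative only):

```julia
# Line counts from the coverage diff table above.
lines_base, hits_base = 38349, 36964   # main
lines_head, hits_head = 38336, 36972   # this PR

coverage_base = 100 * hits_base / lines_base
coverage_head = 100 * hits_head / lines_head

# Patch coverage: 5 lines are reported as missing at 79.16667 % coverage,
# so the patch has n lines with (n - 5) / n = 0.7916667.
missing_patch_lines = 5
patch_lines = round(Int, missing_patch_lines / (1 - 0.7916667))
covered_patch_lines = patch_lines - missing_patch_lines

println("base coverage:  ", round(coverage_base; digits = 2), " %")
println("head coverage:  ", round(coverage_head; digits = 2), " %")
println("patch coverage: ", covered_patch_lines, " of ", patch_lines, " lines")
```

The misses column is consistent as well: 38349 - 36964 = 1385 on the base and 38336 - 36972 = 1364 on the head.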


@DanielDoehring
Contributor Author

Hmm, interesting failures for the legacy threaded testsuite. Opinions on this?

Member

@ranocha ranocha left a comment

Thanks. Could you, @sloede, please have a look as well?

Review comment on src/semidiscretization/semidiscretization_euler_gravity.jl (outdated, resolved)
@ranocha
Member

ranocha commented Oct 10, 2024

The test failures are indeed strange...

@torrilhon
Contributor

Please note #1880 ...

3 participants