Thread Parallel Reduction for integrate_via_indices #2201

@DanielDoehring DanielDoehring commented Dec 11, 2024

As suggested by @efaulhaber, one can use the reduction support of the @batch macro from Polyester.jl to speed up reductions such as integrals (errors to follow).
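
As a rough illustration of the idea (a minimal sketch, not the code changed in this PR), a thread-parallel sum with Polyester.jl might look like the following. The function `local_contribution` and both `integrate_*` names are hypothetical stand-ins for the per-element quadrature, and the `reduction=(+, integral)` form assumes a Polyester.jl version that supports reductions.

```julia
using Polyester: @batch

# Hypothetical stand-in for the contribution of one element to the integral.
local_contribution(i) = 1.0 / i^2

# Serial reference version.
function integrate_serial(n)
    integral = 0.0
    for i in 1:n
        integral += local_contribution(i)
    end
    return integral
end

# Thread-parallel version: each thread accumulates a private copy of
# `integral`, and the copies are combined with `+` after the loop.
function integrate_batched(n)
    integral = 0.0
    @batch reduction=(+, integral) for i in 1:n
        integral += local_contribution(i)
    end
    return integral
end
```

Because the summation order differs between threads, the two versions should agree only up to floating-point round-off, so a comparison with `isapprox` is more appropriate than `==`.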

The larger the simulation, the larger the speedup. But even a case with relatively few elements benefits: I used https://github.com/trixi-framework/Trixi.jl/blob/main/examples/p4est_2d_dgsem/elixir_euler_supersonic_cylinder.jl, stopping at 2e-2, with the element hierarchy

 #DOFs per field:         75648
 #elements:                4728
 ├── level 5:              1440
 ├── level 4:              1320
 ├── level 3:              1160
 ├── level 2:               193
 ├── level 1:               121
 └── level 0:               494

Using 2 threads to compute Trixi.analyze(Trixi.entropy_timederivative, u_wrap, u_wrap, tspan[2], mesh, equations, solver, semi.cache) (which is computed in every AnalysisCallback call by default) results in

BenchmarkTools.Trial: 6955 samples with 1 evaluation.
 Range (min … max):  608.541 μs …   2.511 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     692.610 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   716.382 μs ± 107.030 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▁▃▅██▇▇▅▅▄▃▃▂▁▂▂ ▁                                       ▂
  ▂▅▅▆██████████████████████▇▆▆▇▇▇▇▇▇▆▆▆▅▆▅▆▅▄▂▅▃▅▆▃▄▄▂▅▃▃▄▅▃▅▆ █
  609 μs        Histogram: log(frequency) by time        1.1 ms <

 Memory estimate: 592 bytes, allocs estimate: 4.

compared to 1 thread:

BenchmarkTools.Trial: 5175 samples with 1 evaluation.
Range (min … max):  850.861 μs …  1.893 ms  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     956.059 μs              ┊ GC (median):    0.00%
Time  (mean ± σ):   963.495 μs ± 38.375 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                           ▄█                                  
 ▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▂▃▃██▇▇▆▅▅▇▅▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂ ▃
 851 μs          Histogram: frequency by time         1.07 ms <

Memory estimate: 144 bytes, allocs estimate: 3.
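
For orientation, the speedup implied by the two median timings above can be worked out directly. The snippet below is only arithmetic on the posted numbers; the variable names are illustrative.

```julia
# Median timings from the two benchmark runs above, in microseconds.
median_1_thread = 956.059
median_2_threads = 692.610

speedup = median_1_thread / median_2_threads
efficiency = speedup / 2  # relative to ideal scaling on 2 threads

println("speedup ≈ ", round(speedup; digits = 2))                 # speedup ≈ 1.38
println("parallel efficiency ≈ ", round(efficiency; digits = 2))  # parallel efficiency ≈ 0.69
```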

@DanielDoehring DanielDoehring added performance We are greedy parallelization Related to MPI, threading, tasks etc. labels Dec 11, 2024

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results are posted in the PR.

Created with ❤️ by the Trixi.jl community.

codecov bot commented Dec 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.36%. Comparing base (f45455b) to head (6a7dfad).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2201      +/-   ##
==========================================
- Coverage   96.39%   96.36%   -0.03%     
==========================================
  Files         483      483              
  Lines       38349    38349              
==========================================
- Hits        36964    36952      -12     
- Misses       1385     1397      +12     
Flag Coverage Δ
unittests 96.36% <100.00%> (-0.03%) ⬇️



@ranocha ranocha left a comment

Thanks

@ranocha ranocha enabled auto-merge (squash) December 11, 2024 16:05
@ranocha ranocha merged commit 721624b into trixi-framework:main Dec 11, 2024
38 of 39 checks passed
@DanielDoehring DanielDoehring deleted the ThreadParallelReduction_IntegrateViaInd branch December 12, 2024 07:40
@vchuravy

In the context of #2029 we should probably not use the @batch macro directly, but rather route it through our own macro, so that we can turn Polyester on and off.
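
A minimal sketch of what such a routing macro could look like is given below. It is an assumption-laden illustration, not Trixi.jl's actual macro: the names `@maybe_threaded` and `USE_THREADS` are hypothetical, and `Threads.@threads` is used as a stand-in backend so that the example needs only the standard library. In a real setup, the threaded branch is where Polyester's macro (or an alternative) would be selected.

```julia
# Hypothetical compile-time switch; a package might read this from a
# preference instead of hard-coding it.
const USE_THREADS = true

# Hypothetical wrapper macro: all loops go through this single entry point,
# so the threading backend can be swapped or disabled in one place.
macro maybe_threaded(loop)
    if USE_THREADS
        # Stand-in backend; this is where another backend could be chosen.
        return esc(:(Base.Threads.@threads $loop))
    else
        # Threading disabled: emit the plain serial loop.
        return esc(loop)
    end
end

function fill_squares!(out)
    @maybe_threaded for i in eachindex(out)
        out[i] = i^2
    end
    return out
end
```

The design point is that call sites never name the backend directly, so switching it does not require touching every loop.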

ranocha commented Dec 17, 2024

How can we do multi-threaded reduction with Threads.@threads?
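
One common pattern, shown here only as a sketch and not as something proposed in this thread, is to split the index range into chunks, let each chunk accumulate into its own slot, and combine the partial results serially. The name `threaded_sum` is hypothetical, and the accumulator is assumed to be a `Float64`.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical helper: sum f(i) for i in 1:n using one chunk per thread.
function threaded_sum(f, n)
    nchunks = nthreads()
    partial = zeros(Float64, nchunks)  # one slot per chunk, no shared writes
    chunk_len = cld(n, nchunks)
    @threads for c in 1:nchunks
        lo = (c - 1) * chunk_len + 1
        hi = min(c * chunk_len, n)
        acc = 0.0
        for i in lo:hi  # empty range if this chunk has no work
            acc += f(i)
        end
        partial[c] = acc
    end
    return sum(partial)  # serial combination of the partial sums
end
```

Indexing the partial results by chunk rather than by `threadid()` avoids relying on a task staying on one thread.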

@efaulhaber

@carstenbauer suggested this package to me for multithreaded reduction: https://github.com/JuliaFolds2/OhMyThreads.jl
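
For comparison, a reduction with OhMyThreads.jl might be sketched as follows, assuming the package's `tmapreduce` function; `local_contribution` is again a hypothetical stand-in for the per-element work.

```julia
using OhMyThreads: tmapreduce

# Hypothetical stand-in for the contribution of one element.
local_contribution(i) = 1.0 / i^2

# Map `local_contribution` over the indices and reduce with `+` in parallel.
integral = tmapreduce(local_contribution, +, 1:1000)
```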
