Thread-Parallelize blended DG-FV #2138
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2138      +/-   ##
==========================================
+ Coverage   96.36%   96.37%   +0.01%
==========================================
  Files         477      477
  Lines       37760    37720      -40
==========================================
- Hits        36385    36349      -36
+ Misses       1375     1371       -4
Did you check how it scales with more threads?
Here are some measured runtimes for other numbers of threads. The old version always comes first, followed by the proposed change.
1 thread
Old:
──────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 229s / 89.1% 59.8MiB / 4.8%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────
rhs! 3.32k 202s 98.8% 60.7ms 2.89MiB 100.0% 913B
volume integral 3.32k 138s 67.3% 41.4ms 2.88MiB 99.7% 910B
pure DG 3.32k 118s 58.0% 35.7ms 0.00B 0.0% 0.00B
blending factors 3.32k 12.4s 6.1% 3.74ms 1.00MiB 34.6% 316B
blended DG-FV 3.32k 5.92s 2.9% 1.78ms 0.00B 0.0% 0.00B
~volume integral~ 3.32k 756ms 0.4% 228μs 1.88MiB 65.1% 594B
interface flux 3.32k 22.9s 11.2% 6.90ms 0.00B 0.0% 0.00B
prolong2interfaces 3.32k 16.3s 8.0% 4.90ms 0.00B 0.0% 0.00B
surface integral 3.32k 13.0s 6.4% 3.93ms 0.00B 0.0% 0.00B
reset ∂u/∂t 3.32k 6.11s 3.0% 1.84ms 0.00B 0.0% 0.00B
Jacobian 3.32k 5.85s 2.9% 1.76ms 0.00B 0.0% 0.00B
~rhs!~ 3.32k 33.2ms 0.0% 10.0μs 9.33KiB 0.3% 2.88B
prolong2mortars 3.32k 1.58ms 0.0% 475ns 0.00B 0.0% 0.00B
prolong2boundaries 3.32k 1.15ms 0.0% 346ns 0.00B 0.0% 0.00B
mortar flux 3.32k 615μs 0.0% 185ns 0.00B 0.0% 0.00B
source terms 3.32k 125μs 0.0% 37.8ns 0.00B 0.0% 0.00B
boundary flux 3.32k 103μs 0.0% 31.1ns 0.00B 0.0% 0.00B
calculate dt 665 2.46s 1.2% 3.70ms 0.00B 0.0% 0.00B
──────────────────────────────────────────────────────────────────────────────────
New:
──────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 228s / 89.3% 57.9MiB / 1.7%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────
rhs! 3.32k 201s 98.8% 60.6ms 1.01MiB 100.0% 319B
volume integral 3.32k 138s 67.5% 41.4ms 1.00MiB 99.1% 316B
~volume integral~ 3.32k 125s 61.3% 37.6ms 752B 0.1% 0.23B
blending factors 3.32k 12.8s 6.3% 3.85ms 1.00MiB 99.0% 316B
interface flux 3.32k 22.9s 11.3% 6.90ms 0.00B 0.0% 0.00B
prolong2interfaces 3.32k 15.9s 7.8% 4.78ms 0.00B 0.0% 0.00B
surface integral 3.32k 12.7s 6.2% 3.82ms 0.00B 0.0% 0.00B
reset ∂u/∂t 3.32k 6.26s 3.1% 1.89ms 0.00B 0.0% 0.00B
Jacobian 3.32k 5.87s 2.9% 1.77ms 0.00B 0.0% 0.00B
~rhs!~ 3.32k 35.0ms 0.0% 10.5μs 9.33KiB 0.9% 2.88B
prolong2boundaries 3.32k 1.24ms 0.0% 374ns 0.00B 0.0% 0.00B
prolong2mortars 3.32k 1.14ms 0.0% 343ns 0.00B 0.0% 0.00B
mortar flux 3.32k 515μs 0.0% 155ns 0.00B 0.0% 0.00B
boundary flux 3.32k 396μs 0.0% 119ns 0.00B 0.0% 0.00B
source terms 3.32k 85.7μs 0.0% 25.8ns 0.00B 0.0% 0.00B
calculate dt 665 2.43s 1.2% 3.66ms 0.00B 0.0% 0.00B
──────────────────────────────────────────────────────────────────────────────────
4 threads
Old:
──────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 95.7s / 71.8% 68.6MiB / 17.1%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────
rhs! 3.32k 65.9s 95.9% 19.8ms 11.8MiB 100.0% 3.63KiB
volume integral 3.32k 41.4s 60.2% 12.5ms 7.29MiB 62.0% 2.25KiB
pure DG 3.32k 33.4s 48.7% 10.1ms 1.42MiB 12.1% 448B
blending factors 3.32k 5.34s 7.8% 1.61ms 2.57MiB 21.9% 812B
blended DG-FV 3.32k 1.71s 2.5% 514μs 1.42MiB 12.1% 448B
~volume integral~ 3.32k 869ms 1.3% 262μs 1.88MiB 16.0% 594B
interface flux 3.32k 6.61s 9.6% 1.99ms 1.27MiB 10.8% 400B
surface integral 3.32k 6.12s 8.9% 1.84ms 1.06MiB 9.0% 336B
prolong2interfaces 3.32k 5.56s 8.1% 1.67ms 1.17MiB 9.9% 368B
Jacobian 3.32k 3.68s 5.4% 1.11ms 0.96MiB 8.2% 304B
reset ∂u/∂t 3.32k 2.52s 3.7% 759μs 0.00B 0.0% 0.00B
~rhs!~ 3.32k 32.6ms 0.0% 9.82μs 9.33KiB 0.1% 2.88B
prolong2boundaries 3.32k 1.60ms 0.0% 482ns 0.00B 0.0% 0.00B
prolong2mortars 3.32k 1.12ms 0.0% 337ns 0.00B 0.0% 0.00B
mortar flux 3.32k 637μs 0.0% 192ns 0.00B 0.0% 0.00B
boundary flux 3.32k 234μs 0.0% 70.4ns 0.00B 0.0% 0.00B
source terms 3.32k 166μs 0.0% 49.9ns 0.00B 0.0% 0.00B
calculate dt 665 2.79s 4.1% 4.19ms 0.00B 0.0% 0.00B
──────────────────────────────────────────────────────────────────────────────────
New:
──────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 94.9s / 72.1% 65.2MiB / 12.7%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────
rhs! 3.32k 65.6s 96.0% 19.8ms 8.31MiB 100.0% 2.56KiB
volume integral 3.32k 41.0s 60.0% 12.3ms 3.84MiB 46.2% 1.18KiB
~volume integral~ 3.32k 35.8s 52.3% 10.8ms 1.42MiB 17.1% 448B
blending factors 3.32k 5.24s 7.7% 1.58ms 2.42MiB 29.1% 764B
interface flux 3.32k 6.63s 9.7% 2.00ms 1.27MiB 15.3% 400B
surface integral 3.32k 6.15s 9.0% 1.85ms 1.06MiB 12.8% 336B
prolong2interfaces 3.32k 5.57s 8.1% 1.68ms 1.17MiB 14.0% 368B
Jacobian 3.32k 3.70s 5.4% 1.11ms 0.96MiB 11.6% 304B
reset ∂u/∂t 3.32k 2.52s 3.7% 760μs 0.00B 0.0% 0.00B
~rhs!~ 3.32k 38.0ms 0.1% 11.4μs 9.33KiB 0.1% 2.88B
prolong2boundaries 3.32k 1.73ms 0.0% 519ns 0.00B 0.0% 0.00B
prolong2mortars 3.32k 1.51ms 0.0% 455ns 0.00B 0.0% 0.00B
mortar flux 3.32k 502μs 0.0% 151ns 0.00B 0.0% 0.00B
boundary flux 3.32k 259μs 0.0% 78.0ns 0.00B 0.0% 0.00B
source terms 3.32k 170μs 0.0% 51.2ns 0.00B 0.0% 0.00B
calculate dt 665 2.75s 4.0% 4.14ms 0.00B 0.0% 0.00B
──────────────────────────────────────────────────────────────────────────────────
8 threads
Old:
──────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 116s / 72.1% 68.6MiB / 17.1%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────
rhs! 3.32k 77.6s 92.6% 23.4ms 11.8MiB 100.0% 3.63KiB
volume integral 3.32k 54.2s 64.6% 16.3ms 7.29MiB 62.0% 2.25KiB
pure DG 3.32k 45.4s 54.1% 13.7ms 1.42MiB 12.1% 448B
blending factors 3.32k 5.45s 6.5% 1.64ms 2.57MiB 21.9% 812B
blended DG-FV 3.32k 1.98s 2.4% 596μs 1.42MiB 12.1% 448B
~volume integral~ 3.32k 1.37s 1.6% 412μs 1.88MiB 16.0% 594B
interface flux 3.32k 7.75s 9.2% 2.33ms 1.27MiB 10.8% 400B
surface integral 3.32k 5.31s 6.3% 1.60ms 1.06MiB 9.0% 336B
prolong2interfaces 3.32k 5.17s 6.2% 1.56ms 1.17MiB 9.9% 368B
Jacobian 3.32k 2.75s 3.3% 827μs 0.96MiB 8.2% 304B
reset ∂u/∂t 3.32k 2.40s 2.9% 722μs 0.00B 0.0% 0.00B
~rhs!~ 3.32k 60.0ms 0.1% 18.1μs 9.33KiB 0.1% 2.88B
prolong2boundaries 3.32k 1.99ms 0.0% 600ns 0.00B 0.0% 0.00B
prolong2mortars 3.32k 1.88ms 0.0% 567ns 0.00B 0.0% 0.00B
mortar flux 3.32k 1.47ms 0.0% 443ns 0.00B 0.0% 0.00B
boundary flux 3.32k 193μs 0.0% 58.2ns 0.00B 0.0% 0.00B
source terms 3.32k 145μs 0.0% 43.6ns 0.00B 0.0% 0.00B
calculate dt 665 6.20s 7.4% 9.32ms 0.00B 0.0% 0.00B
──────────────────────────────────────────────────────────────────────────────────
New:
──────────────────────────────────────────────────────────────────────────────────
Trixi.jl Time Allocations
─────────────────────── ────────────────────────
Tot / % measured: 109s / 73.4% 65.2MiB / 12.7%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────
rhs! 3.32k 76.0s 94.8% 22.9ms 8.31MiB 100.0% 2.56KiB
volume integral 3.32k 52.7s 65.7% 15.9ms 3.84MiB 46.2% 1.18KiB
~volume integral~ 3.32k 47.4s 59.1% 14.3ms 1.42MiB 17.1% 448B
blending factors 3.32k 5.33s 6.7% 1.60ms 2.42MiB 29.1% 764B
interface flux 3.32k 7.78s 9.7% 2.34ms 1.27MiB 15.3% 400B
surface integral 3.32k 5.32s 6.6% 1.60ms 1.06MiB 12.8% 336B
prolong2interfaces 3.32k 5.19s 6.5% 1.56ms 1.17MiB 14.0% 368B
Jacobian 3.32k 2.69s 3.4% 811μs 0.96MiB 11.6% 304B
reset ∂u/∂t 3.32k 2.27s 2.8% 683μs 0.00B 0.0% 0.00B
~rhs!~ 3.32k 52.2ms 0.1% 15.7μs 9.33KiB 0.1% 2.88B
prolong2mortars 3.32k 1.89ms 0.0% 570ns 0.00B 0.0% 0.00B
prolong2boundaries 3.32k 1.63ms 0.0% 489ns 0.00B 0.0% 0.00B
mortar flux 3.32k 1.04ms 0.0% 312ns 0.00B 0.0% 0.00B
boundary flux 3.32k 331μs 0.0% 100ns 0.00B 0.0% 0.00B
source terms 3.32k 161μs 0.0% 48.6ns 0.00B 0.0% 0.00B
calculate dt 665 4.14s 5.2% 6.23ms 0.00B 0.0% 0.00B
──────────────────────────────────────────────────────────────────────────────────
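One way to read the scaling off these timings is to compare the `rhs!` wall times across thread counts. The snippet below is only a small sketch of that arithmetic: the numbers are copied from the `rhs!` rows of the timer outputs above, and the helper names `speedup` and `efficiency` are made up for illustration.

```julia
# rhs! wall times in seconds, keyed by thread count
# (copied from the `rhs!` rows of the timer outputs in this thread).
rhs_old = Dict(1 => 202.0, 4 => 65.9, 8 => 77.6)
rhs_new = Dict(1 => 201.0, 4 => 65.6, 8 => 76.0)

# Hypothetical helpers: speedup relative to the 1-thread run,
# and parallel efficiency (speedup divided by thread count).
speedup(times, nthreads) = times[1] / times[nthreads]
efficiency(times, nthreads) = speedup(times, nthreads) / nthreads

for nthreads in (4, 8)
    println("threads = ", nthreads,
            ": old speedup = ", round(speedup(rhs_old, nthreads), digits = 2),
            ", new speedup = ", round(speedup(rhs_new, nthreads), digits = 2),
            ", new efficiency = ", round(efficiency(rhs_new, nthreads), digits = 2))
end
```

With these numbers, both versions reach a speedup of roughly 3 at 4 threads and about 2.6 at 8 threads, so the change does not noticeably alter the scaling of `rhs!` in this particular setup.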
Co-authored-by: Hendrik Ranocha <[email protected]>
Thanks!
Code change cc @huiyuxie
The current code that determines whether the DG-FV stabilization should be used for an element is not thread-parallelized. This is a disadvantage when scaling to large machines. The proposed implementation is thread-parallelized and reduces allocations.
Example for https://github.com/trixi-framework/Trixi.jl/blob/main/examples/tree_2d_dgsem/elixir_euler_sedov_blast_wave.jl without AMR on a $2^8 \times 2^8$ mesh with 2 threads:
Old version:
New version:
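The idea described above can be illustrated with a minimal sketch. This is not Trixi.jl's actual implementation: the kernels `pure_dg_kernel!` and `blended_kernel!`, both function names, and the tolerance check are hypothetical stand-ins. The first function assumes a serial decision pass that fills two index lists, and the second moves the decision into a single threaded loop over all elements.

```julia
# Hypothetical per-element kernels standing in for the real volume-integral
# routines; they only record which branch handled each element.
pure_dg_kernel!(du, element) = (du[element] = 1.0; nothing)
blended_kernel!(du, element, alpha_element) = (du[element] = 1.0 - alpha_element; nothing)

# Pattern 1 (assumed for illustration): a serial pass sorts elements into two
# freshly allocated index lists, then each list gets its own threaded loop.
function volume_integral_two_lists!(du, alpha; atol = 1e-12)
    pure_ids = Int[]
    blended_ids = Int[]
    for element in eachindex(alpha)  # serial: does not scale with threads
        if abs(alpha[element]) <= atol
            push!(pure_ids, element)
        else
            push!(blended_ids, element)
        end
    end
    Threads.@threads for element in pure_ids
        pure_dg_kernel!(du, element)
    end
    Threads.@threads for element in blended_ids
        blended_kernel!(du, element, alpha[element])
    end
    return nothing
end

# Pattern 2: one threaded loop; each iteration decides for its own element,
# so no index lists are built and the decision itself runs in parallel.
function volume_integral_single_loop!(du, alpha; atol = 1e-12)
    Threads.@threads for element in eachindex(alpha, du)
        alpha_element = alpha[element]
        if abs(alpha_element) <= atol
            pure_dg_kernel!(du, element)
        else
            blended_kernel!(du, element, alpha_element)
        end
    end
    return nothing
end

# Small usage example with made-up blending factors.
alpha = [0.0, 0.3, 0.0, 0.5, 1.0e-14, 0.1]
du_lists = zeros(length(alpha))
du_single = zeros(length(alpha))
volume_integral_two_lists!(du_lists, alpha)
volume_integral_single_loop!(du_single, alpha)
```

Each element is written by exactly one loop iteration in both variants, so the two functions produce the same `du`. The single-loop form avoids the serial sorting pass and the temporary index lists.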