Skip to content

Commit

Permalink
Add section about false sharing problems to documentation (#1819)
Browse files Browse the repository at this point in the history
* Add section to docs about false sharing

* Fix typos

* Fix typo

* Implement suggestions
  • Loading branch information
bennibolm authored Feb 5, 2024
1 parent fa129aa commit ea61e26
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions docs/src/performance.md
Original file line number Diff line number Diff line change
Expand Up @@ -267,3 +267,14 @@ requires. It can thus be seen as a proxy for "energy used" and, as an extension,
timing result, you need to set the analysis interval such that the
`AnalysisCallback` is invoked at least once during the course of the simulation and
discard the first PID value.

## Performance issues with multi-threaded reductions
[False sharing](https://en.wikipedia.org/wiki/False_sharing) is a known performance issue
for systems with distributed caches. It also occurred for the implementation of a thread
parallel bounds checking routine for the subcell IDP limiting
in [PR #1736](https://github.com/trixi-framework/Trixi.jl/pull/1736).
After some [testing and discussion](https://github.com/trixi-framework/Trixi.jl/pull/1736#discussion_r1423881895),
it turned out that initializing a vector of length `n * Threads.nthreads()` and only using every
n-th entry instead of a vector of length `Threads.nthreads()` fixes the problem.
Since there are no processors with caches over 128B, we use `n = 128B / size(uEltype)`.
Now, the bounds checking routine of the IDP limiting scales as hoped.

0 comments on commit ea61e26

Please sign in to comment.