Proof of Concept: benchmark neighborhood search overhead #284
For quite a while, I have been wondering how large the actual overhead of querying my grid neighborhood search is, that is, the cost of looping over a generator that collects the particle lists of neighboring cells.
I profiled a simulation on one thread without bounds checking, and I found that noticeable time is spent in the following calls:
- `get(hashtable, ...)` (30% of the total runtime), which is the `get(::Dict, key, default)` function in Julia Base. About half of this is in `hash`.
- `sqrt(distance2)` in the particle-neighbor loop.
- `dot(pos_diff, pos_diff)` above to compute the squared distance.
Finding that 30% of the total runtime is spent in `get(hashtable, ...)`, I got really curious. So, I implemented a neighborhood search that wraps the existing grid neighborhood search, lets it do the update part, and then computes explicit lists of neighbors for each particle as a vector of vectors. Of course, allocating the full vector of vectors is terribly slow, but then for the interaction, the `eachneighbor` loop will just be a loop over a simple vector.
Assuming that looping over an explicit vector of neighbors is the absolute best that we can do, we should get an upper limit of how much we can optimize the neighborhood search. Here we go:
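For context, the wrapper idea can be sketched roughly as follows. This is only a minimal illustration, assuming a 2D grid stored in a `Dict` keyed by integer cell coordinates with a cell size equal to the search radius. The names `SimpleGridSearch`, `cell_coords`, `build_grid`, `eachneighbor_grid` and `build_neighbor_lists` are hypothetical and are not the code in this PR.

```julia
# Hypothetical minimal 2D grid search (illustrative names, not the package API).
struct SimpleGridSearch
    hashtable::Dict{NTuple{2, Int}, Vector{Int}}  # cell coordinates => particle indices
    search_radius::Float64
end

# Integer cell coordinates of a 2D position; cell size equals the search radius.
cell_coords(pos, radius) = (floor(Int, pos[1] / radius), floor(Int, pos[2] / radius))

# Sort all particles into their cells.
function build_grid(positions, search_radius)
    hashtable = Dict{NTuple{2, Int}, Vector{Int}}()
    for (i, pos) in enumerate(positions)
        cell = cell_coords(pos, search_radius)
        push!(get!(() -> Int[], hashtable, cell), i)
    end
    return SimpleGridSearch(hashtable, search_radius)
end

# Lazy iterator over all particles in the 3x3 block of cells around `pos`.
# Every cell lookup is one `get(hashtable, key, default)` call.
function eachneighbor_grid(pos, nhs::SimpleGridSearch)
    cx, cy = cell_coords(pos, nhs.search_radius)
    empty_cell = Int[]
    cells = ((cx + dx, cy + dy) for dx in -1:1, dy in -1:1)
    return Iterators.flatten(get(nhs.hashtable, cell, empty_cell) for cell in cells)
end

# Explicit neighbor lists: query the grid once per particle and store the
# result as a vector of vectors (possible neighbors, no distance check).
function build_neighbor_lists(positions, nhs::SimpleGridSearch)
    return [Int[j for j in eachneighbor_grid(pos, nhs)] for pos in positions]
end

positions = [(0.1, 0.1), (0.2, 0.1), (5.0, 5.0)]
nhs = build_grid(positions, 1.0)
neighbor_lists = build_neighbor_lists(positions, nhs)
```

With the lists built, the interaction loop for particle `i` only iterates over `neighbor_lists[i]`, so no hashing happens during the interaction itself.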
Benchmarks
The following is the basic `rectangular_tank_2d.jl` example (`tspan = (0.0, 20.0)`) on main with the `GridNeighborhoodSearch` on 24 threads. So it's where we currently stand. Considering that the particles basically don't move in this example, I disabled the NHS update.
This is the new explicit neighbor list NHS:
To be more accurate, here is a benchmark of just the fluid-fluid interaction:
So the NHS query overhead is about 33% of our current runtime.
Here is the grid NHS on a single thread:
And the neighbor list NHS on a single thread:
And the precise benchmark:
This is a 24% overhead.
I played around a little and only stored actual neighbors in the lists instead of "possible neighbors" (all particles in neighboring cells). This gets rid of the distance checking for particles that are outside the search radius and should probably also be considered for a "perfect" neighborhood search.
This would be a 40% overhead.
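Restricting the lists to actual neighbors can be pictured as a filter over the candidate lists. The following is a minimal sketch under assumptions: positions are tuples or static vectors, the candidate lists already exist, and `prune_to_actual_neighbors` is a hypothetical helper name, not a function from this PR.

```julia
# Hypothetical helper: keep only candidates that lie within the search radius.
# The distance check is done once here instead of in every interaction loop.
function prune_to_actual_neighbors(candidate_lists, positions, search_radius)
    radius2 = search_radius^2
    return map(eachindex(candidate_lists)) do i
        filter(candidate_lists[i]) do j
            pos_diff = positions[i] .- positions[j]
            # Compare squared distances to avoid a square root.
            sum(abs2, pos_diff) <= radius2
        end
    end
end

example_positions = [(0.0, 0.0), (0.5, 0.0), (1.6, 0.0)]
example_candidates = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
actual_neighbors = prune_to_actual_neighbors(example_candidates, example_positions, 1.0)
```

Note that in this sketch each particle remains in its own list, since its distance to itself is zero.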
Conclusion
The NHS query could be more efficient. If we stick with grid-based NHS, the upper limit seems to be a 33% multithreaded or 24% single-threaded improvement of the fluid-fluid interaction.
It is, however, very difficult to improve the query performance without hurting the update performance, since faster querying will probably require a different data structure for the cell lists. Great care has to be taken regarding the parallelizability of the update step, or otherwise we will lose a lot of performance in the multithreaded update.
We might also be able to improve the existing NHS. About 15% of the total runtime is in `hash`, so maybe an optimized hash function for `NTuple{2, Int}` could help the performance. Also, the remaining 15% in the hashtable querying might come from the fact that the cell lists are spread across memory. A compact hashing approach where the cell lists are columns of a large matrix might improve cache hits.
Edit
Now I'm thinking, would it make sense to do another test where I use a contiguous vector for the neighbor lists instead of a vector of vectors?
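One possible shape for such a test, as a sketch only: all neighbor indices are stored in a single flat vector, plus an offsets vector marking where each particle's list starts (similar to a compressed sparse row layout). `FlatNeighborLists`, `flatten_neighbor_lists` and `neighbors_of` are hypothetical names introduced here for illustration.

```julia
# Hypothetical flat layout: the neighbors of particle i are stored in
# neighbors[offsets[i]:(offsets[i + 1] - 1)].
struct FlatNeighborLists
    neighbors::Vector{Int}
    offsets::Vector{Int}
end

# Convert a vector of vectors into one contiguous vector plus offsets.
function flatten_neighbor_lists(lists)
    offsets = Vector{Int}(undef, length(lists) + 1)
    offsets[1] = 1
    for (i, list) in enumerate(lists)
        offsets[i + 1] = offsets[i] + length(list)
    end

    neighbors = Vector{Int}(undef, offsets[end] - 1)
    for (i, list) in enumerate(lists)
        neighbors[offsets[i]:(offsets[i + 1] - 1)] .= list
    end

    return FlatNeighborLists(neighbors, offsets)
end

# Non-allocating view of the neighbors of particle i.
function neighbors_of(nl::FlatNeighborLists, i)
    return view(nl.neighbors, nl.offsets[i]:(nl.offsets[i + 1] - 1))
end

flat = flatten_neighbor_lists([[1, 2], Int[], [3]])
```

Whether this layout is actually faster than a vector of vectors would have to be measured; the sketch only shows the data structure.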