Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid allocating value vectors in get_level #582

Merged
merged 5 commits into from
Sep 8, 2023
Merged

avoid allocating value vectors in get_level #582

merged 5 commits into from
Sep 8, 2023

Conversation

visr
Copy link
Member

@visr visr commented Sep 8, 2023

Dictionary uses Indices{I} as keys, and Vector{T} as values. The Parameters contain both, and therefore it was free to construct a Dictionary in a frequently called function like get_level. However with autodiff, the values could be a ReinterpretArray with Duals instead of just a Vector. This meant that on Dictionary creation it would convert the ReinterpretArray to a Vector, leading to many allocations.

This is on top of #581. After that, this was responsible for 94% of the time spent. With this PR that goes down to about 2%, leading to a nice little speedup.

visr added 3 commits September 8, 2023 15:58
When profiling runs with the default `autodiff=true`, this line was responsible for 35% of the time and almost all allocations:

https://github.com/Deltares/Ribasim/blob/b3eb044a722d1655c5465bafe50951b75fe960d6/core/src/solve.jl#L1002

`connectivity.flow` is a sparse matrix, but the DiffCache does not seem to like sparse matrixes. The `dual_du` field was a dense vector of length n x n x cache_size, and the `get_tmp` call led to further allocations trying to restructure the sparse matrix from the vector. Luckily there is the FixedSizeDiffCache that helps here: https://docs.sciml.ai/PreallocationTools/stable/#FixedSizeDiffCache

This retains the sparsity in the dual, and returns a `ReinterpretArray` from `get_tmp` during autodiff. To avoid materializing this reinterpretarray I needed to additionally fill the parent array with zeros rather than the array itself.

There is another unrelated performance fix here, and that is to concretely type the Parameter struct, by adding type parameters from its fields. Otherwise you have situations like

```
struct A
    a::Vector
end
```

where the compiler doesn't know the element type of the Vector, so it can perform less optimizations. The solution:

```
struct A{T}
    a::Vector{T}
end
```

Finally I consistently added AbstractVector/Matrix argument type annotations to ensure the ReinterpretArray could enter everywhere. And I renamed the functions to formulate flows to `formulate_flow`, to make it easier to separate them from the other `formulate!` methods.
`Dictionary` uses `Indices{I}` as keys, and `Vector{T}` as values. The Parameters contain both, and therefore it was free to construct a `Dictionary` in a frequently called function like `get_level`. However with autodiff, the values could be a ReinterpretArray with Duals instead of just a Vector. This meant that on Dictionary creation it would convert the ReinterpretArray to a Vector, leading to many allocations.

This is on top of #581. After that, this was responsible for 94% of the time spent. With this PR that goes down to about 2%, leading to a nice little speedup.
@visr visr requested a review from Hofer-Julian September 8, 2023 15:51
@visr visr added the performance Relates to runtime performance or convergence label Sep 8, 2023
visr and others added 2 commits September 8, 2023 18:19
Perhaps we should make them Indices at some point, if the O(log n) time spent doing searchsorted is much more than the hash table lookup.
@visr visr merged commit 574467f into main Sep 8, 2023
18 checks passed
@visr visr deleted the dict branch September 8, 2023 18:54
visr added a commit that referenced this pull request Sep 11, 2023
`Dictionary` uses `Indices{I}` as keys, and `Vector{T}` as values. The
Parameters contain both, and therefore it was free to construct a
`Dictionary` in a frequently called function like `get_level`. However
with autodiff, the values could be a ReinterpretArray with Duals instead
of just a Vector. This meant that on Dictionary creation it would
convert the ReinterpretArray to a Vector, leading to many allocations.

This is on top of #581. After
that, this was responsible for 94% of the time spent. With this PR that
goes down to about 2%, leading to a nice little speedup.

---------

Co-authored-by: Hofer-Julian <[email protected]>
visr added a commit that referenced this pull request Sep 14, 2023
`Dictionary` uses `Indices{I}` as keys, and `Vector{T}` as values. The
Parameters contain both, and therefore it was free to construct a
`Dictionary` in a frequently called function like `get_level`. However
with autodiff, the values could be a ReinterpretArray with Duals instead
of just a Vector. This meant that on Dictionary creation it would
convert the ReinterpretArray to a Vector, leading to many allocations.

This is on top of #581. After
that, this was responsible for 94% of the time spent. With this PR that
goes down to about 2%, leading to a nice little speedup.

---------

Co-authored-by: Hofer-Julian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Relates to runtime performance or convergence
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants