Add non-default broadcasting of ExtendedJumpArray's. #340

meson800 · 2023-08-30T02:00:27Z

Summary

The previous ExtendedJumpArray code contains code that implements part of the broadcastable interface, but is not actually used due to the entire interface not being included. This commit replaces that broadcast code, reusing much of the arg packing/repacking. Broadcasting operations that only rely on scalars and ExtendedJumpArray's will now use an efficient implementation that can emit SIMD instructions at the native and LLVM code levels. This closes #335. Benchmarking shows between a two to five fold increase in performance with this broadcasting method.

The previous code uses the fallback indexing method when broadcasting, which requires branching inside the broadcast loop. This branch is likely well-branch-predicted, but the presence of the branch prevents kernel functions from being fused into efficient SIMD instructions.

Implementation

This PR defines a ExtendedJumpArrayStyle, a broadcast style that includes two sub-styles that represent the styles for broadcasting the u and jump_u entries. copyto! overloads are specified that do the efficient thing and call copyto!, using the broadcast computation tree, on u and jump_u separately. Binary style rules are defined such that broadcasts fall back to the old method if you do something like adding an ExtendedJumpArray to a normal vector.

The arg tuple unpacking is mostly the same as the old partial implementation that was there; I just renamed the functions and unified them with the new broadcasting style.

A Julia >=1.9 extension is added for FastBroadcast.jl, where the equivalent of the copyto! call, fast_materialize! is similarly overloaded. This only adds a weak dependency, so only users who also use FastBroadcast (such as OrdinaryDiffEq) will have these methods loaded; those that don't already have FastBroadcast in the environment will not pull it in as a dependency.

Tests are added to check that the new broadcasting behavior is behaving as expected. In addition, a benchmark file is included that can be used to confirm that ExtendedJumpArray broadcasting is just as fast as broadcasting of normal Vector{Float64}'s. This PR passes all other tests, including e.g. the ones that use auto differentiation; broadcasting works great even in this case :)

Remaining problems/notes

Inhomogeneous ExtendedJumpArray's still act strangely under broadcasting, but no more strangely than they were already acting. In fact, some of the VariableRateJump tests rely on the behavior that an ExtendedJumpArray plus a Vector decay to become a Vector.

The new broadcasting method means that if you define an output array that is inhomogeneous (e.g. one where the A.u is a Vector{Float64} but A.jump_u is a Vector{Int64}), then broadcasted operations will successfully maintain the inhomogeneity. However, if you allow the broadcast machinery to create an output array for you (allocating), it will just allocate a homogenous ExtendedJumpArray with both u and jump_u being Vector{Float64}'s. This is the same behavior as the fallback, so this should not break any existing downstream code.

Finally, I just used an implementation for selecting an output ExtendedJumpArray while broadcasting based on the Julia interface documentation; this is a recursive function that is used to unpack the broadcasted tuple in a type-stable way, but there can be a few allocations that happen prior to the broadcast if someone broadcasts a very complicated expression that exceeds a recursion depth of 16. I was trying to replace it with a concise generated function that would always be type stable, but I think it is far less likely to hit this limit than in my other recent PRs. If there is an instance where this happens, I can fix this.

The previous ExtendedJumpArray code contains code that implements part of the broadcastable interface, but is not actually used due to the entire interface not being included. This commit replaces that broadcast code, reusing much of the arg packing/repacking. Because of this, the current code uses the fallback indexing method when broadcasting, which requires branching inside the broadcast loop. This branch is likely well-branch-predicted, but the presence of the branch prevents kernel functions from being fused into efficent SIMD instructions. This defines a ExtendedJumpArrayStyle, a broadcast style that includes two sub-styles that represent the styles for broadcasting the `u` and `jump_u` entries. `copyto!` overloads are specified that do the efficent thing and call `copyto!`, using the broadcast computation tree, on `u` and `jump_u` separately. Binary style rules are defined such that broadcasts fall back to the old method if you do something like adding an ExtendedJumpArray to a normal vector. A Julia >=1.9 extension is added for FastBroadcast.jl, where the equivilant of the `copyto!` call, `fast_materialize!` is similarlly overloaded.

coveralls · 2023-08-30T02:32:29Z

Pull Request Test Coverage Report for Build 6026823188

37 of 42 (88.1%) changed or added relevant lines in 2 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+1.4%) to 89.861%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/extended_jump_array.jl	33	34	97.06%
ext/JumpProcessFastBroadcastExt.jl	4	8	50.0%

Totals
Change from base Build 5658974660:	1.4%
Covered Lines:	2269
Relevant Lines:	2525

💛 - Coveralls

Project.toml

test/benchmark_extended_jump_array.jl

ChrisRackauckas · 2023-08-30T10:43:15Z

This looks great! Just a few minor pieces mentioned above and I think this is good to go. Comments:

The new broadcasting method means that if you define an output array that is inhomogeneous (e.g. one where the A.u is a Vector{Float64} but A.jump_u is a Vector{Int64}), then broadcasted operations will successfully maintain the inhomogeneity. However, if you allow the broadcast machinery to create an output array for you (allocating), it will just allocate a homogenous ExtendedJumpArray with both u and jump_u being Vector{Float64}'s. This is the same behavior as the fallback, so this should not break any existing downstream code.

👍 that should be fine.

Finally, I just used an implementation for selecting an output ExtendedJumpArray while broadcasting based on the Julia interface documentation; this is a recursive function that is used to unpack the broadcasted tuple in a type-stable way, but there can be a few allocations that happen prior to the broadcast if someone broadcasts a very complicated expression that exceeds a recursion depth of 16. I was trying to replace it with a concise generated function that would always be type stable, but I think it is far less likely to hit this limit than in my other recent PRs. If there is an instance where this happens, I can fix this.

This is something that FastBroadcast.jl handles, so in practice we're fine.

isaacsas · 2023-08-30T11:42:10Z

Would be interesting to redo the VariableRate benchmarks once this is merged and see the performance vs. Coevolve now.

meson800 · 2023-08-30T15:26:42Z

Just a few minor pieces mentioned above and I think this is good to go

I added a commit to address those comments, thanks!

This is something that FastBroadcast.jl handles, so in practice we're fine.

I think the problem could still exist, as FastBroadcast also calls out to similar. In the ExtendedJumpArrayStyle method of similar is where find_eja is used (in order to locate the ExtendedJumpArray in the arg list to determine output axis sizes); if the argument list is too deep then the compiler might give up type inference for the recursive call, leading to some inefficient boxing/unboxing.

Would be interesting to redo the VariableRate benchmarks once this is merged and see the performance vs. Coevolve now.

I expect that the broadcasting speedup is potentially less for short ExtendedJumpArray's because they benefit less from SIMD.

isaacsas · 2023-08-30T16:51:35Z

@meson800 could you see if this issue is also closed now:

#321

If so can you add a related test using the simple example in that issue in your extended jump array tests file? (I only ask as it should be super easy to check.)

Thanks!

…loses SciML#321

meson800 · 2023-08-30T17:27:34Z

Yes, #321 looks like it is fixed now by the broadcasting changes. Tests added and pushed. It possibly won't be fixed pre Julia 1.9 though, because the FastBroadcast extension won't be loaded.

meson800 · 2023-08-30T17:38:34Z

We'll see when the 1.6 CI finishes, but I think that this is still fine on < Julia 1.9. It won't load the extension so the FastBroadcasting will be slower, but the broadcast rules should still load and correctly select the output type, fixing the problem in #321.

ChrisRackauckas · 2023-08-30T17:56:07Z

We'll see when the 1.6 CI finishes, but I think that this is still fine on < Julia 1.9. It won't load the extension so the FastBroadcasting will be slower, but the broadcast rules should still load and correctly select the output type, fixing the problem in #321.

Yes, it should just fallback to being slower, which is fine for earlier versions. We need to just move forward and get the new LTS 😅

isaacsas · 2023-08-30T18:14:53Z

@meson800 thanks for this! I suspect you've actually closed a bunch of other open ExtendedJumpArray-related issues with this.

ChrisRackauckas reviewed Aug 30, 2023

View reviewed changes

Project.toml Outdated Show resolved Hide resolved

ChrisRackauckas reviewed Aug 30, 2023

View reviewed changes

test/benchmark_extended_jump_array.jl Outdated Show resolved Hide resolved

Renamed extension and moved benchmarks

a6069d0

Added tests for OOP-ode functions and ExtendedJumpArray broadcasting, c…

e371fd4

…loses SciML#321

ChrisRackauckas approved these changes Aug 30, 2023

View reviewed changes

ChrisRackauckas merged commit 9f89895 into SciML:master Aug 30, 2023

meson800 deleted the eja_broadcasting branch August 30, 2023 18:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add non-default broadcasting of ExtendedJumpArray's. #340

Add non-default broadcasting of ExtendedJumpArray's. #340

meson800 commented Aug 30, 2023 •

edited

Loading

coveralls commented Aug 30, 2023 •

edited

Loading

ChrisRackauckas commented Aug 30, 2023

isaacsas commented Aug 30, 2023

meson800 commented Aug 30, 2023

isaacsas commented Aug 30, 2023

meson800 commented Aug 30, 2023 •

edited

Loading

meson800 commented Aug 30, 2023 •

edited

Loading

ChrisRackauckas commented Aug 30, 2023

isaacsas commented Aug 30, 2023

Add non-default broadcasting of ExtendedJumpArray's. #340

Add non-default broadcasting of ExtendedJumpArray's. #340

Conversation

meson800 commented Aug 30, 2023 • edited Loading

Summary

Implementation

Remaining problems/notes

coveralls commented Aug 30, 2023 • edited Loading

Pull Request Test Coverage Report for Build 6026823188

💛 - Coveralls

ChrisRackauckas commented Aug 30, 2023

isaacsas commented Aug 30, 2023

meson800 commented Aug 30, 2023

isaacsas commented Aug 30, 2023

meson800 commented Aug 30, 2023 • edited Loading

meson800 commented Aug 30, 2023 • edited Loading

ChrisRackauckas commented Aug 30, 2023

isaacsas commented Aug 30, 2023

meson800 commented Aug 30, 2023 •

edited

Loading

coveralls commented Aug 30, 2023 •

edited

Loading

meson800 commented Aug 30, 2023 •

edited

Loading

meson800 commented Aug 30, 2023 •

edited

Loading