Large regression with StaticArrays in v1.11 #1102

kaipartmann · 2024-10-22T18:40:59Z

As also described in JuliaArrays/StaticArrays.jl#1282, there is a large performance regression with v1.11 when using StaticArrays:

using LinearAlgebra, StaticArrays, BenchmarkTools

function mwe1(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K += k * X * X'
    end
    return K
end

@btime mwe1(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000);

❯ julia +1.10.5 --project -t 6 mwe.jl 
  1.070 ms (0 allocations: 0 bytes)

❯ julia +1.11.1 --project -t 6 mwe.jl 
  129.890 ms (7000000 allocations: 289.92 MiB)

Interestingly, the problem can be solved by changing k * X * X' to k * (X * X'):

function mwe3(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K += k * (X * X')
    end
    return K
end
@btime mwe3(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000);

❯ julia +1.10.5 --project -t 6 mwe.jl 
  805.458 μs (0 allocations: 0 bytes)

❯ julia +1.11.1 --project -t 6 mwe.jl
  708.208 μs (0 allocations: 0 bytes)


ronisbr · 2024-10-22T18:54:35Z

I can reproduce those results on macOS. I found some interesting scenarios based on the proposed MWE:

If I use this version, I see all the allocations:

function mwe2(a, X, n)
    local K
    for i in 1:n
        k = a * i
        K = k * X * X'
    end
    return K
end

julia> @btime mwe2(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000)
  163.540 ms (7000000 allocations: 289.92 MiB)
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
 10.0  10.0  10.0
 10.0  10.0  10.0
 10.0  10.0  10.0

However, if I suppress the local variable k, everything works:

function mwe3(a, X, n)
    local K
    for i in 1:n
        K = a * i * X * X'
    end
    return K
end

julia> @btime mwe3(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000)
  1.916 ns (0 allocations: 0 bytes)
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
 10.0  10.0  10.0
 10.0  10.0  10.0
 10.0  10.0  10.0

gbaraldi · 2024-10-22T19:06:40Z

This is an inlining change
│ %31 = invoke LinearAlgebra.broadcast(LinearAlgebra.:*::typeof(*), %29::Float64, X::SVector{3, Float64}, %30::Vararg{Any})::SMatrix{3, 3, Float64, 9}
no longer gets inlined and we allocate because of it.
Changing the code to this

function mwe1(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K1 = k * X
        K += @inline K1 * X'
    end
    return K
end

fixes it, and it's actually faster.

ronisbr · 2024-10-22T21:57:34Z

Hi @gbaraldi!

> no longer gets inlined and we allocate because of it.

Will those allocations happen on every call to that function (scalar × vector product)? Or does it depend on the scenario? If so, I think we might experience a significant performance degradation in our simulations. Is this something that might be reverted or fixed in 1.11?

nsajko · 2024-10-22T22:27:56Z

@gbaraldi what's the point of your example? As far as I can see, it's using different code paths, so it seems irrelevant here?

To be specific, the MWE is using a three-argument * method, while your example only calls two-argument * methods.

dkarrasch · 2024-10-23T17:52:11Z

I don't remember in which release cycle (could well be in v1.11), but we introduced three-arg * specializations, and that "generic" choice seems to be a bad one for static arrays, as changing the order of multiplication makes a gigantic difference (mwe1 vs mwe3 in the OP). I'm not sure if we want to change the order of multiplication generically, or whether this should be specified by the user.

mcabbott · 2024-10-25T18:20:18Z

3-arg * was new in Julia 1.7. That did have a small negative effect on square SMatrices, a few ns, and at the time I made a PR to address this by omitting the size checks, JuliaArrays/StaticArrays.jl#919 (not merged).

Edit: sorry this is scalar-vector-adjointvec, which now calls broadcast(*, k, X, X'). Not sure whether there was any regression on 1.7 for SVectors. The earlier behaviour would have been K += (k * X) * X', which seems as fast as mwe3's K += k * (X * X') now.

I am a bit surprised that the version with a * i * X * X' is fast, as this does not call a 4-arg * method, but instead reduces to (a * i) * X * X' and then hits the same 3-arg * via broadcast. But perhaps the inlining is different.
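To make the grouping discussion concrete, here is a minimal sketch (not taken from the thread; the values of k and X are arbitrary) that checks the three ways of writing the product agree numerically, and prints which method the unparenthesized three-argument call dispatches to on whatever Julia version runs it. The method printed will differ between versions and is not asserted here.

```julia
using LinearAlgebra, StaticArrays, InteractiveUtils

k = 2.0
X = SVector{3}(1.0, 2.0, 3.0)

three_arg = k * X * X'      # parsed as one call: *(k, X, X')
left      = (k * X) * X'    # two two-argument calls, scalar-vector product first
right     = k * (X * X')    # two two-argument calls, outer product first

# All three groupings describe the same scaled outer product.
println(three_arg ≈ left ≈ right)

# Report the method selected for the three-argument call on this Julia version.
println(@which k * X * X')
```

Only the first form goes through a three-argument * method; the two parenthesized forms only ever call two-argument methods, which is why they can behave differently with respect to inlining.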

dkarrasch · 2024-10-25T18:43:11Z

Time flies. 🤦

@nsajko: @gbaraldi's quote stems from an inspection of @code_typed mwe1(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000), and the corresponding broadcast call is absent on v1.10. Instead, one finds the inlined code for all the products and additions there. So, indeed, it seems that inlining thresholds have changed, and splitting the multiplications (as he suggested) improves performance beyond what one can achieve with setting brackets explicitly (EDIT: the @inline annotation doesn't even seem to make a difference for me).
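A rough way to check this locally without reading the typed IR is to compare allocation counts for the original and the split form. The sketch below is only an illustration: mwe_split and allocated_bytes are hypothetical names introduced here, and the numbers printed depend on the Julia and StaticArrays versions in use, so no output is shown.

```julia
using LinearAlgebra, StaticArrays

# Original form from the issue: a single three-argument `*` call per iteration.
function mwe1(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K += k * X * X'
    end
    return K
end

# Split form (hypothetical name): two two-argument `*` calls per iteration.
function mwe_split(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K1 = k * X
        K += K1 * X'
    end
    return K
end

# Hypothetical helper: measure inside a function so that
# global-scope overhead is not attributed to the call under test.
allocated_bytes(f, a, X, n) = @allocated f(a, X, n)

X = SVector{3}(1.0, 1.0, 1.0)

# Warm up both paths so compilation is not included in the measurement.
allocated_bytes(mwe1, 1e-5, X, 10)
allocated_bytes(mwe_split, 1e-5, X, 10)

println("three-argument *: ", allocated_bytes(mwe1, 1e-5, X, 1_000), " bytes")
println("split products:   ", allocated_bytes(mwe_split, 1e-5, X, 1_000), " bytes")
```

If the first number is nonzero and grows with n while the second stays at zero, that is consistent with the non-inlined broadcast call described above; @code_typed on the same arguments shows whether the invoke is present.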

dkarrasch · 2024-10-26T20:52:22Z

Should we close this, or is there anything actionable here?

ViralBShah · 2024-11-26T19:25:24Z

Closing - but please reopen if necessary.

nsajko added the arrays label Oct 22, 2024

KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024

ViralBShah closed this as completed Nov 26, 2024
