Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

qr! is a performance bottleneck. #25

Open
filtron opened this issue Dec 4, 2023 · 0 comments
Open

qr! is a performance bottleneck. #25

filtron opened this issue Dec 4, 2023 · 0 comments
Labels
help wanted Extra attention is needed

Comments

@filtron
Copy link
Owner

filtron commented Dec 4, 2023

Results

Replace qr! with qrwoq! (a non-allocating unblocked implementation) defined below reveals this:

Benchmark results for qr!:

 Info: Benchmark: cache creation
[ Info: low order
  681.937 ns (10 allocations: 20.73 KiB)
[ Info: high order (q = 13)
  2.557 μs (16 allocations: 32.64 KiB)


[ Info: Benchnmark: function exection no cache / cache
[ Info: order = 3
  293.142 μs (65 allocations: 187.47 KiB)
  290.885 μs (54 allocations: 157.19 KiB)
[ Info: order = 5
  208.495 μs (53 allocations: 139.58 KiB)
  205.588 μs (42 allocations: 108.95 KiB)
[ Info: order = 7
  176.071 μs (47 allocations: 120.62 KiB)
  172.630 μs (36 allocations: 89.73 KiB)
[ Info: order = 9
  144.191 μs (41 allocations: 102.02 KiB)
  140.820 μs (30 allocations: 70.69 KiB)
[ Info: order = 13
  109.761 μs (35 allocations: 90.70 KiB)
  105.870 μs (18 allocations: 51.45 KiB)

Benchmark results for qrwoq!

[ Info: Benchmark: cache creation
[ Info: low order
  709.050 ns (10 allocations: 20.73 KiB)
[ Info: high order (q = 13)
  2.555 μs (16 allocations: 32.64 KiB)


[ Info: Benchnmark: function exection no cache / cache
[ Info: order = 3
  136.396 μs (15 allocations: 30.66 KiB)
  133.765 μs (4 allocations: 384 bytes)
[ Info: order = 5
  100.062 μs (18 allocations: 31.16 KiB)
  97.347 μs (7 allocations: 544 bytes)
[ Info: order = 7
  87.107 μs (18 allocations: 31.42 KiB)
  84.334 μs (7 allocations: 544 bytes)
[ Info: order = 9
  74.645 μs (18 allocations: 31.86 KiB)
  72.474 μs (7 allocations: 544 bytes)
[ Info: order = 13
  60.555 μs (20 allocations: 41.95 KiB)
  56.702 μs (3 allocations: 2.70 KiB)

Benchmark script:

using FiniteHorizonGramians, BenchmarkTools

λ = 1.0
n = 20
t = 1.0
A = λ * (I - 2 * tril(ones(n, n)))
B = sqrt(2λ) * ones(n, 1)
eA = similar(A)
U = similar(A)

qs = [3, 5, 7, 9, 13]

methods = [ExpAndGram{Float64,q}() for q in qs]
caches = [FiniteHorizonGramians.alloc_mem(A, B, method) for method in methods]

order(::ExpAndGram{T,Q}) where {T,Q} = Q

@info "Benchmark: cache creation"
@info "low order"
@btime FiniteHorizonGramians.alloc_mem($A, $B, ExpAndGram{Float64,3}())
@info "high order (q = 13)"
@btime FiniteHorizonGramians.alloc_mem($A, $B, ExpAndGram{Float64,13}())

println("\n")

@info "Benchnmark: function exection no cache / cache"
for (method, cache) in zip(methods, caches)
    @info "order = $(order(method))"
    @btime exp_and_gram_chol($A, $B, $t, $method)
    @btime exp_and_gram_chol!($eA, $U, $A, $B, $t, $method, $cache)
end

Conclusion

Around 99% of the memory consumption for methods 3, 5, 7, 9 is due to qr!. Around 94% for method 13.
There is also a significant speed-up but that is perhaps because the default block size of 36 in qr! is not an intelligent choice for the size of this particular test problem.

Definition of qrwoq!:

function qrwoq!(A::AbstractMatrix{T}) where {T}
    LinearAlgebra.require_one_based_indexing(A)
    m, n = size(A)
    for k = 1:min(m - 1 + !(T<:Real), n)
        x = view(A, k:m, k)
        τk = LinearAlgebra.reflector!(x)
        LinearAlgebra.reflectorApply!(x, τk, view(A, k:m, k + 1:n))
    end
    Av = view(A, 1:min(m, n), 1:n)
    triu!(Av)
    triu2cholesky_factor!(Av)
end
@filtron filtron added the help wanted Extra attention is needed label Dec 6, 2023
@filtron filtron changed the title Most allocations seem to be coming from qr! qr! is a performance bottleneck. Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

1 participant