You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The block form of sweep is blessed by level-3 BLAS. Coupled with recursive tiling, it may improve the efficiency of matrix inversion. Below is some prototype code.
versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i9-9920X CPU @ 3.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512)
Environment:
JULIA_NUM_THREADS = 8
using BenchmarkTools, LinearAlgebra, Random, SweepOperator
# create an nxn pd test matrix
Random.seed!(123)
n =1024
A =randn(n, n)
A = A'A + I;
Method 1: invert by Cholesky
""" inv_by_chol!(A)Invert a pd matrix `A` in-place by Cholesky decomposition."""functioninv_by_chol!(A::Matrix{T}) where T <:LinearAlgebra.BlasReal
LAPACK.potrf!('U', A)
LAPACK.potri!('U', A)
A
end
const TILE =Ref(512) # this is now a length, in bytes!functiontile_maxiter(::Type{<:AbstractArray{T}}) where {T}
isbitstype(T) ||return TILE[] ÷8max(TILE[] ÷sizeof(T), 4)
endtile_maxiter(::Type{AT}) where {AT} = TILE[] ÷8# treat anything unkown like Float64@inlinefunctionfindcleft(r::UnitRange, step::Int)
iflength(r) >=2*step
minimum(r) -1+ step *div(length(r), step *2)
else# minimum(r) - 1 + div(length(r), 2, RoundNearest) # not in Julia 1.3minimum(r) -1+round(Int, length(r)/2)
endend@inlinefunctioncleave(range::UnitRange, step::Int=4)
cleft =findcleft(range, step)
first(range):cleft, cleft+1:last(range)
end# recursive tilingfunctionsweep_block!(
A ::AbstractMatrix{T},
krange ::AbstractUnitRange{<:Integer},
invsw ::Bool=false
) where {T <:LinearAlgebra.BlasReal}
klen =length(krange)
maxL =tile_maxiter(T)
if klen > maxL
k1s, k2s =cleave(krange)
sweep_block!(A, k1s, invsw)
sweep_block!(A, k2s, invsw)
elsesweep_block_kernel!(A, krange, invsw)
endreturn A
end
sweep_block! (generic function with 2 methods)
# upper triangular part should be same as sweep!UpperTriangular(sweep_block!(copy(A), 1:n)) ≈UpperTriangular(-inv(A))
true
Benchmark
At this particular size and machine, we see 3x higher performance of the tiled sweep than Cholesky in LAPACK.
@btimesweep_block!($(copy(A)), $(1:n)) evals=1;
7.184 ms (16 allocations: 512 bytes)
@btimesweep!($(copy(A)), $(1:n)) evals=1;
238.108 ms (1 allocation: 8.12 KiB)
@btimeinv_by_chol!($(copy(A))) evals=1;
25.904 ms (0 allocations: 0 bytes)
TODO
More extensive benchmarking at various matrix sizes.
Implement sweep_block_kernel! using Tullio.jl, so it accommodates more data types (half precision, BigFloat, ...), can be audo-diffed, and can be compiled by CUDA.jl.
sweep function can output determinant instead of the matrix.
The text was updated successfully, but these errors were encountered:
The block form of sweep is blessed by level-3 BLAS. Coupled with recursive tiling, it may improve the efficiency of matrix inversion. Below is some prototype code.
versioninfo()
using BenchmarkTools, LinearAlgebra, Random, SweepOperator
Method 1: invert by Cholesky
Method 2: invert by sweep
Method 3: invert by tiled sweep
Benchmark
At this particular size and machine, we see 3x higher performance of the tiled sweep than Cholesky in LAPACK.
TODO
More extensive benchmarking at various matrix sizes.
Implement
sweep_block_kernel!
using Tullio.jl, so it accommodates more data types (half precision, BigFloat, ...), can be audo-diffed, and can be compiled by CUDA.jl.sweep
function can output determinant instead of the matrix.The text was updated successfully, but these errors were encountered: