LKJ bijector is having problems #144

Open · cpfiffer opened this issue Oct 30, 2020 · 8 comments

@cpfiffer (Member) commented Oct 30, 2020

I wish I could be more specific about the problems in the issue description, but I'm having a hard time pinning them down. I've gotten singular exceptions, non-PD exceptions, and non-Hermitian matrix exceptions, none of which seem to have an obvious cause to me.

MWE:

using Turing, LinearAlgebra

@model function ex(data)
    # Set up covariance matrix
    sigma ~ filldist(InverseGamma(2, 3), size(data, 2))
    omega ~ LKJ(size(data, 2), 1.0)
    Σ = Symmetric(diagm(sigma) * omega * diagm(sigma))

    # Observations
    for t in 1:size(data, 1)
        data[t,:] ~ MvNormal(zeros(size(data, 2)), Σ)
    end
end

# Parameters
dim = 10
n_obs = 1000

# Variances
sigma = rand(InverseGamma(2, 3), dim)

# Correlation matrix
omega = rand(LKJ(dim, 2.0))
Sigma = Symmetric(omega)

# Data
data = rand(MvNormal(zeros(dim), Sigma), n_obs)'

# Construct model
model = ex(data)
chain = sample(model, NUTS(), 100)

Running the example above gets me

ERROR: LoadError: PosDefException: matrix is not positive definite; Cholesky factorization failed.
Stacktrace:
 [1] checkpositivedefinite at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/factorization.jl:18 [inlined]
 [2] cholesky!(::Symmetric{ForwardDiff.Dual{…},Array{ForwardDiff.Dual{…},2}}, ::Val{false}; check::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:219
 [3] #cholesky#130 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:344 [inlined]
 [4] cholesky at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:344 [inlined] (repeats 2 times)
 [5] (::Bijectors.CorrBijector)(::Symmetric{ForwardDiff.Dual{…},Array{ForwardDiff.Dual{…},2}}) at /home/cameron/.julia/dev/Bijectors/src/bijectors/corr.jl:68
 [6] logabsdetjac at /home/cameron/.julia/packages/Bijectors/Fye2h/src/bijectors/corr.jl:101 [inlined]
 [7] logpdf_with_trans(::LKJ{Float64,Int64}, ::Symmetric{ForwardDiff.Dual{…},Array{ForwardDiff.Dual{…},2}}, ::Bool) at /home/cameron/.julia/dev/Bijectors/src/Bijectors.jl:130

If I reduce dim to 3, sampling basically just hangs forever. Occasionally it'll throw a singular exception, but I haven't been able to recreate it, so I don't have a call stack.

Any thoughts here?

@torfjelde (Member)

I don't think there's anything wrong here, it's just that it's numerically very unstable. If I run the code with an inserted print statement in the definition of the inverse-evaluation of CorrBijector, I get the following as an input (i.e. the value in the unconstrained space that HMC is working with):

10×10 Array{Float64,2}:
 0.0  -1630.14  -857.317  -8051.24      -825.602   -1598.03   -1801.29
 0.0      0.0   8426.06   25379.5       -5603.79    -6840.33   -3651.21
 0.0      0.0      0.0    30092.8       -2565.67    -4733.44     -56.0023
 0.0      0.0      0.0        0.0       -7583.1    -10285.7     8297.9
 0.0      0.0      0.0        0.0       -4432.82    -6084.45    6599.2
 0.0      0.0      0.0        0.0     -23834.4    -31462.6   -55327.8
 0.0      0.0      0.0        0.0      -33831.2    -42291.8    22720.5
 0.0      0.0      0.0        0.0           0.0     42018.5    13520.2
 0.0      0.0      0.0        0.0           0.0         0.0    32250.5
 0.0      0.0      0.0        0.0           0.0         0.0        0.0

Now, that doesn't look too pretty. The "transformed" version (from unconstrained to constrained) is:

10×10 Array{Float64,2}:
  1.0  -1.0  -1.0  -1.0  -1.0  -1.0   1.0  -1.0  -1.0  -1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
  1.0  -1.0  -1.0  -1.0  -1.0  -1.0   1.0  -1.0  -1.0  -1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0

I need to look more into the LKJ distribution to see how we can deal with this though. Possibly related: https://mc-stan.org/docs/2_18/functions-reference/cholesky-lkj-correlation-distribution.html
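A plain-Julia sketch of why that saturated output breaks things (this is not Bijectors' actual code, just the arithmetic): the unconstrained-to-constrained map goes through a tanh-style saturating transform, so entries of magnitude 10^3–10^4 all map to ±1, and a "correlation" matrix whose off-diagonals are all ±1 is rank one, which is exactly what makes the Cholesky factorization inside the bijector throw PosDefException.

using LinearAlgebra

# Values of the magnitude shown above saturate completely under tanh.
tanh(-1630.14), tanh(8426.06)        # (-1.0, 1.0)

# A matrix with all off-diagonal correlations at ±1 is rank one, hence not
# positive definite, so the Cholesky check fails.
Ω = [ 1.0 -1.0 -1.0;
     -1.0  1.0  1.0;
     -1.0  1.0  1.0]
isposdef(Ω)                          # false
cholesky(Ω)                          # throws PosDefException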

@cpfiffer (Member, Author) commented Oct 30, 2020

Actually, I wonder if this is just an initialization problem: what if we tilted the initialization towards regions of the unconstrained space that actually map to meaningfully differentiable correlation matrices?

Edit: bad solution, I know, because then it's a different density.

@torfjelde (Member) commented Nov 1, 2020

No, this is fine. This is essentially what we do in Turing with the "robust" init methods. But this is more of an issue with Turing than with Bijectors, no?

EDIT: As in, it's fine to change the initialization procedure, but not the actual distribution, yeah.
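A small sketch of the initialization idea in plain Julia (not Turing's actual init machinery): the identity correlation matrix sits at the origin of the unconstrained space, since the transform reduces to atanh of zeros there, so an init tilted towards the identity keeps the sampler far from the saturated ±1 region shown above.

using LinearAlgebra

# Identity correlation matrix for d = 3: a maximally well-conditioned start.
Ω0 = Matrix{Float64}(I, 3, 3)

# Its off-diagonal entries are all zero and atanh(0) == 0, so the corresponding
# unconstrained coordinates are all zero, far from values like ±1630 or ±8426.
z = atanh.(Ω0[triu(trues(3, 3), 1)])
@assert all(iszero, z)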

@devmotion (Member)

Possibly also related: #134

@cpfiffer (Member, Author) commented Nov 1, 2020

> No, this is fine. This is essentially what we do in Turing with the "robust" init methods. But this is more of an issue with Turing than with Bijectors, no?

Yeah, perhaps, though the real issue might be the numerical instability. If the "true" bijector exists and is really numerically stable, then it doesn't matter how it's initialized.

@bratslavia

Just a data point that may be relevant to this issue: I noticed that LKJ seems to never work with NUTS (giving the above posdef error), but it does seem to work just fine with HMC. E.g.,

using Turing

@model function LKJ_demo(P)
    y ~ LKJ(P, 1)
end
model = LKJ_demo(5)

m1 = sample(model, HMC(0.01, 20), MCMCThreads(), 1, 4) # Works
m2 = sample(model, NUTS(1, 0.65), MCMCThreads(), 1, 4) # Always fails

@torfjelde (Member)

@bratslavia This is because NUTS has an adaptation phase during which it will try out possibly "extreme" step sizes, while HMC uses the fixed step size of 0.01 that you provide. As a result, numerical issues are more likely to show up when using NUTS than HMC (unless you specify a very large step size).
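A hedged illustration of that step-size sensitivity, reusing the model from the previous comment (the failure is stochastic, so treat the second call as an expected tendency rather than a guaranteed reproduction):

using Turing

@model function LKJ_demo(P)
    y ~ LKJ(P, 1)
end
model = LKJ_demo(5)

sample(model, HMC(0.01, 20), 200)  # small fixed step size: typically samples fine
sample(model, HMC(0.5, 20), 200)   # large fixed step size: tends to hit PosDefException,
                                   # the same regime NUTS explores during warm-up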

@mgmverburg

So what is the current suggested workaround, if any?
bratslavia's MWE is indeed indicative of what I have been experiencing: at least when using NUTS, it is very likely to run into an error that basically stops all sampling.

alexandrebouchard added a commit to Julia-Tempering/Pigeons.jl that referenced this issue Aug 21, 2023
1. matrix variate dist yield PosDefinite exception (probably due to
   slicer trying extreme values, see
   TuringLang/Bijectors.jl#144 (comment)
   where it is reported similar issue with NUTS (but not HMC))
2. other matrix variate dists do not have bijector implemented
3. anyways it seems they are flattened by the time it becomes a vi..