LKJ bijector is having problems #144
I don't think there's anything wrong here; it's just that it's numerically very unstable. If I run the code with a print statement inserted in the definition of the inverse evaluation, I get the following:

```
10×10 Array{Float64,2}:
 0.0  -1630.14   -857.317   -8051.24  …    -825.602   -1598.03    -1801.29
 0.0      0.0    8426.06    25379.5        -5603.79   -6840.33    -3651.21
 0.0      0.0       0.0     30092.8        -2565.67   -4733.44      -56.0023
 0.0      0.0       0.0         0.0        -7583.1   -10285.7      8297.9
 0.0      0.0       0.0         0.0        -4432.82   -6084.45     6599.2
 0.0      0.0       0.0         0.0  …    -23834.4   -31462.6    -55327.8
 0.0      0.0       0.0         0.0       -33831.2   -42291.8     22720.5
 0.0      0.0       0.0         0.0            0.0    42018.5     13520.2
 0.0      0.0       0.0         0.0            0.0        0.0     32250.5
 0.0      0.0       0.0         0.0            0.0        0.0         0.0
```

Now, that doesn't look too pretty. The "transformed" version (from unconstrained to constrained) is:
```
10×10 Array{Float64,2}:
  1.0  -1.0  -1.0  -1.0  -1.0  -1.0   1.0  -1.0  -1.0  -1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
  1.0  -1.0  -1.0  -1.0  -1.0  -1.0   1.0  -1.0  -1.0  -1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
```

I need to look more into the LKJ distribution to see how we can deal with this, though. Possibly related: https://mc-stan.org/docs/2_18/functions-reference/cholesky-lkj-correlation-distribution.html
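If, as in Stan's correlation-matrix transform, the bijector maps each unconstrained entry through `tanh` to obtain a canonical partial correlation, the all-±1 matrix above is exactly what saturation looks like: with inputs in the hundreds or thousands, `tanh` rounds to exactly ±1 in Float64. A quick check (Python here, purely for illustration):

```python
import math

# tanh maps an unconstrained real to (-1, 1). In Float64 it rounds to
# exactly +/-1 once |x| is a bit above ~19, because 1 - tanh(x) ~ 2*exp(-2x)
# falls below half a unit in the last place of 1.0.
print(math.tanh(1.0))    # well inside (-1, 1)
print(math.tanh(18.0))   # still strictly less than 1
print(math.tanh(40.0))   # exactly 1.0

# At the magnitudes printed above (hundreds to tens of thousands),
# every off-diagonal entry saturates exactly:
assert math.tanh(1630.14) == 1.0
assert math.tanh(-8051.24) == -1.0
```

Once saturated, the gradient of `tanh` underflows to zero as well, so the sampler gets no signal pushing it back into the well-behaved region.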
Actually, I wonder if this is just an initialization problem, i.e. if we tilted the initialization towards regions of the unconstrained space that actually map to meaningfully differentiable correlation matrices. Edit: bad solution, I know, because then it's a different density.
No, this is fine. This is essentially what we do in Turing with the "robust" init methods. But this is more of an issue with Turing than with Bijectors, no? EDIT: As in, it's fine to change the initialization procedure, but not the actual distribution, yeah.
Possibly also related: #134
Yeah, perhaps, though the real issue might be the numerical instability. If the "true" bijector exists and is genuinely numerically stable, then it shouldn't matter how it's initialized.
Just a data point that may be relevant to this issue: I noticed that LKJ seems to never work with NUTS (giving the above posdef error), but it does seem to work just fine with HMC. E.g.,

```julia
using Turing

@model function LKJ_demo(P)
    y ~ LKJ(P, 1)
end

model = LKJ_demo(5)
m1 = sample(model, HMC(0.01, 20), MCMCThreads(), 1, 4)  # Works
m2 = sample(model, NUTS(1, 0.65), MCMCThreads(), 1, 4)  # Always fails
```
@bratslavia This is because NUTS has an adaptation phase where it will try out possibly "extreme" step-sizes, while HMC uses the fixed step size you give it.
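The point about extreme step sizes can be seen in a toy example (Python, purely illustrative; this is not Turing's integrator). The leapfrog integrator is only stable below a problem-dependent step-size threshold; a step chosen during adaptation that exceeds it makes the unconstrained coordinates blow up geometrically, which is exactly the kind of huge values shown in the first comment:

```python
# Leapfrog on a standard Gaussian target, where grad U(q) = q.
# For this problem the integrator is stable only for step sizes eps < 2.
def leapfrog(q, p, eps, n, grad):
    for _ in range(n):
        p -= 0.5 * eps * grad(q)
        q += eps * p
        p -= 0.5 * eps * grad(q)
    return q, p

grad = lambda q: q  # gradient of U(q) = q^2 / 2

q_small, _ = leapfrog(1.0, 0.0, 0.1, 20, grad)  # stable: stays bounded
q_big, _ = leapfrog(1.0, 0.0, 3.0, 20, grad)    # unstable: diverges

assert abs(q_small) < 2.0
assert abs(q_big) > 1e3
```

With a fixed, conservative step size (as in the `HMC(0.01, 20)` call above) the trajectory never leaves the stable regime, which would explain why HMC avoids the error that NUTS adaptation triggers.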
So what is the current suggested workaround, if any?
1. Matrix-variate dists yield a PosDefException (probably due to the slicer trying extreme values; see TuringLang/Bijectors.jl#144 (comment), where a similar issue is reported with NUTS but not HMC).
2. Other matrix-variate dists do not have a bijector implemented.
3. Anyway, it seems they are flattened by the time they become a vi.
I wish I could be more specific about the problems in the issue description, but I'm having a hard time codifying them. I've gotten singular exceptions, non-PD exceptions, and non-Hermitian matrix exceptions, none of which seems to have an obvious cause to me.
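The singular and non-PD exceptions are consistent with the saturated ±1 matrix shown in the first comment: a "correlation matrix" whose off-diagonal entries are exactly ±1 is a rank-one outer product, so any Cholesky factorization along the way hits a zero pivot and fails. A minimal sketch in Python/NumPy (illustrative only, not the Bijectors.jl code path):

```python
import numpy as np

# Sign pattern from the saturated matrix in the first comment:
# every entry is s_i * s_j, i.e. the rank-one outer product s s^T.
s = np.array([1.0, -1.0, -1.0, -1.0, -1.0, -1.0, 1.0, -1.0, -1.0, -1.0])
C = np.outer(s, s)  # "correlation matrix" with all entries +/-1

# Rank one: all but one eigenvalue is zero, so C is singular
# (positive semi-definite, but not positive definite).
print(np.linalg.matrix_rank(C))  # 1

try:
    np.linalg.cholesky(C)
except np.linalg.LinAlgError as e:
    print("Cholesky failed:", e)
```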
MWE:
Running the example above gets me
If I reduce `dims` to 3, sampling basically just hangs forever. Occasionally it'll throw a singular exception, but I haven't been able to recreate it, so I don't have a callstack. Any thoughts here?