Use AbstractVector in LKJ and LKJCholesky bijectors #253
Conversation
[WIP] LKJ and LKJCholesky bijectors
Actually let's wait, as the AD tests I've just added on the roundtrip transformation fail, will have a look.
Looked more into it. The test is failing on… Unsure about the… All these tests are for AD through…
Regarding the test failures, it's a bit strange. This seems related: JuliaDiff/ForwardDiff.jl#606. But I thought this was fixed because they pulled 0.10.33 after the discussion there, and deferred the breaking changes to 0.11. The tests are running on 0.10.35, so I don't get why we're seeing this 😕 I think this needs a bit further inspection. And we do actually need to AD through the…

(Btw, I'm not done with my review, will continue later)
Good points @torfjelde, thanks! Here's more about the AD issues:

**Tracker**

From discussions elsewhere (Slack) I understand that we agree to drop support for this.

**ForwardDiff**
It might actually not be. It seems like a numerical issue in the `ishermitian` check when comparing `Dual` values. I found two samples from the same `LKJ(5, 1)` distribution, one that fails and one that passes:

```julia
using Bijectors, DistributionsAD, LinearAlgebra
using Bijectors: VecCorrBijector
using ForwardDiff
using ForwardDiff: Dual

b = VecCorrBijector('C') # bijector(LKJ(5, 1))
binv = inverse(b)
f = x -> sum(b(binv(x)))

# x_f ~ LKJ(5, 1)
x_f = [
    1.0 0.38808945715615550398 0.55251148082365042491 0.06333711952583508109 -0.51630779311225594164
    0.38808945715615550398 1.0 0.31760367441586356829 0.34585990227668395036 0.06051504059466897290
    0.55251148082365042491 0.31760367441586356829 1.0 0.17416714618194936715 -0.02825518349677474950
    0.06333711952583508109 0.34585990227668395036 0.17416714618194936715 1.0 -0.07513830680477201485
    -0.51630779311225594164 0.06051504059466897290 -0.02825518349677474950 -0.07513830680477201485 1.0
]
df_f = ForwardDiff.gradient(f, b(x_f)) # Errors, ishermitian returns false

# x_s ~ LKJ(5, 1)
x_s = [
    1.0 -0.01569213125090618277 -0.79039374741027101923 -0.03400980954333766848 0.54371128016847525277
    -0.01569213125090618277 1.0 -0.19877390203937703173 -0.37124942960738860354 -0.39209191569764001439
    -0.79039374741027101923 -0.19877390203937703173 1.0 0.03430683023840974677 -0.62744676631878926187
    -0.03400980954333766848 -0.37124942960738860354 0.03430683023840974677 1.0 0.50841756191547016197
    0.54371128016847525277 -0.39209191569764001439 -0.62744676631878926187 0.50841756191547016197 1.0
]
df_s = ForwardDiff.gradient(f, b(x_s)) # Runs, ishermitian returns true

# Let's see where x_f fails
function ish(A::AbstractMatrix)
    # Just a copy of ishermitian with a `@show`
    indsm, indsn = axes(A)
    if indsm != indsn
        return false
    end
    for i = indsn, j = i:last(indsn)
        if A[i, j] != adjoint(A[j, i])
            @show abs(A[i, j] - adjoint(A[j, i]))
            return false
        end
    end
    return true
end

y_f = b(x_f)
ish(binv(Dual.(y_f))) # Returns false, shows abs(A[i, j] - adjoint(A[j, i])) = Dual{Nothing}(2.0816681711721685e-17)

# Without using `Dual`s though, all is good
ish(binv(y_f)) # Returns true
```

So the roundtrip only fails with `Dual`s, and the asymmetry is at the level of floating-point noise (~2e-17).

EDIT: Tried using Zygote as well.

**Zygote**

This indeed has to do with how the `Cholesky` factors are accessed:

```julia
using Bijectors, DistributionsAD, LinearAlgebra
using Zygote

dist = LKJ(5, 1)
x = rand(dist)

g = x -> sum(cholesky(x).U)
dg = Zygote.gradient(g, x) # Returns correct gradient

h = x -> sum(cholesky(x).UL)
dh = Zygote.gradient(h, x) # Returns (nothing, )
```

So one option is to change

> Line 18 in 0d599e8

to `X.U`, take the potential extra allocation (if `uplo === :L`) and always work with `UpperTriangular` downstream. Using `PDMats.chol_upper` as suggested here results in the same issue by accessing `getproperty(::Cholesky, :factors)`.

Any thoughts on how to handle this?
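For what it's worth, one possible workaround for the `Dual` asymmetry could be to symmetrise before factorising, so `cholesky` doesn't hit the exact-equality `ishermitian` check. This is only a sketch of the idea (`symmetrise` is a hypothetical helper, not what the PR does), illustrated with an explicit tiny asymmetry instead of `Dual`s:

```julia
using LinearAlgebra

# Hypothetical workaround sketch: the reconstructed matrix is Hermitian only
# up to ~1e-17 floating-point noise, so `cholesky` (which checks
# `ishermitian` with exact equality) rejects it. Wrapping an explicitly
# symmetrised copy in `Symmetric` sidesteps the check.
symmetrise(X) = Symmetric((X + X') / 2)

X = [1.0 0.5; 0.5 + 1e-15 1.0]  # tiny asymmetry, like the Dual case above
ishermitian(X)                   # false: the check uses exact equality
F = cholesky(symmetrise(X))      # succeeds
```

Whether this is acceptable numerically for the bijector (rather than fixing the access pattern) is a design question, of course.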
It is for the case of…
I restarted the Inference tests multiple times and the…
Almost there! But I think the cholesky-version should just be its own struct so we avoid the type-instabilities.
Otherwise it's looking pretty dank!
And I'll have a look at the ForwardDiff issue.
src/bijectors/corr.jl
# Fields
- mode :`Symbol`. Controls the inverse transformation:
  - if `mode === :C` returns a correlation matrix
Do we need this? I'm personally happy to just support U or L.
That is, make the cholesky version into a separate type, e.g. `VecCholCorrBijector`. This will avoid the type-instabilities + moves the conditional handling you have in some functions into multiple dispatch instead.
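The suggested split could look roughly like this (a minimal sketch; the names with a `Sketch` suffix are illustrative, not the actual Bijectors.jl types):

```julia
# Illustrative only: instead of one struct carrying a runtime `mode::Symbol`
# field and branching on it, define one concrete type per output so the
# branch becomes multiple dispatch (and each method is type-stable).
struct VecCorrBijectorSketch end       # inverse returns a correlation matrix
struct VecCholCorrBijectorSketch end   # inverse returns a Cholesky factor

# The `if mode === :C ... else ...` conditional turns into two methods:
inverse_output(::VecCorrBijectorSketch) = "correlation matrix"
inverse_output(::VecCholCorrBijectorSketch) = "Cholesky factor"
```

Each method then has a concrete return type, which is what removes the instability the conditional introduced.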
@harisorgn Any updates on this?:)
I was completely off last week. Agree with splitting/specialising the structures, will implement it this week!
Ah, no worries! Sweet!
src/chainrules.jl
```julia
return UpperTriangular(X)' * UpperTriangular(X), Δ -> begin
    Xu = UpperTriangular(X)
    return ChainRulesCore.NoTangent(), UpperTriangular(Xu * Δ + Xu * Δ')
end
```
This needs a `ChainRulesCore.unthunk`, no? https://juliadiff.org/ChainRulesCore.jl/stable/rule_author/writing_good_rules.html#Thunks
Also, maybe add a rrule test? That would have caught the missing `unthunk`.
The thing is, I was testing the rrules locally as I was adding them and this was passing. Probably `unthunk`ing would be needed if it's part of multiple function calls that get differentiated? I am adding it anyway.
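To make the `unthunk` point concrete, here is a minimal sketch of an rrule that materialises a possibly-thunked cotangent before using it (`double` is an illustrative function, not the PR's actual rule):

```julia
using ChainRulesCore

# Illustrative only: when a rule is composed with other rules, the pullback
# may receive a lazy `Thunk` instead of a plain array, so call `unthunk`
# before doing arithmetic on the cotangent.
double(x) = 2 .* x

function ChainRulesCore.rrule(::typeof(double), x::AbstractVector)
    function double_pullback(Δ)
        Δm = unthunk(Δ)  # no-op for plain arrays, materialises a Thunk
        return NoTangent(), 2 .* Δm
    end
    return double(x), double_pullback
end

y, pb = ChainRulesCore.rrule(double, [1.0, 2.0])
pb(@thunk([1.0, 1.0]))  # works whether or not the cotangent is thunked
```

This is consistent with the reviewer's point: an rrule tested only with plain arrays can pass locally and still break once a thunked cotangent arrives from a composed call.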
@torfjelde, I implemented your suggestions, thanks for the feedback again :) I couldn't locally reproduce the… Also disregard my previous confusion about reproducing the… So there are still these two errors, plus the…

(Apologies for the format, only have phone access for now)
Great stuff @harisorgn :) Really close now!
I had a super-quick look, and made some very minor comments + changes. Once those are addressed, I think we should be good to go!
Again, awesome work; I imagine this isn't the most fun PR to work on, so appreciate you seeing this through ❤️
```diff
@@ -182,7 +188,23 @@ end
 upperinds = [LinearIndices(size(x))[I] for I in CartesianIndices(size(x)) if I[2] > I[1]]
 J = ForwardDiff.jacobian(x -> link(dist, x), x)
-J = J[upperinds, upperinds]
+J = J[:, upperinds]
```
What was this for again? Sorry, we might have discussed this before.
Don't think we have :) It's because the output of `dist` is an `AbstractVector` now, so the indices of upper triangular elements don't apply anymore. In this test, `x` is a 3x3 matrix, `link(dist, x)` is a length-3 vector, the Jacobian is then a 3x9 matrix, and we are keeping all output elements (as they are all relevant now) and only the `upperinds` of the input elements.
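For concreteness, the indexing just described can be sketched like this (`J` is a stand-in for the `ForwardDiff.jacobian` output, not the real Jacobian):

```julia
# Quick check of the indexing logic, for a 3x3 input:
x = zeros(3, 3)
upperinds = [LinearIndices(size(x))[I] for I in CartesianIndices(size(x)) if I[2] > I[1]]
# -> [4, 7, 8]: linear (column-major) indices of entries (1,2), (1,3), (2,3)

# link(dist, x) returns a length-3 vector, so the full Jacobian is 3x9;
# keeping all rows and only the `upperinds` columns leaves a square 3x3:
J = ones(3, 9)           # stand-in for ForwardDiff.jacobian(x -> link(dist, x), x)
size(J[:, upperinds])    # (3, 3)
```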
Aaah yeah now I remember:) We didn't discuss this but I had a think through it myself the last time I looked at it 😅 Just had a vague memory of at some point being befuddled about it and then figuring it out.
@torfjelde accidental merge, sorry, was setting up git on a new machine 😅 Please revert it and I'll implement the last changes.
Is it maybe easier if you just take over the other PR? :) #246
Expands on #246.

- Use `::AbstractVector` in `VecCorrBijector` operations, so we won't need to `transform` to `::AbstractMatrix` and back.
- Add bijector for `LKJCholesky`. I believe this was missing, and in practice it is the more efficient alternative when working with correlation matrices (avoids Cholesky decompositions on every call).
- In `LKJCholesky` there is control over the returned factor (`'U' -> UpperTriangular` or `'L' -> LowerTriangular`). I was wondering whether we want to respect the factor choice and always return the same triangular factor. If yes, we can use `VecTriuBijector` and `VecTrilBijector` to retain information about the original factor in `LKJCholesky` and return it. If no, we can always work with one type, e.g. `UpperTriangular`.

TO DO:

- Add `ChainRulesCore.rrule`s for all link functions that work on `::AbstractVector`, defined in this PR. I have only added one rule for the forward link function, but `ChainRulesTestUtils.test_rrule` complains about type instability and value mismatch. When comparing the values returned by the pullback inside the closure of `rrule` against the one defined for Zygote, I'm getting the same output though. I will have more of a look next week.
- ~~Document how I ended up with `_logabsdetjac_inv_chol`, so it can be verified.~~ This was based on the Stan manual pages for correlation matrices and Cholesky factors of correlation matrices. EDIT 2: I have not documented the formula derivation but added a test for it that passes.
- Remove this dispatch `function _link_chol_lkj(W::LowerTriangular)` (Bijectors.jl/src/bijectors/corr.jl, Line 320 in 7f5d0fc) and use `transpose(W::UpperTriangular)`.
- Related to the second point, right above: in general it would be nice if we could test these analytical formulas for logabsdetjac derived by hand. I played around with it a bit, but couldn't come up with something. EDIT: This can be done using AD. I see there is something already implemented along these lines in test/transform.jl, just needs some tweaking.

cc @torfjelde if you want to have a look already
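The AD cross-check idea from the last TO DO item can be sketched on a simple scalar bijection (softplus here, not the correlation transforms, and with a finite difference standing in for AD to keep the snippet dependency-free):

```julia
# Cross-check sketch: a hand-derived logabsdetjac should match log|f'(x)|
# computed independently (by AD, or numerically as below).
f(x) = log1p(exp(x))                        # bijection R -> R+ (softplus)
logabsdetjac_hand(x) = x - log1p(exp(x))    # since f'(x) = sigmoid(x)

x = 0.3
h = 1e-6
J = (f(x + h) - f(x - h)) / (2h)            # numerical f'(x)
isapprox(log(abs(J)), logabsdetjac_hand(x); atol=1e-8)  # true
```

The same pattern with `ForwardDiff.jacobian` and `logabsdet` of the restricted Jacobian is presumably what the test/transform.jl machinery mentioned above does for the matrix-valued transforms.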