Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SphereBijector and AngleBijector #58

Open
sethaxen opened this issue Dec 8, 2019 · 12 comments
Open

Add SphereBijector and AngleBijector #58

sethaxen opened this issue Dec 8, 2019 · 12 comments

Comments

@sethaxen
Copy link
Member

sethaxen commented Dec 8, 2019

Directional statistics deals with unit vectors and periodic variables. Directions.jl includes two such distributions: VonMises (angular) and VonMisesFisher (spherical). I'm planning to implement more (see https://discourse.julialang.org/t/rfc-taking-directional-orientational-statistics-seriously/31951). For these, we need a SphereBijector and an AngleBijector.

SphereBijector transforms an n-dimensional vector into an n-dimensional unit vector under the Euclidean norm. It's not really a bijector, since it only has a right inverse (the inclusion function), so its Jacobian has a determinant of 0. However, we can still give a logabsdetjac term that produces a uniform measure (using a standard multivariate normal kernel). See the Stan manual for details. I also have implementations of this transformation at https://github.com/salilab/HMCUtilities.jl/blob/c4602ac/src/constraint.jl#L469-L527 and tpapp/TransformVariables.jl#67.

AngleBijector simply converts cartesian coordinates of a 1-sphere (circle) to an angle using atan. When composed with SphereBijector and shift, it provides the necessary transformation for VonMises. Also composing it with scale lets one transform any periodic quantity.

I'm happy to implement these. But will you take non-bijective functions in Bijectors.jl?

@mohamed82008
Copy link
Member

mohamed82008 commented Dec 8, 2019

We already have a similar non-bijector to the ones you are proposing in Bijectors.jl for transforming an n-dimensional point to an (n-1)-dimensional simplex. But the way we define it is as a bijector + a projection on the simplex (or a truncation). If the ones you propose are just pure projection operators, perhaps another package might be more appropriate. This might be relevant https://github.com/JuliaNLSolvers/ManifoldProjections.jl.

I also wonder why we need to add these functions to Bijectors.jl to use them with the bijectors defined in Bijectors.jl. Can't we just mix them using anonymous functions? If it's logabsdetjac you are after, perhaps we can define a fallback method for logabsdetjac for arbitrary functions that makes use of Tracker. We can also provide a way using traits to have logabsdetjac drop the zero eigenvalues of the Jacobian before calculating the determinant. If the user defines a method dropzeros(::typeof(f)) = true for some user-defined f, then logabsdetjac drops the zero eigenvalues of the Jacobian of f before computing the logdet. @torfjelde does this sound reasonable or doable?

@sethaxen
Copy link
Member Author

sethaxen commented Dec 8, 2019

Fair point. My ultimate goal is to use directional/angular distributions in Turing, at least the two in Distributions, but ideally arbitrary ones. If there's a better way to do that than to add them to Bijectors, I'm happy to do it.

@torfjelde
Copy link
Member

Can't we just mix them using anonymous functions?
Hmm, unclear to me how to do this 😕 Do you have an idea of how to implement this?

If it's logabsdetjac you are after, perhaps we can define a fallback method for logabsdetjac for arbitrary functions that makes use of Tracker.

Isn't this just ADBijector with forward-evaluation defined? Though I understand you want it to work with anonymous functions.

We can also provide a way using traits to have logabsdetjac drop the zero eigenvalues of the Jacobian before calculating the determinant. If the user defines a method dropzeros(::typeof(f)) = true for some user-defined f, then logabsdetjac drops the zero eigenvalues of the Jacobian of f before computing the logdet.

Hmm, so I kind of like this idea, but again it's unclear to me how to do it. Seems like just dropping all zeros can do a lot of non-intended stuff, i.e. in the SimplexBijector you don't want to drop all zeros, you just want to drop the zero the last row and column. And if we can't find a way to use anonymous functions, it seems kind of redundant since then this could just be hard-coded into the logabsdetjac method for that particular Bijector?

Now, to what you're talking about @sethaxen, this is something I've been thinking about whether or not we should commit to doing, but haven't gotten around to writing the issue (so thank you!).

For the SimplexBijector we made a slight exception due to it's prevalence. I think to handle this properly though for many-to-one mappings f (and thus non-invertible), e.g. f = tan, we need to integrate over the mass of the preimage of the function. Say you have y = f(x) and { xᵢ } = f⁻¹(y), then you can define the pdf as p(y) = ∑ᵢ p(xᵢ) / |det 𝒥(f, xᵢ)|. Another example is defining something like AbsBijector.

The big issue is what to do when you compose these many-to-one and one-to-one mappings. I think we'd have to implement the inverse of the many-to-one as returning a Tuple of values, and then for the subsequent bijections we actually have to invert all the values in these tuples. We could do all this by adding a abstract type ManyToOneBijector <: AbstractBijector and implement the interactions with <:Bijector.

I'm definitively not opposed to this, but it seems like there's quite a bit of work that needs to be done to do this properly 😕

@sethaxen
Copy link
Member Author

sethaxen commented Dec 8, 2019

Well, it all depends on what we want the transformation for. My assumption due to my need is that we are seeking a transformation f: x ↦ y for x ∈ X and y ∈ Y for the purposes of enhancing sampling efficiency for a variable y ∼ π. To do this we need a distribution ϕ on X such that if x ∼ ϕ, then y = f(x) ∼ π.

If f is surjective, then many ϕ may satisfy this requirement. We only need one. If f is bijective, then there's exactly one ϕ, and we can obtain its log density using the area formula (it's a factor of the square root metric density, which for a square Jacobian is the logabsdetjac term.).

If the transformation is surjective-only, then we need some other way of finding ϕ. A general rule is that if we can find a measure on X that pushes forward to the uniform measure on Y, then we can augment the log density of π with the log measure to obtain ϕ. Note that we can't solve this with integration; it's the opposite problem.

This is exactly the motivation for the spherical case above. Normalization is surjective. We know we can draw uniform samples on a sphere by sampling from a multivariate normal distribution and then normalizing. We can actually draw from any multivariate normal, but we choose as a matter of convenience the standard case. Consequently, the normal kernel on Euclidean space provides the necessary term to build ϕ on X.

Because f is surjective, it does not have a left-inverse. It does, however, have a right-inverse that is only necessary to get a valid initial point on x from a valid point on y = f(x). If g: Y → Z is surjective, then g ∘ f is also surjective and thus right-invertible by composing the right inverses, and we can still do all of the above by adding the log measure terms.

However, if the goal is to generate truly invertible transformations for some other purpose, then none of the above applies. However, the above describes the requirements needed for the kinds of distributions used by PPLs like Stan and (I think) would be useful in Turing.

Addendum: Some other things we can do with these transformations: We can sample within a bounded volume in ℝⁿ by sampling in free ℝⁿ⁺¹, normalizing to the unit sphere in ℝⁿ⁺¹, projecting to the unit ball in ℝⁿ, and deforming the ball to the bounded volume. 2 of those 3 composed transformations are surjective, and they give us a powerful set of transformations for certain variables, particularly those that appear in physical models.

Apologies if I've made any notational/terminology errors.

@sethaxen
Copy link
Member Author

sethaxen commented Dec 8, 2019

I don't think it's terribly relevant, but on the off-chance it's useful, here's some early work on adding maps to Manifolds.jl, ultimately with the goal of pushing forward/pulling back objects such as differential forms and distributions: JuliaManifolds/Manifolds.jl#28

@torfjelde
Copy link
Member

TL;DR: All in all, I agree with you: I don't think we should enforce transformations to be invertible. Now the question is whether or not we should just add a isinvertible trait or if we should add a abstract type ManyToOneBijector <: AbstractBijector type. What do you guys think? I personally prefer the abstract type since it will allow us to have different behaviour when composing a invertible and non-invertible transformation, etc.

My assumption due to my need is that we are seeking a transformation f: x ↦ y for x ∈ X and y ∈ Y for the purposes of enhancing sampling efficiency for a variable y ∼ π. To do this we need a distribution ϕ on X such that if x ∼ ϕ, then y = f(x) ∼ π.

I can see this use-case, but it doesn't seem to me like this is a push-forward of a distribution by f? I.e. it doesn't put the same amount of mass on the preimage of f as it does on it's image. Because of this, it seems to me like this wouldn't be case where the change of variables should be used?

(it's a factor of the square root metric density, which for a square Jacobian is the logabsdetjac term.)

👍 for bringing differentiable geometry into the discussion:)

A general rule is that if we can find a measure on X that pushes forward to the uniform measure on Y, then we can augment the log density of π with the log measure to obtain ϕ. Note that we can't solve this with integration; it's the opposite problem.

Sorry, what are you referring to when you say "log measure"? In particular, I don't quite understand what "we can augment the log density of π with the log measure to obtain ϕ" means.

I understand the overall idea though, and this shouldn't be a problem in the current Bijectors.jl. This would just be a matter of overloading the logpdf for a TransformedDistribution for these combinations of distributions and bijectors. It doesn't seem like you could make this something "general", since the notion of pushing-forward to uniform measure only works for particular combinations. I think what I outlined before with summing over different subsets can be used for non-invertible map in more general cases (and I as I note below, it seems to me that the Gaussian → uniform on sphere is a particular case of this).

We know we can draw uniform samples on a sphere by sampling from a multivariate normal distribution and then normalizing.

So this is kind of what I'm referring to though, no? To find the probability of a particular point on the sphere, you can compute the probability of the corresponding line in R^d wrt. the Gaussian distribution. This is sort of what I'm referring to above, though in this case we're "summing" over an infinite set of points, i.e. integrating.

I don't think it's terribly relevant, but on the off-chance it's useful, here's some early work on adding maps to Manifolds.jl, ultimately with the goal of pushing forward/pulling back objects such as differential forms and distributions:

That is great! Earlier I was considering whether or not we should try to accomodate for potential cases where you are mapping between manifolds rather than just real spaces. Didn't end up doing it, as I was afraid it would be a bit too general for what the purpose of Bijectors.jl is. Also, we don't need to worry about pushing-forward forms and tangents:) I absolutely love the idea though; definitively going to follow your work on this!:)

@torfjelde
Copy link
Member

torfjelde commented Feb 17, 2021

Response to comments in #168 .

It might make sense to have a Transformation{InSpace, OutSpace} and then have Bijector{0} <: Transformation{0, 0}.

This is something along the lines of what I hand in mind to ease the transition.

We can also expand the definition of space/domain similar to TransformVariables.jl domains.

Wait, what does TransformVariables.jl do? Are you referring to the as(Real, ...) and so on?

I am wondering if the type parameter N can be dropped and the batch computation issue solved in a different way, e.g. by wrapping inputs similar to ColVecs and RowVecs in KernelFunctions.

This can be done, yes. There's a couple of downsides though:

  1. It's a bit annoying for the end-user.
  2. We still have to make assumptions about the underlying storage structure.
  3. We need a more general implementation than just ColVecs and RowVecs as we can be working with arbitrary dimensionalities and whatnot.

An added benefit to it though, is that we could potentially remove the dimensionality from the definitions of the bijectors, e.g.:

struct Batched{T, N}
    val::T
end

value(x) = x
value(x::Batched) = x.val

# TODO: implement iterator interface

# General impl
# DOWNSIDE: Means that definitions such as `transform(b::CustomBijector, x)` will be ambiguous.
# (Also, `transform` doesn't exist as of right now, but imagine this to be `(b::Bijector)(x)` for the moment.)
function transform(b::Bijector, xs::Batched)
    return Batched(map(b, xs))
end

# Specify further on a per-bijector basis, e.g. `Exp` without dimensionality:
struct Exp end

transform(::Exp, x::AbstractArray) = exp.(x)
transform(::Exp, x::Batched{<:AbstractArray{<:Real}}) = Batched(exp.(value(x)))

logabsdetjac(b::Exp, xs::AbstractArray) = sum(logabsdetjac.(b, xs.val))
function logabsdetjac(b::Exp, xs::Batched{<:AbstractVector{<:Real, N}}) where {N}
    # Assume last dimension is batch-dimension
    return sum(logabsdetjac.(b, xs.val); dims=N)
end

I still think that the restriction that the output is of the same type as the input is strange and limiting (e.g. it is problematic for the improvements of the LKJ bijector mentioned in #134 (comment)).

Are you referring to the current approach or potential "new" approach? At the moment, we make no restrictions on the input and output types beyond the dimensionality; this is why we went with only putting the dimensionality in the bijector, not specific types. This is also why, if we decide to add actual representations of input- and output-spaces beyond just dimensionality, then we need to be really careful as it can easily just make life way more difficult by introducing a bunch of type-instability and whatnot.

A bijective function requires that the mathematical dimension of the in- and outputs are the same but this dimension is not the same as N in AbstractArray{N} and in particular the in- and outputs may not be of the same structure or size.

100%. But just for the record, I don't think anyone in this discussion thinks that:) This was done because it works really well in most standard cases. The idea was always to move away from it at some point; it was just a matter of time.

it will have almost no composability with other bijectors because compositions require the spaces to be working on the same dimensionality.

Similar to regular Julia functions, one could just drop this requirement and stop trying to reason about outputs/inputs when composing bijectors and throw an error at runtime if they are not compatible.

We might be able to avoid this with a redo of how we handle batching, but the reason why we did this is because we ran into a bunch of errors which didn't throw errors, e.g. if you mistakenly construct a bijector with the wrong dimensionality and call logabsdetjac on it, you'll see that the output is of the wrong size and you'll catch it. But if it's inside of a composition you might never see it because we might use .+ to add the outputs from logabsdetjac for two different bijectors. By forcing dimensionality to be the same, this will never happen.

@mohamed82008
Copy link
Member

Wait, what does TransformVariables.jl do? Are you referring to the as(Real, ...) and so on?

Yes but I don't like the whole as thing. I like the ability to specify domains though. I don't think even TransformVariables fully embraces the concept of domains.

@torfjelde
Copy link
Member

Yes but I don't like the whole as thing. I like the ability to specify domains though. I don't think even TransformVariables fully embraces the concept of domains.

Yeah, that's why I asked. IMO TransformVariables.jl is a more of a "neat thing" rather than something that's integral to the design of the package, e.g. doesn't have anything to do with compositions of transformations and whatnot.

@mateuszbaran
Copy link

What's the current status of this? I've recently been reading about manifold-valued continuous normalizing flows and it appears that they work well without the need to have non-bijective bijectors, see for example here: https://arxiv.org/abs/2006.10254 . They use dynamic trivializations instead. The problem with it is that now instead of a single bijector we have a whole family of them that we need to pick from -- but maybe that's fine?

@torfjelde
Copy link
Member

torfjelde commented Jul 11, 2021

What's the current status of this?

I've started working on making the scope of Bijectors.jl a bit wider in #183 by:

  1. Allowing non-bijective transformations
  2. Improving support for non-real inputs.

So there's progress but it's not quite there yet. This issue is one of the motivations for that PR.

I've recently been reading about manifold-valued continuous normalizing flows and it appears that they work well without the need to have non-bijective bijectors, see for example here: https://arxiv.org/abs/2006.10254 . They use dynamic trivializations instead. The problem with it is that now instead of a single bijector we have a whole family of them that we need to pick from -- but maybe that's fine?

First off, personally I would be very happy to accomodate whatever is required to implement bijectors for manifolds:) I welcome excuses to look at differential geometry, so if there are cool stuff we could do with Bijectors.jl + Manifolds.jl, I'm very keen!

I just briefly skimmed parts of the paper and looked at the referenced paper that introduces dynamic trivializations, and AFAIK it seems like this could be defined by (p, X) ↦ ϕₚ(X) with ϕₚ: TₚM → M e.g. the exponential map, right? This could probably already be implemented using a NamedCoupling which essentially uses one or more components of the input (p in this case) to produce the parameters of the bijector which is then applied to some other part of the input (X in this case). Then you could use a sequence of these + ODE solvers to define MODE? I.e. this could be done already? Happy to explain more in detail wrt. implementation if I have understood correctly.

Though I'll admit I'm still a bit confused of how one can use retractions in place of the exponential map, I guess one "issue" here is that retractions aren't generally bijective, e.g. the retraction from R^n to S^(n - 1) as mentioned in this issue, hence we'd need support for non-bijective transformations to support the use of retractions in place of exp.

EDIT: Nvm, just read the definition of refraction. Knew about the standard topological definition, but couldn't quite see how to define this for TₚM to M. Makes more sense now:)

@mateuszbaran
Copy link

What's the current status of this?

I've started working on making the scope of Bijectors.jl a bit wider in #183 by:

  1. Allowing non-bijective transformations
  2. Improving support for non-real inputs.

So there's progress but it's not quite there yet. This issue is one of the motivations for that PR.

I've recently been reading about manifold-valued continuous normalizing flows and it appears that they work well without the need to have non-bijective bijectors, see for example here: https://arxiv.org/abs/2006.10254 . They use dynamic trivializations instead. The problem with it is that now instead of a single bijector we have a whole family of them that we need to pick from -- but maybe that's fine?

First off, personally I would be very happy to accomodate whatever is required to implement bijectors for manifolds:) I welcome excuses to look at differential geometry, so if there are cool stuff we could do with Bijectors.jl + Manifolds.jl, I'm very keen!

Oh, nice! I'm going to try soon something along the lines of MODE. Manifolds.jl already has parametrizations so I think the main missing piece is calculation of logabsdetjac. It would be nice if I could just use AD for that but one has to be very careful to handle AD on manifolds correctly.

I just briefly skimmed parts of the paper and looked at the referenced paper that introduces dynamic trivializations, and AFAIK it seems like this could be defined by (p, X) ↦ ϕₚ(X) with ϕₚ: TₚM → M e.g. the exponential map, right? This could probably already be implemented using a NamedCoupling which essentially uses one or more components of the input (p in this case) to produce the parameters of the bijector which is then applied to some other part of the input (X in this case). Then you could use a sequence of these + ODE solvers to define MODE? I.e. this could be done already? Happy to explain more in detail wrt. implementation if I have understood correctly.

Yes, the idea is to just compute in charts and then switch charts when appropriate. I'm not sure what's the right design though -- each part separately is simple enough but they can be put together in many different ways. I'll very likely have some questions when I start implementing this 😉 . I'm wondering a bit how useful these chart bijectors would be in general. We can have them as NamedCouplings but their parameters would have to be determined dynamically during a forward or reverse pass -- and, to my limited knowledge, it's not something one expects from a bijector/normalizing flow layer.

Though I'll admit I'm still a bit confused of how one can use retractions in place of the exponential map, I guess one "issue" here is that retractions aren't generally bijective, e.g. the retraction from R^n to S^(n - 1) as mentioned in this issue, hence we'd need support for non-bijective transformations to support the use of retractions in place of exp.

EDIT: Nvm, just read the definition of refraction. Knew about the standard topological definition, but couldn't quite see how to define this for TₚM to M. Makes more sense now:)

Yes, retractions in (a large part of) Riemannian optimization are different from topological retractions, and it often causes confusion. Retractions can be bijective between R^n (or an open subset of R^n) and an open neighborhood of a point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants