MLJ Integration #3
I think this sounds like a good plan. We can definitely make a wrapper so that each point knows its manifold. In fact, that's how my early prototypes worked, but it turns out to be very inefficient for many algorithms. But an MLJ <-> Manifolds compatibility layer could just unwrap the inputs and wrap the results (see the sketch below); this is fine.
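A minimal sketch of what that unwrap/wrap layer might look like, assuming points travel as `(M, p)` tuples and using the `Manifold` abstract type as elsewhere in this thread (the helpers `unwrap`/`wrap` and the glue function `mlj_mean` are illustrative, not an existing API; `mean(M, pts)` is the Riemannian mean that Manifolds.jl provides by extending `Statistics.mean`):

```julia
using Manifolds, Statistics

# Illustrative convention: a "manifold point" is the tuple (M, p).
unwrap(wp::Tuple) = wp[2]   # (M, p) -> raw representation p
wrap(M, p) = (M, p)         # raw representation -> (M, p)

# Hypothetical glue: strip the manifold before calling an internal
# Manifolds.jl algorithm on raw points, then reattach it to the result.
function mlj_mean(M::Manifold, wrapped_points)
    pts = [unwrap(wp) for wp in wrapped_points]
    return wrap(M, mean(M, pts))  # Manifolds.jl extends Statistics.mean
end
```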
That decorator thing we have isn't particularly intuitive, but it works fine for our purposes. In this case, however, the representation needs to be enforced at a different level. We will figure something out.
How would that information be used in the MLJ ecosystem? Manifolds.jl is quite good at figuring out types of temporaries and results from types of arguments.
We wouldn't need it. It would only be necessary if you can imagine an algorithm which would only work for manifolds with that given `number_type`.
Thanks for your ideas. Concerning the "a point does not know which manifold it belongs to": I see a small problem for efficiency in attaching the manifold to every point (though our manifolds usually are only a few integers of information/storage). Maybe it would also be a good idea to store the manifold only with a batch of data? If we have a set of points (the training set, for example), they all live on the same manifold. Would that be possible? (See the sketch below.)

Concerning the decorator: it might take a while to carefully understand the approach we follow there; the rough idea is as follows: …

Concerning your idea of enforcing a point representation: that should just be doable with …
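A rough sketch of what such a per-batch container might look like (the name `ManifoldBatch` is made up here; this is a sketch of the idea, not an agreed design):

```julia
using Manifolds

# Hypothetical container: the manifold is stored once for the whole
# collection, so individual points remain plain, lightweight arrays.
struct ManifoldBatch{TM<:Manifold,TP} <: AbstractVector{TP}
    M::TM                # the common manifold, stored once per batch
    points::Vector{TP}   # raw point representations
end

# Minimal AbstractVector interface so generic code can iterate the batch.
Base.size(b::ManifoldBatch) = size(b.points)
Base.getindex(b::ManifoldBatch, i::Integer) = b.points[i]
```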
Performance wouldn't be affected that much on Julia 1.5+ thanks to the memory layout changes for structs. I usually do care about performance and I don't think it would be a problem for this interface 🙂. I will be perfectly fine with something like:

```julia
struct PointAndManifold{TP,TM<:Manifold} <: MPoint
    p::TP
    M::TM
end
```
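For illustration, wrapping a point on the 2-sphere with this struct might look as follows (`Sphere` is from Manifolds.jl; the rest assumes the `PointAndManifold` definition above):

```julia
using Manifolds

M = Sphere(2)                # unit sphere in R^3
p = [0.0, 0.0, 1.0]          # the north pole, a valid point on M
wp = PointAndManifold(p, M)  # wp.p is the raw point, wp.M its manifold
```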
Then I am also fine with that variant, for sure.
So there seem to be a few ways to move forward here: …

@kellertuer @mateuszbaran Do you have a preference for how you want to proceed?

Side question: Do you have models where tangent vectors would be part of the data presented by MLJ users? That is, do we need analogues of the above for tangent vectors?
I would prefer …

Concerning the tangent vectors: we also thought about that, and it's actually easy. A tangent vector `X` has to "know" its base point, but the tuple …
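A sketch of what the tangent-vector analogue might look like (the name `TangentAndBase` is made up; the operations assume both operands share the same base point):

```julia
using Manifolds

# Hypothetical wrapper: a tangent vector is only meaningful together with
# its base point (and manifold), so the wrapper carries all three.
struct TangentAndBase{TX,TP,TM<:Manifold}
    X::TX   # tangent vector representation
    p::TP   # base point the vector is attached to
    M::TM   # the manifold
end

# Tangent spaces are vector spaces, so addition and scalar multiplication
# at a fixed base point are well defined.
Base.:*(a::Number, t::TangentAndBase) = TangentAndBase(a * t.X, t.p, t.M)
Base.:+(s::TangentAndBase, t::TangentAndBase) =
    TangentAndBase(s.X + t.X, s.p, s.M)
```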
I'm fine with either variant 1 or 2. It would be nice if users could just add, or multiply by scalars, tangent vectors wrapped in …
That may not be the best example because …
Ah, but to distinguish that correctly we might need a …
OK, after some discussion on Slack the conclusion is that MLJ could just do variant 1 and we will work out details of integration on the Manifolds side.
Continuing the discussion in #2 (and on slack). Some thoughts on what the issues might be.

As I understand it, a point on an arbitrary `Manifold` object does not generally know which manifold it belongs to, correct? This is fine as far as working with these points internally in your manifold-specific algorithms goes, but it is not ideal from the point of view of integration with the rest of the ML ecosystem. The problem is roughly analogous to categorical variables. Internally these are usually represented as integers, but algorithms still need to know the total number of possible classes to avoid problems, such as certain classes disappearing on resampling. Passing this information around is not as easy as it first appears. Life is much easier (for a toolbox like MLJ) if we simply assume every point knows all the classes, and that is why we (and other packages) insist on the use of `CategoricalArrays` for representing such data (although ordinary arrays of some "categorical value" type would also have sufficed). In the future, we might have algorithms which deal with mixed data types, one or more of which is a manifold type (think of geophysical applications), and having to keep track of metadata for a subset of variables gets messy.
So my tentative suggestion would be that MLJ users present input data for a supervised learning algorithm from the ManifoldML package as an abstract vector of "manifold points", where a "manifold point" combines the manifold to which the point belongs with some internal representation. This could be as simple as a tuple `(M, p)`, for example. We define a new scientific type `ManifoldPoint{M}`, where `M` is the concrete manifold type, and declare `scitype((M, p)) = ManifoldPoint{typeof(M)}`. Then your input type declarations in the implementation of the MLJ interface would look something like the sketch below, and the rest would be straightforward, I should think.
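A hypothetical reconstruction of such a declaration (the model `ManifoldRidgeRegressor` and the stand-in `ManifoldPoint` definition are made up for illustration; only the idea of the `ManifoldPoint{M}` scitype comes from the proposal above):

```julia
using Manifolds
import MLJModelInterface
const MMI = MLJModelInterface

# Stand-in for the proposed scientific type; where exactly it should hook
# into the scitype hierarchy is left open in this thread.
abstract type ManifoldPoint{M} end

# Hypothetical supervised model implementing the MLJ interface.
mutable struct ManifoldRidgeRegressor <: MMI.Deterministic
    lambda::Float64
end

# Trait declarations: inputs are vectors of manifold points on any sphere,
# targets are ordinary continuous values.
MMI.input_scitype(::Type{<:ManifoldRidgeRegressor}) =
    AbstractVector{<:ManifoldPoint{<:Sphere}}
MMI.target_scitype(::Type{<:ManifoldRidgeRegressor}) =
    AbstractVector{MMI.Continuous}
```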
Other random thoughts:
- Maybe there is some way to "decorate" existing manifolds to enforce the kind of point representation we want. I don't really understand this decorating business well enough to say, or whether it is really an advantage.
- Maybe we want to refine the scitype to include the `number_type` as a type parameter.

@kellertuer @mateuszbaran Your thoughts?