Learning graph in mach #759
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##              dev     #759      +/-   ##
==========================================
+ Coverage   85.41%   85.97%   +0.55%
==========================================
  Files          36       36
  Lines        3435     3486      +51
==========================================
+ Hits         2934     2997      +63
+ Misses        501      489      -12
```

Continue to review full report at Codecov.
@ablaom I think this is ready for review if you have some time to dedicate to it. As an additional demonstration of the benefits of this PR, you'll see a multithreaded Stack test on this branch. I haven't added it to the PR since I thought this was already big enough, but I'm happy to take your comments.
@olivierlabayle I'd like to acknowledge this proposal, which I haven't reviewed yet. It sounds interesting, but it may take me a couple of weeks to look at it properly. You have raised a couple of related issues recently (caching, parallelism). Can you help me prioritise my review? Should I look at this one before the others? What are the implications of the current proposal for those others, if any? For example, is it possible that implementing this proposal would allow the others to be resolved in a non-breaking way?
@ablaom Thank you for getting back to me. Indeed, there are many connected topics that I am currently trying to address. The two most important ones are memory management and parallelism. This PR is an attempt to partially address both with a small design change while breaking the API only minimally (one corner case, explained below). This is definitely the place to start. The general idea of the PR is to force the use of

If you are happy with the PR, the last thing I would like to add for now is the removal of the anonymization process, since you said it may be avoidable. I would be happy to get your guidance on that and on the important parts of the code, since I didn't understand the use case in the first place. I hope this helps.
@olivierlabayle Thanks for looking into this and for your patience. I have now reviewed this PR in some detail. I have to say I really like the idea in this PR, but there are some drawbacks:
I've had a think about this, and have an alternative suggestion for dealing with
@ablaom Thanks for taking the time to look into this!
```julia
X, y = make_regression(500, 5)

models = (constant=DeterministicConstantRegressor(),
          decisiontree=DecisionTreeRegressor(),
          ridge_lambda=FooBarRegressor(;lambda=0.1),
          ensemble=EnsembleModel(FooBarRegressor(;lambda=1.)),
          ridge=FooBarRegressor(;lambda=0))

mystack = Stack(;metalearner=FooBarRegressor(),
                resampling=CV(;nfolds=3),
                models...)

mach = machine(mystack, X, y)
fit!(mach, verbosity=0)
```

The only caveat I see is if someone is actually using
Here is a vague illustration to show the idea:

```julia
function fit!(mach::Machine; acceleration=CPU1(), kwargs...)
    glb_node = glb(mach.args...)  # greatest lower bound node of the machine's arguments
    # prefer the model's own `acceleration` field, if any, over the one passed to `fit!`
    acceleration = hasfield(typeof(mach.model), :acceleration) ? mach.model.acceleration : acceleration
    fit!(glb_node; acceleration=acceleration, kwargs...)
    fit_only!(mach; kwargs...)
end
```

One advantage of this PR's proposed implementation (and the "in between" one) is that a user implementing a learning network does not need to worry about parallelism and can still benefit from it, which I think is the most common scenario.
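Under that scheme, a user would only need something like the following at the top level (a hypothetical usage sketch; `CPU1`, `CPUThreads` come from ComputationalResources.jl) for the submachines of the network to be fitted in parallel:

```julia
using ComputationalResources  # provides CPU1, CPUThreads, CPUProcesses

fit!(mach, acceleration=CPUThreads())  # propagated to the submachines of the learning network
```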
No, I mean the other way around:
```julia
check_signature(signature)

# Build composite Fitresult
fitresult = CompositeFitresult(signature)
fitresult.network_model_names = network_model_names(mach.model, fitresult)

# Fit all machines in the learning network
glb_node = glb(fitresult)
fit!(glb_node; kwargs...)
acceleration = hasfield(T, :acceleration) && getfield(mach.model, :acceleration) !== nothing ?
```
Instead use `hasproperty` and `getproperty`. In general, model struct fields and properties can be different, but it is the properties that are "public".
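To illustrate the distinction (a made-up `MyModel`, not code from this PR): a model can customise `Base.getproperty`, so its public properties need not coincide with its struct fields.

```julia
# Hypothetical model whose public property name differs from its field name.
mutable struct MyModel
    _acceleration   # "private" field
end

Base.propertynames(::MyModel) = (:acceleration,)
Base.getproperty(m::MyModel, name::Symbol) =
    name === :acceleration ? getfield(m, :_acceleration) : getfield(m, name)

m = MyModel("threads")
hasfield(typeof(m), :acceleration)  # false: `:acceleration` is not a field
hasproperty(m, :acceleration)       # true: but it is a public property
getproperty(m, :acceleration)       # "threads"
```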
```diff
@@ -147,7 +149,7 @@ report(mach).cv_report
-function Stack(;metalearner=nothing, resampling=CV(), measure=nothing, measures=measure, named_models...)
+function Stack(;metalearner=nothing, resampling=CV(), measure=nothing, measures=measure, acceleration=nothing, named_models...)
```
Why is the `acceleration` default `nothing` instead of `CPU1()`?
The goal was to:
- by default, propagate the acceleration specified in `fit` if the stack's `acceleration` is `nothing`;
- take the stack's `acceleration` instead if it is specified.
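That resolution rule amounts to something like the following sketch (the helper name `resolve_acceleration` is illustrative only, not part of the PR):

```julia
using ComputationalResources  # CPU1, CPUThreads

# Use the stack's own acceleration when given; otherwise fall back on the one passed to `fit`.
resolve_acceleration(stack_acceleration, fit_acceleration) =
    stack_acceleration === nothing ? fit_acceleration : stack_acceleration

resolve_acceleration(nothing, CPUThreads())  # CPUThreads(): the `fit` setting propagates
resolve_acceleration(CPU1(), CPUThreads())   # CPU1(): the stack's own setting wins
```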
I think you mean that if a user wants to implement a composite model using the learning network API, then they need to be familiar with machines. That is true. But presently I can use a
@olivierlabayle I'm sorry, but after sitting on this a few days, I'm still reluctant to provide, in MLJBase.jl, a facility that encourages composite model definitions that do not meet the MLJ model API. I think this could come back to bite us. I'm also not enthusiastic about fixing the breaking behaviour I've pointed out.

Another reason for my reluctance is that the MLJ model composition API is now the most complex part of the MLJ code base. It is a worthy advancement on other multi-paradigm ML frameworks, but I feel it is reaching the limits of what can be sustainably maintained without a substantial re-design. Putting it another way, I'm afraid we (and I mean "me" predominantly) may be sliding into an "incremental hack".

I think we agree (but correct me if I am wrong) that for

The other use-case you mention for introducing the learning graph code is for propagating a fixed

@olivierlabayle suggested:
@ablaom suggested:
@ablaom I understand your concern; I did not appreciate there was such an emphasis on the

What follows is mostly what I would consider to be a hack that is susceptible to much confusion from a user perspective. That being said, I think indeed both
Note here that the existing model trait
Follow-up PR: #767
Closing, as this is old and the interface has substantially moved on.
This is a proposal to extend the learning network API. It should be almost non-breaking, and it enables caching and other `fit` options (like `acceleration`) to be passed to submachines. Related issues: #756

The proposed routine is to declare a `learning_network(mach::Machine{<:Composite}, X, y; kwargs...)` method instead of the current `fit` for a new composite model. This is exemplified with the `Stack`, but the Pipelines, for instance, are kept on the current API. The only caveat is that learning networks defined that way can only be used inside machines.
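For concreteness, here is a rough sketch of what defining a composite via such a method might look like. This is a hypothetical illustration, not code from the PR: `MyComposite` and the body are made up, and the sketch assumes `learning_network` returns the network's signature (as `fit` currently does via `return!`), with training handled by `fit!` on the enclosing machine so that `fit` options such as `acceleration` and `cache` reach the submachines.

```julia
using MLJBase

mutable struct MyComposite <: DeterministicComposite
    regressor
end

# Build the learning network instead of overloading `fit(::MyComposite, ...)`;
# the enclosing machine's `fit!` then fits the network's machines.
function learning_network(mach::Machine{<:MyComposite}, X, y)
    Xs = source(X)
    ys = source(y)
    submach = machine(mach.model.regressor, Xs, ys)
    yhat = predict(submach, Xs)
    return (predict=yhat,)
end
```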
If the approach looks encouraging to you and you want to push it further, my remaining TODOs are at least:

- `return!`, ... ?
- (`cache=false`) and, more generally, the anonymization routine, since the data is still available anyway. I think @ablaom, you mentioned we could remove that gymnastic in a previous chat. I would be happy to get some guidance and add this to the PR if this is not breaking either.

Please let me know if there are other things I haven't considered.