✨ Introduce BalancedBagging #2
Conversation
.github/workflows/CI.yml
Outdated
- '1.8'
- 'nightly'
os:
Use `1` and not `1.0` to get the latest stable release. And include `1.6`, as this is the Long Term Support release.
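For reference, a sketch of the test matrix the suggestion implies (the actual job layout in `CI.yml` may differ):

```yaml
# Hypothetical excerpt of .github/workflows/CI.yml reflecting the suggestion:
# test the LTS (1.6), the latest stable 1.x release ('1'), and nightly.
strategy:
  matrix:
    version:
      - '1.6'
      - '1'
      - 'nightly'
    os:
      - ubuntu-latest
```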
@ablaom, for some reason Imbalance had `julia = "1.8"` when we released it. I changed it to 1.6, but we won't be able to test on 1.6 so long as Imbalance is a dependency and its next release is not out yet. For now I will keep 1.8 and 1 until Imbalance's next release.
src/BalancedBagging.jl
Outdated
""" | ||
function group_inds(y::AbstractVector{T}) where {T} | ||
result = LittleDict{T,AbstractVector{Int}}() | ||
freeze(result) |
`freeze` is not mutating. So this line achieves nothing.
I assume you mean the first line is not mutating, so `freeze` achieves nothing?
No, I mean that you do not use the output of line 8 (you do not assign it to any variable used later), so this line might as well not exist. My impression was you thought that `freeze(result)` mutated `result`, but it does not. It just returns a new immutable dictionary with faster look-up and smaller allocation than `result`.
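A minimal sketch of the point being made, using `LittleDict` and `freeze` from OrderedCollections.jl:

```julia
using OrderedCollections  # provides LittleDict and freeze

d = LittleDict{Symbol,Int}()
d[:a] = 1

frozen = freeze(d)  # returns a NEW immutable dictionary; `d` is untouched
d[:b] = 2           # `d` is still mutable, and `frozen` does not see this

# The idiomatic pattern is to rebind the result:
result = freeze(d)
```

So `freeze(result)` on its own line is a no-op unless the return value is assigned.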
src/BalancedBagging.jl
Outdated
end

"""
$(MMI.doc_header(BalancedBaggingClassifier))
If you are going to explicitly require specification of a hyperparameter because it has no default (in this case `model`), then you cannot use the `doc_header` shortcut, because it says that the zero-argument constructor works: https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#MLJModelInterface.doc_header So you will have to customise this part. Ditto the other wrapper.
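One way to make the requirement explicit, as a hypothetical sketch (names and fields are illustrative, not the PR's actual code):

```julia
# Hypothetical sketch: a keyword constructor that refuses to build the
# wrapper without `model`, which is why the docstring must not advertise
# a zero-argument constructor as the doc_header shortcut does.
mutable struct BalancedBaggingClassifier{M}
    model::M
    T::Int
end

function BalancedBaggingClassifier(; model=nothing, T=10)
    model === nothing && error(
        "BalancedBaggingClassifier requires `model`; " *
        "use BalancedBaggingClassifier(model=...).")
    return BalancedBaggingClassifier(model, T)
end
```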
Solved for this wrapper. The other wrapper doesn't have an MLJ-compliant docstring, only the quick start in the README and basic documentation. Would you like me to add one for it?
No, wrappers don't need to have compliant doc-strings.
src/BalancedBagging.jl
Outdated
- `X`: any table of input features (e.g., a `DataFrame`) so long as elements in each column
  are subtypes of either the `Finite` or `Infinite` scientific types.

- `y`: the binary target, which can be any `AbstractVector` where `length(unique(y)) == 2`
Again, I don't see why this needs to be binary, but again this depends on the atomic classifier being wrapped.
@ablaom I throw an error when `y` is not binary. The resampling scheme proposed by the paper assumes binary data:

> In this paper, we examine only binary classification problems.
It's not trivial to generalize but I may look into that later.
Oh, I see. Noted, thanks.
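The guard described above might look like the following (a hypothetical sketch; the PR's actual error message and function name may differ):

```julia
# Hypothetical sketch of the check the author describes: reject non-binary
# targets, since the paper's resampling scheme assumes two classes.
function check_binary_target(y::AbstractVector)
    n = length(unique(y))
    n == 2 || throw(ArgumentError(
        "BalancedBaggingClassifier requires a binary target; got $n classes."))
    return nothing
end
```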
src/BalancedBagging.jl
Outdated
- `predict(mach, Xnew)`: return predictions of the target given
  features `Xnew` having the same scitype as `X` above. Predictions
  are probabilistic, but uncalibrated.
Suggested change:

- `predict(mach, Xnew)`: return predictions of the target given
  features `Xnew` having the same scitype as `X` above. Predictions
  have the same form as that of the classifier, `model`.
@ablaom, `model` has to be probabilistic. I thought we agreed in the meeting to write that they are uncalibrated.
Yes, you are right!
src/BalancedBagging.jl
Outdated
- `predict_mode(mach, Xnew)`: instead return the mode of each
  prediction above.
Suggested change:

- `predict_mode(mach, Xnew)`: if `model` is a `Probabilistic` model, return the mode of each
  prediction above.
@ablaom, I constrain the model to be probabilistic, so I will ignore "if `model` is a `Probabilistic` model".
Got it!
Great work.
Co-authored-by: Anthony Blaom, PhD <[email protected]>
…cing into BalancedBagging
Tests aren't failing, GitHub just reached its limit with me for today.
Is this a daily limit, then? It seems we don't get a reset until October 1st, and this affects all JuliaAI private repos. I think you can go public to resolve early.
Good point. Let me make it public immediately.
Description
This PR adds BalancedModel, which allows parallel composition of random undersampling models followed by a probabilistic classification model. In other words, it's the EasyEnsemble algorithm, but for a generic probabilistic classifier rather than AdaBoost.
@ablaom could you please review the additions when you get the chance.
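The idea in the description above can be sketched independently of MLJ as follows (illustrative names and signatures, not the PR's actual API): each bag trains the wrapped classifier on the full minority class plus a random undersample of the majority class, and the ensemble averages the predicted probabilities.

```julia
using Random

# Illustrative sketch of EasyEnsemble-style balanced bagging.
# `fit(X, y)` and `predict_probs(model, Xnew)` are caller-supplied stand-ins
# for the wrapped probabilistic classifier.
function balanced_bagging_probs(fit, predict_probs, X, y;
                                n_bags=10, rng=Random.default_rng())
    classes  = unique(y)
    counts   = [count(==(c), y) for c in classes]
    minority = classes[argmin(counts)]
    min_idx  = findall(==(minority), y)
    maj_idx  = findall(!=(minority), y)
    models = map(1:n_bags) do _
        # undersample the majority class to the minority-class size
        sub = shuffle(rng, maj_idx)[1:length(min_idx)]
        idx = vcat(min_idx, sub)
        fit(X[idx, :], y[idx])
    end
    # average the probabilistic predictions over the ensemble
    return Xnew -> sum(predict_probs(m, Xnew) for m in models) ./ n_bags
end
```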