-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Add BalancedModel #1
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
test/BalancedModel.jl
Outdated
y_train, y_test = y[train_inds], y[test_inds] | ||
|
||
# Load models and balancers | ||
SVC = @load SVC pkg = LIBSVM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to avoid the C models and use pure Julia models in testing where possible, because it leads to less maintenance hangups. A simple Julia deterministic classifier is OneRuleClassifier (You can also turn a probabilistic model into a deterministic one using pipeline syntax probabilistic_model |> (y |> mode.(y))
.)
Also, keep in mind that there is a known issue which prevents using composite models (from exported learning networks) in multithreading mode for some non-Julia models: JuliaAI/MLJBase.jl#783
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To test if a model in pure julia use you can do MLJBase.is_pure_julia(model)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ablaom I don't see how probabilistic_model |> (y |> mode.(y))
would change the model type itself from Probabilistic
to Deterministic
, will it? OneRuleClassifier
only works with categorical data. I ended up ditching the SVM and using DeterministicConstantClassifier
, afterall all I want to assure here is that it works with the Deterministic
type with no problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. This is a good resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only substantive suggestion is to avoid non-Julia models in testing.
I think that Julia 1.6 (LTS) and Julia 1 (latest) should be in the CI matrix, and that suffices. |
I believe the Windows fail is due to a known Julia issue with SVMLIB and XGBoost particular to Windows (actually a Julia bug) which was corrected at about Julia 1.9.2 or 1.9.3. It appeared around Julia 1.8.0. So this should disappear when you skip 1.8 testing. Another argument for avoiding non-Julia models in testing 😉 |
Co-authored-by: Anthony Blaom, PhD <[email protected]>
Co-authored-by: Anthony Blaom, PhD <[email protected]>
Co-authored-by: Anthony Blaom, PhD <[email protected]>
Co-authored-by: Anthony Blaom, PhD <[email protected]>
Co-authored-by: Anthony Blaom, PhD <[email protected]>
Co-authored-by: Anthony Blaom, PhD <[email protected]>
@ablaom Ready to merge this once the final comments are addressed. Thank you so much for your time during the review. |
Okay. There is only one comment to address (there was another but have found the answer). note that Github checks fail only due to daily limits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go.
Description
This PR adds
BalancedModel
which allows sequential composition of resampling models fromImbalance.jl
(or any transforms that operate on(X, y)
data.@ablaom could you please review the additions when you get the chance. The interface implemented is very similar to that of
Stacking
inMLJBase
.