✨ Add BalancedModel #1

Merged
merged 15 commits into from
Sep 26, 2023
5 changes: 1 addition & 4 deletions .github/workflows/CI.yml
@@ -19,11 +19,8 @@ jobs:
fail-fast: false
matrix:
version:
- '1.0'
- '1.8'
- 'nightly'
os:
- ubuntu-latest
os: [ubuntu-latest, windows-latest, macOS-latest]
arch:
- x64
steps:
5 changes: 3 additions & 2 deletions Project.toml
@@ -10,17 +10,18 @@ OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[compat]
julia = "1"
MLJBase = "0.21"
MLJModelInterface = "1.9"
OrderedCollections = "1.6"
julia = "1"

[extras]
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Imbalance = "c709b415-507b-45b7-9a3d-1767c89fde68"
MLJLIBSVMInterface = "61c7150f-6c77-4bb1-949c-13197eac2a52"
MLJLinearModels = "6ee0df7b-362f-4a72-a706-9e79364fb692"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test", "Imbalance", "DataFrames", "MLJLIBSVMInterface", "MLJLinearModels"]
test = ["Test", "Imbalance", "DataFrames", "MLJLIBSVMInterface", "MLJLinearModels", "MLJ"]
44 changes: 44 additions & 0 deletions README.md
@@ -1,2 +1,46 @@
# MLJBalancing
A package with exported learning networks that combine resampling methods from Imbalance.jl and classification models from MLJ.

## 🚅 Sequential Resampling

This package allows chaining resampling methods from Imbalance.jl with classification models from MLJ. Simply construct a `BalancedModel` object, specifying the model and an arbitrary number of resamplers.

### 📖 Example

#### Construct the resamplers and the model
```julia
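using MLJ, MLJBalancing

# Load the resamplers from Imbalance.jl
SMOTENC = @load SMOTENC pkg=Imbalance verbosity=0
TomekUndersampler = @load TomekUndersampler pkg=Imbalance verbosity=0

oversampler = SMOTENC(k=5, ratios=1.0, rng=42)
undersampler = TomekUndersampler(min_ratios=0.5, rng=42)

# Load the classifier (assumed here to come from MLJLinearModels, which is among the test dependencies)
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels verbosity=0
logistic_model = LogisticClassifier()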
```

#### Wrap them all in BalancedModel
```julia
balanced_model = BalancedModel(model=logistic_model, balancer1=oversampler, balancer2=undersampler)
```
Here, data is passed to `balancer1`, then to `balancer2`, and finally to the model. In general, there can be any number of balancers.
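
The sequential flow described here can be illustrated with a dependency-free sketch. This is not how `BalancedModel` is implemented; `toy_oversample`, `toy_undersample`, and `balance` are hypothetical stand-ins that only show the idea of folding `(X, y)` through a list of resamplers, in order, before the result reaches a model.

```julia
using Random

# Hypothetical resampler: duplicate random rows of each smaller class
# until every class has as many rows as the largest one.
function toy_oversample(X, y; rng=MersenneTwister(42))
    counts = Dict(c => count(==(c), y) for c in unique(y))
    majority = maximum(values(counts))
    Xo, yo = copy(X), copy(y)
    for (c, n) in counts
        idx = findall(==(c), y)
        extra = rand(rng, idx, majority - n)   # empty when the class is already largest
        append!(Xo, X[extra])
        append!(yo, y[extra])
    end
    return Xo, yo
end

# Hypothetical resampler: keep at most `cap` rows per class.
function toy_undersample(X, y; cap=4)
    keep = Int[]
    for c in unique(y)
        idx = findall(==(c), y)
        append!(keep, idx[1:min(cap, length(idx))])
    end
    sort!(keep)
    return X[keep], y[keep]
end

# Pass the data through each resampler in order, like balancer1 then balancer2.
balance(balancers, X, y) = foldl((data, b) -> b(data...), balancers; init=(X, y))

X = [[float(i)] for i in 1:10]           # ten single-feature rows
y = vcat(fill("a", 8), fill("b", 2))     # imbalanced labels: 8 vs 2

Xb, yb = balance([toy_oversample, toy_undersample], X, y)
class_counts = Dict(c => count(==(c), yb) for c in unique(yb))
```

With these toy resamplers, the oversampler first brings both classes to 8 rows and the undersampler then caps each at 4, so the order of the balancers determines what the downstream model sees.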

#### At this point, they behave like one single model
You can fit, predict, cross-validate, and fine-tune it like any other MLJ model. Here is an example of hyperparameter tuning:
```julia
r1 = range(balanced_model, :(balancer1.k), lower=3, upper=10)
r2 = range(balanced_model, :(balancer2.min_ratios), lower=0.1, upper=0.9)

tuned_balanced_model = TunedModel(model=balanced_model,
                                  tuning=Grid(goal=4),
                                  resampling=CV(nfolds=4),
                                  range=[r1, r2],
                                  measure=cross_entropy);

mach = machine(tuned_balanced_model, X, y);
fit!(mach, verbosity=0);
fitted_params(mach).best_model
```
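
The ranges above name nested hyperparameters with quoted expressions such as `:(balancer1.k)`. As a rough illustration of what such an expression refers to, the sketch below resolves it against a nested struct in plain Julia. MLJ has its own machinery for this; `ToyBalancer`, `ToyWrapper`, and `nested_get` are hypothetical names used only for this example.

```julia
# Hypothetical stand-ins for a resampler and a wrapper that holds it.
struct ToyBalancer
    k::Int
end

struct ToyWrapper
    balancer1::ToyBalancer
end

# Resolve a Symbol (`:k`) or a dotted expression (`:(balancer1.k)`) against `obj`.
# A dotted expression `a.b` parses as Expr(:., :a, QuoteNode(:b)).
function nested_get(obj, ex)
    ex isa Symbol && return getproperty(obj, ex)
    parent, field = ex.args[1], ex.args[2].value
    return getproperty(nested_get(obj, parent), field)
end

wrapped = ToyWrapper(ToyBalancer(5))
k_value = nested_get(wrapped, :(balancer1.k))   # the `k` field of the inner balancer
```

This is why the first argument of `range` is the wrapping model and the second is a path into it: the path picks out one field of one component.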

## 🚆🚆 Parallel Resampling with EasyEnsemble

Coming soon...