✨ Add BalancedModel #1

Merged · 15 commits · Sep 26, 2023
5 changes: 1 addition & 4 deletions .github/workflows/CI.yml
@@ -19,11 +19,8 @@ jobs:
fail-fast: false
matrix:
version:
- '1.0'
- '1.8'
- 'nightly'
os:
- ubuntu-latest
os: [ubuntu-latest, windows-latest, macOS-latest]
arch:
- x64
steps:
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
/Manifest.toml
.CondaPkg
7 changes: 3 additions & 4 deletions Project.toml
@@ -10,17 +10,16 @@ OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[compat]
julia = "1"
MLJBase = "0.21"
MLJModelInterface = "1.9"
OrderedCollections = "1.6"
julia = "1.6"

[extras]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Imbalance = "c709b415-507b-45b7-9a3d-1767c89fde68"
MLJLIBSVMInterface = "61c7150f-6c77-4bb1-949c-13197eac2a52"
MLJLinearModels = "6ee0df7b-362f-4a72-a706-9e79364fb692"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test", "Imbalance", "DataFrames", "MLJLIBSVMInterface", "MLJLinearModels"]
test = ["Test", "Imbalance", "DataFrames", "MLJLinearModels", "MLJModels"]
56 changes: 55 additions & 1 deletion README.md
@@ -1,2 +1,56 @@
# MLJBalancing
A package with exported learning networks that combine resampling methods from Imbalance.jl and classification models from MLJ
A package providing composite models wrapping class imbalance algorithms from [Imbalance.jl](https://github.com/JuliaAI/Imbalance.jl).

## ⏬ Installation
```julia
import Pkg;
Pkg.add("MLJBalancing")
```

## 🚅 Sequential Resampling

This package allows chaining of resampling methods from Imbalance.jl with classification models from MLJ. Simply construct a `BalancedModel` object, specifying the model (classifier) and an arbitrary number of resamplers (also called *balancers*), typically oversamplers and/or undersamplers.

### 📖 Example

#### Construct the resamplers and the model
```julia
using MLJ, MLJBalancing

# load the model types (LogisticClassifier assumed to come from MLJLinearModels)
SMOTENC = @load SMOTENC pkg=Imbalance verbosity=0
TomekUndersampler = @load TomekUndersampler pkg=Imbalance verbosity=0
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels verbosity=0

oversampler = SMOTENC(k=5, ratios=1.0, rng=42)
undersampler = TomekUndersampler(min_ratios=0.5, rng=42)

logistic_model = LogisticClassifier()
```

#### Wrap them all in BalancedModel
```julia
balanced_model = BalancedModel(model=logistic_model, balancer1=oversampler, balancer2=undersampler)
```
Here, training data is passed to `balancer1` and then to `balancer2`, whose output is used to train the classifier `model`. During prediction, the resamplers `balancer1` and `balancer2` are bypassed.

In general, there can be any number of balancers, and the balancers can be given arbitrary names, as in the sketch below.
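For example, a minimal sketch reusing the resamplers constructed above (the keyword names `my_oversampler` and `my_undersampler` are arbitrary illustrations, not required names):
```julia
# Any keyword argument other than `model` is treated as a balancer; the names are
# free to choose. The balancers are applied sequentially, as described above for
# `balancer1` and `balancer2`.
balanced_model2 = BalancedModel(
    model = logistic_model,
    my_oversampler = oversampler,
    my_undersampler = undersampler,
)
```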

#### At this point, they behave like one single model
You can fit, predict, cross-validate and fine-tune it like any other MLJ model. Here is an example of fine-tuning:
```julia
r1 = range(balanced_model, :(balancer1.k), lower=3, upper=10)
r2 = range(balanced_model, :(balancer2.min_ratios), lower=0.1, upper=0.9)

tuned_balanced_model = TunedModel(
    model=balanced_model,
    tuning=Grid(goal=4),
    resampling=CV(nfolds=4),
    range=[r1, r2],
    measure=cross_entropy
);

mach = machine(tuned_balanced_model, X, y);
fit!(mach, verbosity=0);
fitted_params(mach).best_model
```
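The other workflows mentioned above work the same way. A minimal sketch, assuming `X` and `y` are a feature table and a categorical target, as in any MLJ classification task:
```julia
mach = machine(balanced_model, X, y)
fit!(mach)          # resamples the training data, then trains the classifier
predict(mach, X)    # prediction bypasses the resamplers

# cross-validate the composite as a single model
evaluate!(mach, resampling=CV(nfolds=5), measure=cross_entropy)
```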

## 🚆🚆 Parallel Resampling with EasyEnsemble

Coming soon...