From e174072f7b68321434eb85f3fd942dabc33235fa Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Sun, 29 Sep 2024 21:04:13 +0000 Subject: [PATCH] build based on 19d4275 --- dev/.documenter-siteinfo.json | 2 +- dev/about/index.html | 2 +- .../architecture_search/README/index.html | 2 +- .../architecture_search/notebook/index.html | 4 +- .../comparison/README/index.html | 2 +- .../comparison/notebook/index.html | 6 +- .../composition/README/index.html | 2 +- .../composition/notebook/index.html | 2 +- .../early_stopping/README/index.html | 2 +- .../notebook/{33737844.svg => 5db71ff5.svg} | 68 ++++++------- .../early_stopping/notebook/index.html | 2 +- .../hyperparameter_tuning/README/index.html | 2 +- .../notebook/{9e8006c6.svg => f4785695.svg} | 52 +++++----- .../hyperparameter_tuning/notebook/index.html | 2 +- .../incremental_training/README/index.html | 2 +- .../incremental_training/notebook/index.html | 6 +- .../live_training/README/index.html | 2 +- .../live_training/notebook/index.html | 6 +- dev/contributing/index.html | 2 +- dev/extended_examples/Boston/index.html | 2 +- dev/extended_examples/MNIST/README/index.html | 2 +- .../notebook/{67c465ec.svg => ddd632fd.svg} | 72 +++++++------- .../notebook/{933452d9.svg => ec638c6b.svg} | 96 +++++++++---------- .../MNIST/notebook/index.html | 6 +- .../spam_detection/README/index.html | 2 +- .../spam_detection/notebook/index.html | 10 +- dev/index.html | 2 +- dev/interface/Builders/index.html | 4 +- dev/interface/Classification/index.html | 4 +- dev/interface/Custom Builders/index.html | 2 +- dev/interface/Image Classification/index.html | 2 +- .../Multitarget Regression/index.html | 2 +- dev/interface/Regression/index.html | 2 +- dev/interface/Summary/index.html | 2 +- 34 files changed, 189 insertions(+), 189 deletions(-) rename dev/common_workflows/early_stopping/notebook/{33737844.svg => 5db71ff5.svg} (86%) rename dev/common_workflows/hyperparameter_tuning/notebook/{9e8006c6.svg => f4785695.svg} (85%) rename dev/extended_examples/MNIST/notebook/{67c465ec.svg => ddd632fd.svg} (85%) rename dev/extended_examples/MNIST/notebook/{933452d9.svg => ec638c6b.svg} (85%) diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 43f17ad..509fcea 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-09T01:03:14","documenter_version":"1.7.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-29T21:04:09","documenter_version":"1.7.0"}} \ No newline at end of file diff --git a/dev/about/index.html b/dev/about/index.html index f6c2ac9..ec496db 100644 --- a/dev/about/index.html +++ b/dev/about/index.html @@ -1,2 +1,2 @@ -- · MLJFlux
+- · MLJFlux
diff --git a/dev/common_workflows/architecture_search/README/index.html b/dev/common_workflows/architecture_search/README/index.html index 30dfcc8..676ab0c 100644 --- a/dev/common_workflows/architecture_search/README/index.html +++ b/dev/common_workflows/architecture_search/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/architecture_search/notebook/index.html b/dev/common_workflows/architecture_search/notebook/index.html index b1cd588..8bc6ef4 100644 --- a/dev/common_workflows/architecture_search/notebook/index.html +++ b/dev/common_workflows/architecture_search/notebook/index.html @@ -86,7 +86,7 @@ fit!(mach, verbosity = 0); fitted_params(mach).best_model
NeuralNetworkClassifier(
   builder = MLP(
-        hidden = (41, 25, 33), 
+        hidden = (29, 57, 53), 
         σ = NNlib.relu), 
   finaliser = NNlib.softmax, 
   optimiser = Adam(0.01, (0.9, 0.999), 1.0e-8), 
@@ -103,4 +103,4 @@
     mlp = [x[:model].builder for x in history],
     measurement = [x[:measurement][1] for x in history],
 )
-first(sort!(history_df, [order(:measurement)]), 10)
10×2 DataFrame
 Row  mlp                             measurement
      MLP…                            Float64
   1  MLP(hidden = (41, 25, 33), …)   0.0796866
   2  MLP(hidden = (53, 25, 61), …)   0.0876209
   3  MLP(hidden = (25, 21, 53), …)   0.088555
   4  MLP(hidden = (25, 33, 45), …)   0.0887323
   5  MLP(hidden = (25, 45, 49), …)   0.0894714
   6  MLP(hidden = (25, 53, 17), …)   0.0928088
   7  MLP(hidden = (21, 61, 13), …)   0.0928165
   8  MLP(hidden = (45, 61, 41), …)   0.0969451
   9  MLP(hidden = (49, 49, 9), …)    0.0971594
  10  MLP(hidden = (33, 49, 17), …)   0.0981687

This page was generated using Literate.jl.

+first(sort!(history_df, [order(:measurement)]), 10)
10×2 DataFrame
 Row  mlp                             measurement
      MLP…                            Float64
   1  MLP(hidden = (29, 57, 53), …)   0.0817098
   2  MLP(hidden = (25, 13, 57), …)   0.0847994
   3  MLP(hidden = (57, 9, 29), …)    0.0886425
   4  MLP(hidden = (41, 53, 17), …)   0.0897345
   5  MLP(hidden = (49, 53, 17), …)   0.0913094
   6  MLP(hidden = (57, 13, 21), …)   0.0919441
   7  MLP(hidden = (29, 17, 41), …)   0.0923285
   8  MLP(hidden = (53, 41, 25), …)   0.0933308
   9  MLP(hidden = (57, 29, 41), …)   0.0935344
  10  MLP(hidden = (25, 25, 45), …)   0.0969887

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/comparison/README/index.html b/dev/common_workflows/comparison/README/index.html index 29998d8..fdaa37b 100644 --- a/dev/common_workflows/comparison/README/index.html +++ b/dev/common_workflows/comparison/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/comparison/notebook/index.html b/dev/common_workflows/comparison/notebook/index.html index 93e0cbd..edf34dc 100644 --- a/dev/common_workflows/comparison/notebook/index.html +++ b/dev/common_workflows/comparison/notebook/index.html @@ -52,10 +52,10 @@ fit!(mach, verbosity=0);
┌ Warning: Layer with Float32 parameters got Float64 input.
 │   The input will be converted, but any earlier layers may be very slow.
 │   layer = Dense(4 => 5, relu)  # 25 parameters
-│   summary(x) = "4×8 Matrix{Float64}"
-└ @ Flux ~/.julia/packages/Flux/HBF2N/src/layers/stateless.jl:60

Now let's see the history for more details on the performance for each of the models

history = report(mach).history
+│   summary(x) = "4×1 Matrix{Float64}"
+└ @ Flux ~/.julia/packages/Flux/MtsAN/src/layers/stateless.jl:60

Now let's see the history for more details on the performance for each of the models

history = report(mach).history
 history_df = DataFrame(
     mlp = [x[:model] for x in history],
     measurement = [x[:measurement][1] for x in history],
 )
-sort!(history_df, [order(:measurement)])
4×2 DataFrame
 Row  mlp                                                                  measurement
      Probabil…                                                            Float64
   1  BayesianLDA(method = gevd, …)                                        0.0610826
   2  NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …)        0.0857014
   3  RandomForestClassifier(max_depth = -1, …)                            0.104074
   4  ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …)   0.221056

This is Occam's razor in practice.


This page was generated using Literate.jl.

+sort!(history_df, [order(:measurement)])
4×2 DataFrame
 Row  mlp                                                                  measurement
      Probabil…                                                            Float64
   1  BayesianLDA(method = gevd, …)                                        0.0610826
   2  NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …)        0.0857014
   3  RandomForestClassifier(max_depth = -1, …)                            0.110949
   4  ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …)   0.221056

This is Occam's razor in practice.


This page was generated using Literate.jl.

diff --git a/dev/common_workflows/composition/README/index.html b/dev/common_workflows/composition/README/index.html index 59f4918..c9a8fe5 100644 --- a/dev/common_workflows/composition/README/index.html +++ b/dev/common_workflows/composition/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/composition/notebook/index.html b/dev/common_workflows/composition/notebook/index.html index c1c1101..6f4889d 100644 --- a/dev/common_workflows/composition/notebook/index.html +++ b/dev/common_workflows/composition/notebook/index.html @@ -66,4 +66,4 @@ ├────────────────────────────┼─────────┤ │ [1.0, 1.0, 0.95, 1.0, 1.0] │ 0.0219 │ └────────────────────────────┴─────────┘ -

This page was generated using Literate.jl.

+

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/early_stopping/README/index.html b/dev/common_workflows/early_stopping/README/index.html index 8462bcb..6784082 100644 --- a/dev/common_workflows/early_stopping/README/index.html +++ b/dev/common_workflows/early_stopping/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/early_stopping/notebook/33737844.svg b/dev/common_workflows/early_stopping/notebook/5db71ff5.svg similarity index 86% rename from dev/common_workflows/early_stopping/notebook/33737844.svg rename to dev/common_workflows/early_stopping/notebook/5db71ff5.svg index 29d348f..b305007 100644 --- a/dev/common_workflows/early_stopping/notebook/33737844.svg +++ b/dev/common_workflows/early_stopping/notebook/5db71ff5.svg @@ -1,48 +1,48 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/common_workflows/early_stopping/notebook/index.html b/dev/common_workflows/early_stopping/notebook/index.html index c6a7802..48abb87 100644 --- a/dev/common_workflows/early_stopping/notebook/index.html +++ b/dev/common_workflows/early_stopping/notebook/index.html @@ -57,4 +57,4 @@ [ Info: final training loss: 0.045833383 [ Info: Stop triggered by EarlyStopping.NumberLimit(100) stopping criterion. [ Info: Total of 100 iterations.

Results

We can see that the model converged after 100 iterations.

plot(training_losses, label="Training Loss", linewidth=2)
-plot!(validation_losses, label="Validation Loss", linewidth=2, size=(800,400))
Example block output

This page was generated using Literate.jl.

+plot!(validation_losses, label="Validation Loss", linewidth=2, size=(800,400))Example block output

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/hyperparameter_tuning/README/index.html b/dev/common_workflows/hyperparameter_tuning/README/index.html index d724a7b..9540ed0 100644 --- a/dev/common_workflows/hyperparameter_tuning/README/index.html +++ b/dev/common_workflows/hyperparameter_tuning/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/hyperparameter_tuning/notebook/9e8006c6.svg b/dev/common_workflows/hyperparameter_tuning/notebook/f4785695.svg similarity index 85% rename from dev/common_workflows/hyperparameter_tuning/notebook/9e8006c6.svg rename to dev/common_workflows/hyperparameter_tuning/notebook/f4785695.svg index e52c3d8..304f048 100644 --- a/dev/common_workflows/hyperparameter_tuning/notebook/9e8006c6.svg +++ b/dev/common_workflows/hyperparameter_tuning/notebook/f4785695.svg @@ -1,40 +1,40 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/common_workflows/hyperparameter_tuning/notebook/index.html b/dev/common_workflows/hyperparameter_tuning/notebook/index.html index 2a1aed6..24f8ef5 100644 --- a/dev/common_workflows/hyperparameter_tuning/notebook/index.html +++ b/dev/common_workflows/hyperparameter_tuning/notebook/index.html @@ -67,4 +67,4 @@ xlab=curve.parameter_name, xscale=curve.parameter_scale, ylab = "Cross Entropy", -)Example block output

This page was generated using Literate.jl.

+)Example block output

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/incremental_training/README/index.html b/dev/common_workflows/incremental_training/README/index.html index efc0d3d..856c449 100644 --- a/dev/common_workflows/incremental_training/README/index.html +++ b/dev/common_workflows/incremental_training/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/incremental_training/notebook/index.html b/dev/common_workflows/incremental_training/notebook/index.html index 94d7ea7..7ea6fb6 100644 --- a/dev/common_workflows/incremental_training/notebook/index.html +++ b/dev/common_workflows/incremental_training/notebook/index.html @@ -35,8 +35,8 @@ fit!(mach)
trained Machine; caches model-specific representations of data
   model: NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …)
   args: 
-    1:	Source @808 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
-    2:	Source @664 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
+    1:	Source @163 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
+    2:	Source @945 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
 

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
0.4392339631006042
val_acc = accuracy(predict_mode(mach, X_test), y_test)
0.9

Poor performance it seems.

Incremental Training

Now let's train it for another 30 epochs at half the original learning rate. All we need to do is change these hyperparameters and call fit! again. It won't reset the model parameters before training.

clf.optimiser = Optimisers.Adam(clf.optimiser.eta/2)
 clf.epochs = clf.epochs + 30
 fit!(mach, verbosity=2);
[ Info: Updating machine(NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …), …).
@@ -69,4 +69,4 @@
 [ Info: Loss is 0.1353
 [ Info: Loss is 0.1251
 [ Info: Loss is 0.1173
-[ Info: Loss is 0.1102

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
0.10519664737051289
training_acc = accuracy(predict_mode(mach, X_test), y_test)
0.9666666666666667

That's much better. If we would rather reset the model parameters before fitting, we can call fit!(mach, force=true).


This page was generated using Literate.jl.

+[ Info: Loss is 0.1102

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
0.10519664737051289
training_acc = accuracy(predict_mode(mach, X_test), y_test)
0.9666666666666667

That's much better. If we would rather reset the model parameters before fitting, we can call fit!(mach, force=true).
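A minimal sketch of that full retrain, reusing mach and the evaluation code from above:

fit!(mach, force=true)                        # discard the learned parameters and retrain from scratch
accuracy(predict_mode(mach, X_test), y_test)  # re-check the held-out accuracy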


This page was generated using Literate.jl.

diff --git a/dev/common_workflows/live_training/README/index.html b/dev/common_workflows/live_training/README/index.html index 17887c3..e87e140 100644 --- a/dev/common_workflows/live_training/README/index.html +++ b/dev/common_workflows/live_training/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/live_training/notebook/index.html b/dev/common_workflows/live_training/notebook/index.html index 5f1aa6d..87ba384 100644 --- a/dev/common_workflows/live_training/notebook/index.html +++ b/dev/common_workflows/live_training/notebook/index.html @@ -78,6 +78,6 @@ fit!(mach, force=true)
trained Machine; does not cache data
   model: ProbabilisticIteratedModel(model = NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …), …)
   args: 
-    1:	Source @660 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
-    2:	Source @637 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
-

This page was generated using Literate.jl.

+ 1: Source @490 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}} + 2: Source @135 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}} +

This page was generated using Literate.jl.

diff --git a/dev/contributing/index.html b/dev/contributing/index.html index 280caec..b7a7621 100644 --- a/dev/contributing/index.html +++ b/dev/contributing/index.html @@ -1,2 +1,2 @@ -Contributing · MLJFlux

Adding new models to MLJFlux

This section assumes familiarity with the MLJ model API

If one subtypes a new model type as either MLJFlux.MLJFluxProbabilistic or MLJFlux.MLJFluxDeterministic, then instead of defining new methods for MLJModelInterface.fit and MLJModelInterface.update one can make use of fallbacks by implementing the lower level methods shape, build, and fitresult. See the classifier source code for an example.

One still needs to implement a new predict method.

+Contributing · MLJFlux

Adding new models to MLJFlux

This section assumes familiarity with the MLJ model API

If one subtypes a new model type as either MLJFlux.MLJFluxProbabilistic or MLJFlux.MLJFluxDeterministic, then instead of defining new methods for MLJModelInterface.fit and MLJModelInterface.update one can make use of fallbacks by implementing the lower level methods shape, build, and fitresult. See the classifier source code for an example.

One still needs to implement a new predict method.
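For orientation, here is a rough, hypothetical sketch of what this looks like. The struct name and field list are invented, and the exact signatures of shape, build, and fitresult shown below are assumptions based on the description above; the classifier source code remains the authoritative reference.

import MLJFlux
import MLJModelInterface

# Hypothetical model type that reuses the MLJFlux fit/update fallbacks.
mutable struct MyClassifier <: MLJFlux.MLJFluxProbabilistic
    builder
    optimiser
    loss
    epochs::Int
    batch_size::Int
end

# Assumed signatures for the lower-level hooks (check the classifier source):
MLJFlux.shape(model::MyClassifier, X, y) =
    (size(MLJModelInterface.matrix(X), 2), length(MLJModelInterface.classes(y[1])))
MLJFlux.build(model::MyClassifier, rng, shape) =
    MLJFlux.build(model.builder, rng, shape...)
MLJFlux.fitresult(model::MyClassifier, chain, y) =
    (chain, MLJModelInterface.classes(y[1]))

A predict method for MyClassifier would still need to be written against the MLJ model API.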

diff --git a/dev/extended_examples/Boston/index.html b/dev/extended_examples/Boston/index.html index e759e5f..744e398 100644 --- a/dev/extended_examples/Boston/index.html +++ b/dev/extended_examples/Boston/index.html @@ -1,2 +1,2 @@ -- · MLJFlux
+- · MLJFlux
diff --git a/dev/extended_examples/MNIST/README/index.html b/dev/extended_examples/MNIST/README/index.html index fada510..ef6d183 100644 --- a/dev/extended_examples/MNIST/README/index.html +++ b/dev/extended_examples/MNIST/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/extended_examples/MNIST/notebook/67c465ec.svg b/dev/extended_examples/MNIST/notebook/ddd632fd.svg similarity index 85% rename from dev/extended_examples/MNIST/notebook/67c465ec.svg rename to dev/extended_examples/MNIST/notebook/ddd632fd.svg index 810ab59..3f10763 100644 --- a/dev/extended_examples/MNIST/notebook/67c465ec.svg +++ b/dev/extended_examples/MNIST/notebook/ddd632fd.svg @@ -1,50 +1,50 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/extended_examples/MNIST/notebook/933452d9.svg b/dev/extended_examples/MNIST/notebook/ec638c6b.svg similarity index 85% rename from dev/extended_examples/MNIST/notebook/933452d9.svg rename to dev/extended_examples/MNIST/notebook/ec638c6b.svg index 5a00390..6b663b0 100644 --- a/dev/extended_examples/MNIST/notebook/933452d9.svg +++ b/dev/extended_examples/MNIST/notebook/ec638c6b.svg @@ -1,62 +1,62 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/extended_examples/MNIST/notebook/index.html b/dev/extended_examples/MNIST/notebook/index.html index 027ed50..111180d 100644 --- a/dev/extended_examples/MNIST/notebook/index.html +++ b/dev/extended_examples/MNIST/notebook/index.html @@ -78,7 +78,7 @@ 0.055122323 0.057923194

Adding 20 more epochs:

clf.epochs = clf.epochs + 20
 fit!(mach, rows=train);
[ Info: Updating machine(ImageClassifier(builder = Main.MyConvBuilder(3, 16, 32, 32), …), …).
-
Optimising neural net:  10%[==>                      ]  ETA: 0:00:06
Optimising neural net:  14%[===>                     ]  ETA: 0:00:08
Optimising neural net:  19%[====>                    ]  ETA: 0:00:08
Optimising neural net:  24%[=====>                   ]  ETA: 0:00:08
Optimising neural net:  29%[=======>                 ]  ETA: 0:00:07
Optimising neural net:  33%[========>                ]  ETA: 0:00:07
Optimising neural net:  38%[=========>               ]  ETA: 0:00:06
Optimising neural net:  43%[==========>              ]  ETA: 0:00:06
Optimising neural net:  48%[===========>             ]  ETA: 0:00:06
Optimising neural net:  52%[=============>           ]  ETA: 0:00:05
Optimising neural net:  57%[==============>          ]  ETA: 0:00:05
Optimising neural net:  62%[===============>         ]  ETA: 0:00:04
Optimising neural net:  67%[================>        ]  ETA: 0:00:04
Optimising neural net:  71%[=================>       ]  ETA: 0:00:03
Optimising neural net:  76%[===================>     ]  ETA: 0:00:03
Optimising neural net:  81%[====================>    ]  ETA: 0:00:02
Optimising neural net:  86%[=====================>   ]  ETA: 0:00:02
Optimising neural net:  90%[======================>  ]  ETA: 0:00:01
Optimising neural net:  95%[=======================> ]  ETA: 0:00:01
Optimising neural net: 100%[=========================] Time: 0:00:11

Computing an out-of-sample estimate of the loss:

predicted_labels = predict(mach, rows=test);
+
Optimising neural net:  10%[==>                      ]  ETA: 0:00:06
Optimising neural net:  14%[===>                     ]  ETA: 0:00:09
Optimising neural net:  19%[====>                    ]  ETA: 0:00:09
Optimising neural net:  24%[=====>                   ]  ETA: 0:00:08
Optimising neural net:  29%[=======>                 ]  ETA: 0:00:08
Optimising neural net:  33%[========>                ]  ETA: 0:00:07
Optimising neural net:  38%[=========>               ]  ETA: 0:00:07
Optimising neural net:  43%[==========>              ]  ETA: 0:00:06
Optimising neural net:  48%[===========>             ]  ETA: 0:00:06
Optimising neural net:  52%[=============>           ]  ETA: 0:00:05
Optimising neural net:  57%[==============>          ]  ETA: 0:00:05
Optimising neural net:  62%[===============>         ]  ETA: 0:00:04
Optimising neural net:  67%[================>        ]  ETA: 0:00:04
Optimising neural net:  71%[=================>       ]  ETA: 0:00:03
Optimising neural net:  76%[===================>     ]  ETA: 0:00:03
Optimising neural net:  81%[====================>    ]  ETA: 0:00:02
Optimising neural net:  86%[=====================>   ]  ETA: 0:00:02
Optimising neural net:  90%[======================>  ]  ETA: 0:00:01
Optimising neural net:  95%[=======================> ]  ETA: 0:00:01
Optimising neural net: 100%[=========================] Time: 0:00:10

Computing an out-of-sample estimate of the loss:

predicted_labels = predict(mach, rows=test);
 cross_entropy(predicted_labels, labels[test])
0.4883231265583621

Or to fit and predict, in one line:

evaluate!(mach,
           resampling=Holdout(fraction_train=0.5),
           measure=cross_entropy,
@@ -183,7 +183,7 @@
     parameter_means2,
     title="Flux parameter mean weights",
     xlab = "epoch",
-)
Example block output

Note. The higher the number in the plot legend, the deeper the layer we are weight-averaging.

savefig(joinpath(tempdir(), "weights.png"))
"/tmp/weights.png"

Retrieving a snapshot for a prediction:

mach2 = machine(joinpath(tempdir(), "mnist3.jls"))
+)
Example block output

Note. The higher the number in the plot legend, the deeper the layer we are weight-averaging.

savefig(joinpath(tempdir(), "weights.png"))
"/tmp/weights.png"

Retrieving a snapshot for a prediction:

mach2 = machine(joinpath(tempdir(), "mnist3.jls"))
 predict_mode(mach2, images[501:503])
3-element CategoricalArrays.CategoricalArray{Int64,1,UInt32}:
  7
  9
@@ -197,4 +197,4 @@
     ylab = "cross entropy",
     label="out-of-sample",
 )
-plot!(epochs, training_losses, label="training")
Example block output

This page was generated using Literate.jl.

+plot!(epochs, training_losses, label="training")Example block output

This page was generated using Literate.jl.

diff --git a/dev/extended_examples/spam_detection/README/index.html b/dev/extended_examples/spam_detection/README/index.html index 0fa619e..c2f1175 100644 --- a/dev/extended_examples/spam_detection/README/index.html +++ b/dev/extended_examples/spam_detection/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file                       description
notebook.ipynb             Jupyter notebook (executed)
notebook.unexecuted.ipynb  Jupyter notebook (unexecuted)
notebook.md                static markdown (included in MLJFlux.jl docs)
notebook.jl                executable Julia script annotated with comments
generate.jl                maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/extended_examples/spam_detection/notebook/index.html b/dev/extended_examples/spam_detection/notebook/index.html index 7a0c584..a9465ba 100644 --- a/dev/extended_examples/spam_detection/notebook/index.html +++ b/dev/extended_examples/spam_detection/notebook/index.html @@ -93,13 +93,13 @@ mach = machine(clf, x_train_processed_equalized_fixed, y_train)
untrained Machine; caches model-specific representations of data
   model: NeuralNetworkClassifier(builder = GenericBuilder(apply = #15), …)
   args: 
-    1:	Source @219 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
-    2:	Source @311 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
+    1:	Source @771 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
+    2:	Source @286 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
 

Train the Model

fit!(mach)
trained Machine; caches model-specific representations of data
   model: NeuralNetworkClassifier(builder = GenericBuilder(apply = #15), …)
   args: 
-    1:	Source @219 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
-    2:	Source @311 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
+    1:	Source @771 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
+    2:	Source @286 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
 

Evaluate the Model

ŷ = predict_mode(mach, x_val_processed_equalized_fixed)
 balanced_accuracy(ŷ, y_val)
0.8840999384477648

Acceptable performance. Let's see some live examples:

using Random: Random;
 Random.seed!(99);
@@ -112,4 +112,4 @@
 z_encoded_equalized_fixed = coerce(z_encoded_equalized_fixed, Continuous)
 z_pred = predict_mode(mach, z_encoded_equalized_fixed)
 
-print("SMS: `$(z)` and the prediction is `$(z_pred)`")
SMS: `Hi elaine, is today's meeting confirmed?` and the prediction is `CategoricalArrays.CategoricalValue{InlineStrings.String7, UInt32}[InlineStrings.String7("ham")]`

This page was generated using Literate.jl.

+print("SMS: `$(z)` and the prediction is `$(z_pred)`")
SMS: `Hi elaine, is today's meeting confirmed?` and the prediction is `CategoricalArrays.CategoricalValue{InlineStrings.String7, UInt32}[InlineStrings.String7("ham")]`

This page was generated using Literate.jl.

diff --git a/dev/index.html b/dev/index.html index 0eaa1e8..5338bd6 100644 --- a/dev/index.html +++ b/dev/index.html @@ -40,4 +40,4 @@ ├─────────────────────────────┼─────────┤ │ [1.0, 1.0, 0.967, 0.9, 1.0] │ 0.0426 │ └─────────────────────────────┴─────────┘ -

As you can see we are able to use MLJ meta-functionality (i.e., cross validation) with a Flux deep learning model. All arguments provided have defaults.

Notice that we are also able to define the neural network in a high-level fashion by only specifying the number of neurons in each hidden layer and the activation function. Meanwhile, MLJFlux is able to infer the input and output layer as well as use a suitable default for the loss function and output activation given the classification task. Notice as well that we did not need to manually implement a training or prediction loop.

Basic idea: "builders" for data-dependent architecture

As in the example above, any MLJFlux model has a builder hyperparameter, an object encoding instructions for creating a neural network given the data that the model eventually sees (e.g., the number of classes in a classification problem). While each MLJFlux model has a simple default builder, users may need to define custom builders to get optimal results (see Defining Custom Builders); this will require familiarity with the Flux API for defining a neural network chain.

Flux or MLJFlux?

Flux is a deep learning framework in Julia that comes with everything you need to build deep learning models (i.e., GPU support, automatic differentiation, layers, activations, losses, optimizers, etc.). MLJFlux wraps models built with Flux, providing a higher-level interface for building and training them. More importantly, it empowers Flux models by extending their support to many common machine learning workflows that are possible via MLJ, such as:

  • Estimating performance of your model using a holdout set or other resampling strategy (e.g., cross-validation) as measured by one or more metrics (e.g., loss functions) that may not have been used in training

  • Optimizing hyper-parameters such as a regularization parameter (e.g., dropout) or the width/height/number of channels of a convolution layer

  • Composing with other models, such as introducing data pre-processing steps (e.g., missing data imputation) into a pipeline. It might make sense to include non-deep-learning models in this pipeline. Other kinds of model composition include blending the predictions of a deep learner with some other kind of model (as in “model stacking”). Models composed with MLJ can also be tuned as a single unit.

  • Controlling iteration by adding an early stopping criterion based on an out-of-sample estimate of the loss, dynamically changing the learning rate (e.g., cyclic learning rates), periodically saving snapshots of the model, or generating live plots of sample weights to judge training progress (as in TensorBoard)

  • Comparing your model with non-deep-learning models

A comparable project, FastAI/FluxTraining, also provides a high-level interface for interacting with Flux models and supports a set of features that may overlap with (but not include all of) those supported by MLJFlux.

Many of the features mentioned above are showcased in the workflow examples that you can access from the sidebar.

+

As you can see we are able to use MLJ meta-functionality (i.e., cross validation) with a Flux deep learning model. All arguments provided have defaults.

Notice that we are also able to define the neural network in a high-level fashion by only specifying the number of neurons in each hidden layer and the activation function. Meanwhile, MLJFlux is able to infer the input and output layer as well as use a suitable default for the loss function and output activation given the classification task. Notice as well that we did not need to manually implement a training or prediction loop.

Basic idea: "builders" for data-dependent architecture

As in the example above, any MLJFlux model has a builder hyperparameter, an object encoding instructions for creating a neural network given the data that the model eventually sees (e.g., the number of classes in a classification problem). While each MLJFlux model has a simple default builder, users may need to define custom builders to get optimal results (see Defining Custom Builders); this will require familiarity with the Flux API for defining a neural network chain.
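For instance, here is a minimal sketch using the built-in MLP builder documented on the Builders page; the hidden-layer sizes and epoch count are arbitrary illustrative choices:

using MLJ, Flux
import MLJFlux

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux

# Two hidden layers with 32 and 16 neurons; the input and output layers are
# inferred from the data when a machine is fit.
clf = NeuralNetworkClassifier(
    builder = MLJFlux.MLP(hidden = (32, 16), σ = Flux.relu),
    epochs = 20,
)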

Flux or MLJFlux?

Flux is a deep learning framework in Julia that comes with everything you need to build deep learning models (i.e., GPU support, automatic differentiation, layers, activations, losses, optimizers, etc.). MLJFlux wraps models built with Flux, providing a higher-level interface for building and training them. More importantly, it empowers Flux models by extending their support to many common machine learning workflows that are possible via MLJ, such as:

  • Estimating performance of your model using a holdout set or other resampling strategy (e.g., cross-validation) as measured by one or more metrics (e.g., loss functions) that may not have been used in training

  • Optimizing hyper-parameters such as a regularization parameter (e.g., dropout) or the width/height/number of channels of a convolution layer

  • Composing with other models, such as introducing data pre-processing steps (e.g., missing data imputation) into a pipeline. It might make sense to include non-deep-learning models in this pipeline. Other kinds of model composition include blending the predictions of a deep learner with some other kind of model (as in “model stacking”). Models composed with MLJ can also be tuned as a single unit.

  • Controlling iteration by adding an early stopping criterion based on an out-of-sample estimate of the loss, dynamically changing the learning rate (e.g., cyclic learning rates), periodically saving snapshots of the model, or generating live plots of sample weights to judge training progress (as in TensorBoard)

  • Comparing your model with non-deep-learning models

A comparable project, FastAI/FluxTraining, also provides a high-level interface for interacting with Flux models and supports a set of features that may overlap with (but not include all of) those supported by MLJFlux.

Many of the features mentioned above are showcased in the workflow examples that you can access from the sidebar.

diff --git a/dev/interface/Builders/index.html b/dev/interface/Builders/index.html index 60bc401..2c5e973 100644 --- a/dev/interface/Builders/index.html +++ b/dev/interface/Builders/index.html @@ -1,5 +1,5 @@ -Builders · MLJFlux
MLJFlux.LinearType
Linear(; σ=Flux.relu)

MLJFlux builder that constructs a fully connected two layer network with activation function σ. The number of input and output nodes is determined from the data. Weights are initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.ShortType
Short(; n_hidden=0, dropout=0.5, σ=Flux.sigmoid)

MLJFlux builder that constructs a fully connected three-layer network using n_hidden nodes in the hidden layer and the specified dropout (defaulting to 0.5). An activation function σ is applied between the hidden and final layers. If n_hidden=0 (the default) then n_hidden is the geometric mean of the number of input and output nodes. The number of input and output nodes is determined from the data.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.MLPType
MLP(; hidden=(100,), σ=Flux.relu)

MLJFlux builder that constructs a Multi-layer perceptron network. The ith element of hidden represents the number of neurons in the ith hidden layer. An activation function σ is applied between each layer.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.@builderMacro
@builder neural_net

Creates a builder for neural_net. The variables rng, n_in, n_out and n_channels can be used to create builders for any random number generator rng, input and output sizes n_in and n_out and number of input channels n_channels.

Examples

julia> import MLJFlux: @builder;
+Builders · MLJFlux
MLJFlux.LinearType
Linear(; σ=Flux.relu)

MLJFlux builder that constructs a fully connected two layer network with activation function σ. The number of input and output nodes is determined from the data. Weights are initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.ShortType
Short(; n_hidden=0, dropout=0.5, σ=Flux.sigmoid)

MLJFlux builder that constructs a fully connected three-layer network using n_hidden nodes in the hidden layer and the specified dropout (defaulting to 0.5). An activation function σ is applied between the hidden and final layers. If n_hidden=0 (the default) then n_hidden is the geometric mean of the number of input and output nodes. The number of input and output nodes is determined from the data.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.MLPType
MLP(; hidden=(100,), σ=Flux.relu)

MLJFlux builder that constructs a Multi-layer perceptron network. The ith element of hidden represents the number of neurons in the ith hidden layer. An activation function σ is applied between each layer.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
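As a quick, illustrative sketch, the chain an MLP builder produces can be materialised directly using the build signature documented on the Custom Builders page; the input/output sizes and seed below are arbitrary example values:

import MLJFlux
using Random: Xoshiro

# 4 input features, 3 output nodes, two hidden layers of 16 and 8 neurons.
chain = MLJFlux.build(MLJFlux.MLP(hidden = (16, 8)), Xoshiro(123), 4, 3)
chain(rand(Float32, 4, 5))   # a 3×5 matrix: one column of outputs per observation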
MLJFlux.@builderMacro
@builder neural_net

Creates a builder for neural_net. The variables rng, n_in, n_out and n_channels can be used to create builders for any random number generator rng, input and output sizes n_in and n_out and number of input channels n_channels.

Examples

julia> import MLJFlux: @builder;
 
 julia> nn = NeuralNetworkRegressor(builder = @builder(Chain(Dense(n_in, 64, relu),
                                                             Dense(64, 32, relu),
@@ -11,4 +11,4 @@
            Chain(front, Dense(d, n_out));
        end
 
-julia> conv_nn = NeuralNetworkRegressor(builder = conv_builder);
source
+julia> conv_nn = NeuralNetworkRegressor(builder = conv_builder);
source
diff --git a/dev/interface/Classification/index.html b/dev/interface/Classification/index.html index 05fd11f..61bc631 100644 --- a/dev/interface/Classification/index.html +++ b/dev/interface/Classification/index.html @@ -20,7 +20,7 @@ xlab=curve.parameter_name, xscale=curve.parameter_scale, ylab = "Cross Entropy") -

See also ImageClassifier, NeuralNetworkBinaryClassifier.

source
MLJFlux.NeuralNetworkBinaryClassifierType
NeuralNetworkBinaryClassifier

A model type for constructing a neural network binary classifier, based on MLJFlux.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux

Do model = NeuralNetworkBinaryClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkBinaryClassifier(builder=...).

NeuralNetworkBinaryClassifier is for training a data-dependent Flux.jl neural network for making probabilistic predictions of a binary (Multiclass{2} or OrderedFactor{2}) target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.

In addition to features with Continuous scientific element type, this model supports categorical features in the input table. If present, such features are embedded into dense vectors by the use of an additional EntityEmbedder layer after the input, as described in Entity Embeddings of Categorical Variables by Cheng Guo, Felix Berkhahn arXiv, 2016.

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

Here:

  • X provides input features and is either: (i) a Matrix with Continuous element scitype (typically Float32); or (ii) a table of input features (eg, a DataFrame) whose columns have Continuous, Multiclass or OrderedFactor element scitype; check column scitypes with schema(X). If any Multiclass or OrderedFactor features appear, the constructed network will use an EntityEmbedder layer to transform them into dense vectors. If X is a Matrix, it is assumed that columns correspond to features and rows corresponding to observations.
  • y is the target, which can be any AbstractVector whose element scitype is Multiclass{2} or OrderedFactor{2}; check the scitype with scitype(y)

Train the machine with fit!(mach, rows=...).

Hyper-parameters

  • builder=MLJFlux.Short(): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux.jl documentation for examples of user-defined builders. See also finaliser below.

  • optimiser::Flux.Adam(): A Flux.Optimise optimiser. The optimiser performs the updating of the weights of the network. For further reference, see the Flux optimiser documentation. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.

  • loss=Flux.binarycrossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:

    • Flux.binarycrossentropy: Standard binary classification loss, also known as the log loss.

    • Flux.logitbinarycrossentropy: Mathematically equal to crossentropy, but numerically more stable than finalising the outputs with σ and then calculating crossentropy. You will need to specify finaliser=identity to remove MLJFlux's default sigmoid finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).

    • Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.

    • Flux.binary_focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.

    Currently MLJ measures are not supported values of loss.

  • epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.

  • batch_size::int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.

  • lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞).

  • alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.

  • rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training.

  • optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.

  • acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For Training on GPU, use CUDALibs().

  • finaliser=Flux.σ: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.σ.

  • embedding_dims: a Dict whose keys are names of categorical features, given as symbols, and whose values are numbers representing the desired dimensionality of the entity embeddings of such features: an integer value of 7, say, sets the embedding dimensionality to 7; a float value of 0.5, say, sets the embedding dimensionality to ceil(0.5 * c), where c is the number of feature levels. Unspecified feature dimensionality defaults to min(c - 1, 10).

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.

  • predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.

  • transform(mach, Xnew): Assuming Xnew has the same schema as X, transform the categorical features of Xnew into dense Continuous vectors using the MLJFlux.EntityEmbedder layer present in the network. Does nothing in case the model was trained on an input X that lacks categorical features.

Fitted parameters

The fields of fitted_params(mach) are:

  • chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).

Report

The fields of report(mach) are:

  • training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.

Examples

In this example we build a binary classification model using the mtcars dataset. This is a very basic example, using a default builder and no standardization. For a more advanced illustration, see NeuralNetworkRegressor or ImageClassifier, and examples in the MLJFlux.jl documentation.

using MLJ, Flux
+

See also ImageClassifier, NeuralNetworkBinaryClassifier.

source
MLJFlux.NeuralNetworkBinaryClassifierType
NeuralNetworkBinaryClassifier

A model type for constructing a neural network binary classifier, based on MLJFlux.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux

Do model = NeuralNetworkBinaryClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkBinaryClassifier(builder=...).

NeuralNetworkBinaryClassifier is for training a data-dependent Flux.jl neural network for making probabilistic predictions of a binary (Multiclass{2} or OrderedFactor{2}) target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.

In addition to features with Continuous scientific element type, this model supports categorical features in the input table. If present, such features are embedded into dense vectors by the use of an additional EntityEmbedder layer after the input, as described in Entity Embeddings of Categorical Variables by Cheng Guo, Felix Berkhahn arXiv, 2016.

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

Here:

  • X provides input features and is either: (i) a Matrix with Continuous element scitype (typically Float32); or (ii) a table of input features (eg, a DataFrame) whose columns have Continuous, Multiclass or OrderedFactor element scitype; check column scitypes with schema(X). If any Multiclass or OrderedFactor features appear, the constructed network will use an EntityEmbedder layer to transform them into dense vectors. If X is a Matrix, it is assumed that columns correspond to features and rows corresponding to observations.
  • y is the target, which can be any AbstractVector whose element scitype is Multiclass{2} or OrderedFactor{2}; check the scitype with scitype(y)

Train the machine with fit!(mach, rows=...).

Hyper-parameters

  • builder=MLJFlux.Short(): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux.jl documentation for examples of user-defined builders. See also finaliser below.

  • optimiser::Flux.Adam(): A Flux.Optimise optimiser. The optimiser performs the updating of the weights of the network. For further reference, see the Flux optimiser documentation. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.

  • loss=Flux.binarycrossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:

    • Flux.binarycrossentropy: Standard binary classification loss, also known as the log loss.

    • Flux.logitbinarycrossentropy: Mathematically equal to crossentropy, but numerically more stable than finalising the outputs with σ and then calculating crossentropy. You will need to specify finaliser=identity to remove MLJFlux's default sigmoid finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).

    • Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.

    • Flux.binary_focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.

    Currently MLJ measures are not supported values of loss.

  • epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.

  • batch_size::int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.

  • lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞).

  • alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.

  • rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training.

  • optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.

  • acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For Training on GPU, use CUDALibs().

  • finaliser=Flux.σ: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.σ.

  • embedding_dims: a Dict whose keys are names of categorical features, given as symbols, and whose values are numbers representing the desired dimensionality of the entity embeddings of such features: an integer value of 7, say, sets the embedding dimensionality to 7; a float value of 0.5, say, sets the embedding dimensionality to ceil(0.5 * c), where c is the number of feature levels. Unspecified feature dimensionality defaults to min(c - 1, 10).

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.

  • predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.

  • transform(mach, Xnew): Assuming Xnew has the same schema as X, transform the categorical features of Xnew into dense Continuous vectors using the MLJFlux.EntityEmbedder layer present in the network. Does nothing in case the model was trained on an input X that lacks categorical features.

Fitted parameters

The fields of fitted_params(mach) are:

  • chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).

Report

The fields of report(mach) are:

  • training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.
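
For example, after training one might inspect these quantities as in the following sketch:

chain  = fitted_params(mach).chain      # the trained Flux.Chain, finaliser included
losses = report(mach).training_losses   # length epochs + 1; first entry is the pre-training loss
last(losses)                            # final (penalised) training loss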

Examples

In this example we build a binary classification model using the mtcars dataset, predicting the binary variable VS. This is a very basic example, using a default builder and no standardization. For a more advanced illustration, see NeuralNetworkRegressor or ImageClassifier, and examples in the MLJFlux.jl documentation.

using MLJ, Flux
import Optimisers
import RDatasets

First, we can load the data:

mtcars = RDatasets.dataset("datasets", "mtcars");
y, X = unpack(mtcars, ==(:VS), in([:MPG, :Cyl, :Disp, :HP, :WT, :QSec]));

Note that y is a vector and X a table.

y = categorical(y) # classifier takes categorical input
@@ -48,4 +48,4 @@
    xscale=curve.parameter_scale,
    ylab = "Cross Entropy",
 )

See also ImageClassifier.

source

diff --git a/dev/interface/Custom Builders/index.html b/dev/interface/Custom Builders/index.html
@@ -12,4 +12,4 @@
        Dense(nn.n2, n_out, init=init),
    )
end

Note here that n_in and n_out depend on the size of the data (see Table 1).

For a concrete image classification example, see Using MLJ to classify the MNIST image dataset.

More generally, defining a new builder means defining a new struct sub-typing MLJFlux.Builder and defining a new MLJFlux.build method with one of these signatures:

MLJFlux.build(builder::MyBuilder, rng, n_in, n_out)
MLJFlux.build(builder::MyBuilder, rng, n_in, n_out, n_channels) # for use with `ImageClassifier`

This method must return a Flux.Chain instance, chain, subject to the following conditions:

  • chain(x) must make sense:

    • for any x <: Array{<:AbstractFloat, 2} of size (n_in, batch_size) where batch_size is any integer (for all models except ImageClassifier); or
    • for any x <: Array{<:Float32, 4} of size (W, H, n_channels, batch_size), where (W, H) = n_in, n_channels is 1 or 3, and batch_size is any integer (for use with ImageClassifier)
  • The object returned by chain(x) must be an AbstractFloat vector of length n_out.
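
As a minimal sketch satisfying these conditions (the struct name and field are hypothetical, and the extra image-classifier signature is omitted):

import MLJFlux
import Flux

mutable struct MyDenseBuilder <: MLJFlux.Builder
    n_hidden::Int                       # width of the single hidden layer
end

function MLJFlux.build(builder::MyDenseBuilder, rng, n_in, n_out)
    init = Flux.glorot_uniform(rng)     # reproducible weight initialization
    return Flux.Chain(
        Flux.Dense(n_in, builder.n_hidden, Flux.relu, init=init),
        Flux.Dense(builder.n_hidden, n_out, init=init),
    )
end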

Alternatively, use MLJFlux.@builder(neural_net) to automatically create a builder for any valid Flux chain expression neural_net, where the symbols n_in, n_out, n_channels and rng can appear literally, with the interpretations explained above. For example,

builder = MLJFlux.@builder Chain(Dense(n_in, 128), Dense(128, n_out, tanh))

diff --git a/dev/interface/Image Classification/index.html b/dev/interface/Image Classification/index.html
@@ -46,4 +46,4 @@
    resampling=Holdout(fraction_train=0.5),
    measure=cross_entropy,
    rows=1:1000,
    verbosity=0)

See also NeuralNetworkClassifier.

source

diff --git a/dev/interface/Multitarget Regression/index.html b/dev/interface/Multitarget Regression/index.html
@@ -23,4 +23,4 @@
# loss for `(Xtest, ytest)`:
fit!(mach) # trains on all data `(X, y)`
yhat = predict(mach, Xtest)
multitarget_l2(yhat, ytest)

See also NeuralNetworkRegressor

source

diff --git a/dev/interface/Regression/index.html b/dev/interface/Regression/index.html
@@ -43,4 +43,4 @@
# loss for `(Xtest, ytest)`:
fit!(mach) # train on `(X, y)`
yhat = predict(mach, Xtest)
l2(yhat, ytest)

These losses, for the pipeline model, refer to the target on the original, unstandardized, scale.

For implementing stopping criteria and other iteration controls, refer to examples linked from the MLJFlux documentation.

See also MultitargetNeuralNetworkRegressor

source

diff --git a/dev/interface/Summary/index.html b/dev/interface/Summary/index.html
@@ -2,4 +2,4 @@
Summary · MLJFlux

Models

MLJFlux provides the model types below, for use with input features X and targets y of the scientific type indicated in the table below. The parameters n_in, n_out and n_channels refer to information passed to the builder, as described under Defining Custom Builders.

Model Type | Prediction type | scitype(X) <: _ | scitype(y) <: _
NeuralNetworkRegressor | Deterministic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | AbstractVector{<:Continuous} (n_out = 1)
MultitargetNeuralNetworkRegressor | Deterministic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | <: Table(Continuous) with n_out columns
NeuralNetworkClassifier | Probabilistic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | AbstractVector{<:Finite} with n_out classes
NeuralNetworkBinaryClassifier | Probabilistic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | AbstractVector{<:Finite{2}} (but n_out = 1)
ImageClassifier | Probabilistic | AbstractVector{<:Image{W,H}} with n_in = (W, H) | AbstractVector{<:Finite} with n_out classes
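
For instance, a model type from the table can be loaded and instantiated as in this sketch (the pkg keyword disambiguates, since other packages also register a NeuralNetworkClassifier):

using MLJ
NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux
clf = NeuralNetworkClassifier(epochs=20)
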
What exactly is a "model"?

In MLJ a model is a mutable struct storing hyper-parameters for some learning algorithm indicated by the model name, and that's all. In particular, an MLJ model does not store learned parameters.

Difference in Definition

In Flux the term "model" has another meaning. However, as all Flux "models" used in MLJFlux are Flux.Chain objects, we call them chains, and restrict use of "model" to models in the MLJ sense.

Are observations rows or columns?

In MLJ the convention for two-dimensional data (tables and matrices) is rows=observations. For matrices, Flux has the opposite convention. If your data is a matrix whose columns index the observations, the optimal solution is to present the adjoint or transpose of that matrix to MLJFlux models. Otherwise, you can use the matrix as is, or permute its dimensions once with permutedims and then, again, present the adjoint or transpose, which is optimal for MLJFlux training.
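
A small sketch of the first case, with hypothetical data in the Flux (columns-as-observations) convention, and an MLJFlux model model and target y assumed already defined:

Xflux = rand(Float32, 5, 100)    # 5 features × 100 observations (Flux convention)
Xmlj  = Xflux'                   # adjoint: 100 × 5, rows now index observations (MLJ convention)
mach  = machine(model, Xmlj, y)  # accepted by any MLJFlux model taking matrix input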

Instructions for coercing common image formats into some AbstractVector{<:Image} are here.

Fitting and warm restarts

MLJ machines cache state enabling the "warm restart" of model training, as demonstrated in the incremental training example. In the case of MLJFlux models, fit!(mach) will use a warm restart if:

  • only model.epochs has changed since the last call; or

  • only model.epochs or model.optimiser have changed since the last call and model.optimiser_changes_trigger_retraining == false (the default) (the "state" part of the optimiser is ignored in this comparison). This allows one to dynamically modify learning rates, for example.

Here model=mach.model is the associated MLJ model.
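
For example, the following sketch adds further epochs without retraining from scratch (model, X and y assumed already defined):

mach = machine(model, X, y)
fit!(mach)            # cold start: trains for model.epochs epochs
model.epochs += 10    # only `epochs` has changed ...
fit!(mach)            # ... so this is a warm restart: 10 additional epochs only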

The warm restart feature makes it possible to externally control iteration. See, for example, Early Stopping with MLJFlux and Using MLJ to classify the MNIST image dataset.

Model Hyperparameters

All models share the following hyper-parameters. See individual model docstrings for a full list.

Hyper-parameter | Description | Default
builder | Default builder for models. | MLJFlux.Linear(σ=Flux.relu) (regressors) or MLJFlux.Short(n_hidden=0, dropout=0.5, σ=Flux.σ) (classifiers)
optimiser | The optimiser to use for training. | Optimisers.Adam()
loss | The loss function used for training. | Flux.mse (regressors) and Flux.crossentropy (classifiers)
epochs | Number of epochs to train for. | 10
batch_size | The batch size for the data. | 1
lambda | The regularization strength. Range = [0, ∞). | 0
alpha | The L2/L1 mix of regularization. Range = [0, 1]. | 0
rng | The random number generator (RNG) passed to builders, for weight initialization, for example. Can be any AbstractRNG or the seed (an integer) for a Xoshiro that is reset on every cold restart of model (machine) training. | GLOBAL_RNG
acceleration | Use CUDALibs() for training on a GPU; default is CPU1(). | CPU1()
optimiser_changes_trigger_retraining | True if fitting an associated machine should trigger retraining from scratch whenever the optimiser changes. | false

The classifiers have an additional hyperparameter finaliser (default is Flux.softmax, or Flux.σ in the binary case) which is the operation applied to the unnormalized output of the final layer to obtain probabilities (outputs summing to one). It should return a vector of the same length as its input.
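
As a hedged sketch, pairing the logit loss with an identity finaliser, so that the network emits raw scores and predict is no longer probabilistic (NeuralNetworkClassifier assumed already loaded, as in the sketch above):

import Flux
clf = NeuralNetworkClassifier(
    loss = Flux.logitcrossentropy,
    finaliser = identity,   # skip the softmax; predictions are unnormalised scores
)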

Loss Functions

Currently, the loss function specified by loss=... is applied internally by Flux and needs to conform to the Flux API. You cannot, for example, supply one of MLJ's probabilistic loss functions, such as MLJ.cross_entropy, to one of the classifier constructors.

That said, you can still use MLJ loss functions and metrics, but only in evaluation meta-algorithms such as cross-validation; there they work even when the underlying model comes from MLJFlux.
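
For example, an MLJ measure can be used in an evaluation call, as in this sketch (mach wrapping any MLJFlux classifier):

evaluate!(
    mach,
    resampling = CV(nfolds=3),
    measure = cross_entropy,   # an MLJ measure: fine here, but not allowed as `loss`
)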

More on accelerated training with GPUs

As in the table, when instantiating a model for training on a GPU, specify acceleration=CUDALibs(), as in

using MLJ
ImageClassifier = @load ImageClassifier
model = ImageClassifier(epochs=10, acceleration=CUDALibs())
mach = machine(model, X, y) |> fit!

In this example, the data X, y is copied onto the GPU under the hood on the call to fit! and cached for use in any warm restart (see above). The Flux chain used in training is always copied back to the CPU at the conclusion of fit!, and made available as fitted_params(mach).

Builders

Builder | Description
MLJFlux.MLP(hidden=(10,)) | General multi-layer perceptron
MLJFlux.Short(n_hidden=0, dropout=0.5, σ=sigmoid) | Fully connected network with one hidden layer and dropout
MLJFlux.Linear(σ=relu) | Vanilla linear network with no hidden layers and activation function σ
MLJFlux.@builder | Macro for customized builders
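
For instance, a sketch wiring one of these builders into a model (NeuralNetworkClassifier assumed already loaded, as above):

import MLJFlux
builder = MLJFlux.MLP(hidden=(64, 32))                       # two hidden layers, widths 64 and 32
model = NeuralNetworkClassifier(builder=builder, epochs=20)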