From c61e4bf4abfdf8bf0f7eed5af870da77ea31c041 Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Sun, 16 Jun 2024 20:04:45 +0000 Subject: [PATCH] build based on 07e01b7 --- dev/.documenter-siteinfo.json | 2 +- dev/about/index.html | 2 +- .../architecture_search/README/index.html | 2 +- .../architecture_search/notebook/index.html | 4 +- .../comparison/README/index.html | 2 +- .../comparison/notebook/index.html | 2 +- .../composition/README/index.html | 2 +- .../composition/notebook/index.html | 2 +- .../early_stopping/README/index.html | 2 +- .../notebook/{afe15004.svg => 322fe077.svg} | 68 ++++++------- .../early_stopping/notebook/index.html | 2 +- .../hyperparameter_tuning/README/index.html | 2 +- .../notebook/{24fdd955.svg => 8e367541.svg} | 52 +++++----- .../hyperparameter_tuning/notebook/index.html | 2 +- .../incremental_training/README/index.html | 2 +- .../incremental_training/notebook/index.html | 6 +- .../live_training/README/index.html | 2 +- .../live_training/notebook/index.html | 6 +- dev/contributing/index.html | 2 +- dev/extended_examples/Boston/index.html | 2 +- dev/extended_examples/MNIST/README/index.html | 2 +- .../notebook/{f64fa35a.svg => becce672.svg} | 72 +++++++------- .../notebook/{8c12c35a.svg => f281e18f.svg} | 96 +++++++++---------- .../MNIST/notebook/index.html | 6 +- .../spam_detection/README/index.html | 2 +- .../spam_detection/notebook/index.html | 10 +- dev/index.html | 2 +- dev/interface/Builders/index.html | 4 +- dev/interface/Classification/index.html | 4 +- dev/interface/Custom Builders/index.html | 2 +- dev/interface/Image Classification/index.html | 2 +- .../Multitarget Regression/index.html | 2 +- dev/interface/Regression/index.html | 2 +- dev/interface/Summary/index.html | 2 +- 34 files changed, 187 insertions(+), 187 deletions(-) rename dev/common_workflows/early_stopping/notebook/{afe15004.svg => 322fe077.svg} (86%) rename dev/common_workflows/hyperparameter_tuning/notebook/{24fdd955.svg => 8e367541.svg} (85%) rename dev/extended_examples/MNIST/notebook/{f64fa35a.svg => becce672.svg} (85%) rename dev/extended_examples/MNIST/notebook/{8c12c35a.svg => f281e18f.svg} (85%) diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 5c966cf..1a571f1 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-16T20:03:53","documenter_version":"1.4.1"}} \ No newline at end of file +{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-16T20:04:41","documenter_version":"1.4.1"}} \ No newline at end of file diff --git a/dev/about/index.html b/dev/about/index.html index 815a1b1..f3e3437 100644 --- a/dev/about/index.html +++ b/dev/about/index.html @@ -1,2 +1,2 @@ -- · MLJFlux
+- · MLJFlux
diff --git a/dev/common_workflows/architecture_search/README/index.html b/dev/common_workflows/architecture_search/README/index.html index f5d5cb7..c77e961 100644 --- a/dev/common_workflows/architecture_search/README/index.html +++ b/dev/common_workflows/architecture_search/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/architecture_search/notebook/index.html b/dev/common_workflows/architecture_search/notebook/index.html index d74e34c..a3ee0ee 100644 --- a/dev/common_workflows/architecture_search/notebook/index.html +++ b/dev/common_workflows/architecture_search/notebook/index.html @@ -85,7 +85,7 @@ fit!(mach, verbosity = 0); fitted_params(mach).best_model
NeuralNetworkClassifier(
   builder = MLP(
-        hidden = (9, 5, 37), 
+        hidden = (21, 9, 41), 
         σ = NNlib.relu), 
   finaliser = NNlib.softmax, 
   optimiser = Adam(0.01, (0.9, 0.999), 1.0e-8), 
@@ -101,4 +101,4 @@
     mlp = [x[:model].builder for x in history],
     measurement = [x[:measurement][1] for x in history],
 )
-first(sort!(history_df, [order(:measurement)]), 10)
10×2 DataFrame
Rowmlpmeasurement
MLP…Float64
1MLP(hidden = (9, 5, 37), …)0.0702663
2MLP(hidden = (9, 61, 25), …)0.0741193
3MLP(hidden = (61, 17, 45), …)0.0825228
4MLP(hidden = (9, 9, 53), …)0.0880892
5MLP(hidden = (57, 45, 21), …)0.089243
6MLP(hidden = (29, 57, 41), …)0.0900148
7MLP(hidden = (33, 9, 41), …)0.094931
8MLP(hidden = (53, 41, 61), …)0.0965709
9MLP(hidden = (25, 41, 45), …)0.0971129
10MLP(hidden = (21, 49, 5), …)0.0991543

This page was generated using Literate.jl.

+first(sort!(history_df, [order(:measurement)]), 10)
10×2 DataFrame
Rowmlpmeasurement
MLP…Float64
1MLP(hidden = (21, 9, 41), …)0.0817329
2MLP(hidden = (49, 9, 57), …)0.0833506
3MLP(hidden = (57, 45, 21), …)0.089243
4MLP(hidden = (29, 57, 49), …)0.0927865
5MLP(hidden = (57, 49, 29), …)0.0936066
6MLP(hidden = (21, 13, 25), …)0.0967424
7MLP(hidden = (25, 25, 45), …)0.0969887
8MLP(hidden = (45, 53, 29), …)0.0991047
9MLP(hidden = (57, 61, 61), …)0.099232
10MLP(hidden = (29, 57, 13), …)0.107941

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/comparison/README/index.html b/dev/common_workflows/comparison/README/index.html index 8518961..f29be02 100644 --- a/dev/common_workflows/comparison/README/index.html +++ b/dev/common_workflows/comparison/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/comparison/notebook/index.html b/dev/common_workflows/comparison/notebook/index.html index 941eeb3..e1eb2b8 100644 --- a/dev/common_workflows/comparison/notebook/index.html +++ b/dev/common_workflows/comparison/notebook/index.html @@ -57,4 +57,4 @@ mlp = [x[:model] for x in history], measurement = [x[:measurement][1] for x in history], ) -sort!(history_df, [order(:measurement)])
4×2 DataFrame
Rowmlpmeasurement
Probabil…Float64
1BayesianLDA(method = gevd, …)0.0610826
2NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …)0.0857014
3RandomForestClassifier(max_depth = -1, …)0.106966
4ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …)0.221056

This is Occam's razor in practice.


This page was generated using Literate.jl.

+sort!(history_df, [order(:measurement)])
4×2 DataFrame
Rowmlpmeasurement
Probabil…Float64
1BayesianLDA(method = gevd, …)0.0610826
2NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …)0.0857014
3RandomForestClassifier(max_depth = -1, …)0.106971
4ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …)0.221056

This is Occam's razor in practice.


This page was generated using Literate.jl.

diff --git a/dev/common_workflows/composition/README/index.html b/dev/common_workflows/composition/README/index.html index de65957..3a00274 100644 --- a/dev/common_workflows/composition/README/index.html +++ b/dev/common_workflows/composition/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/composition/notebook/index.html b/dev/common_workflows/composition/notebook/index.html index fcd05ef..943a07a 100644 --- a/dev/common_workflows/composition/notebook/index.html +++ b/dev/common_workflows/composition/notebook/index.html @@ -65,4 +65,4 @@ ├────────────────────────────┼─────────┤ │ [1.0, 1.0, 0.95, 1.0, 1.0] │ 0.0219 │ └────────────────────────────┴─────────┘ -

This page was generated using Literate.jl.

+

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/early_stopping/README/index.html b/dev/common_workflows/early_stopping/README/index.html index ff8aa61..139f7f0 100644 --- a/dev/common_workflows/early_stopping/README/index.html +++ b/dev/common_workflows/early_stopping/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/early_stopping/notebook/afe15004.svg b/dev/common_workflows/early_stopping/notebook/322fe077.svg similarity index 86% rename from dev/common_workflows/early_stopping/notebook/afe15004.svg rename to dev/common_workflows/early_stopping/notebook/322fe077.svg index 3e591a7..020aaba 100644 --- a/dev/common_workflows/early_stopping/notebook/afe15004.svg +++ b/dev/common_workflows/early_stopping/notebook/322fe077.svg @@ -1,48 +1,48 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/common_workflows/early_stopping/notebook/index.html b/dev/common_workflows/early_stopping/notebook/index.html index 8d2946f..9654a04 100644 --- a/dev/common_workflows/early_stopping/notebook/index.html +++ b/dev/common_workflows/early_stopping/notebook/index.html @@ -56,4 +56,4 @@ [ Info: final training loss: 0.045833383 [ Info: Stop triggered by EarlyStopping.NumberLimit(100) stopping criterion. [ Info: Total of 100 iterations.

Results

We can see that the model converged after 100 iterations.

plot(training_losses, label="Training Loss", linewidth=2)
-plot!(validation_losses, label="Validation Loss", linewidth=2, size=(800,400))
Example block output

This page was generated using Literate.jl.

+plot!(validation_losses, label="Validation Loss", linewidth=2, size=(800,400))Example block output

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/hyperparameter_tuning/README/index.html b/dev/common_workflows/hyperparameter_tuning/README/index.html index 8af3197..43f1879 100644 --- a/dev/common_workflows/hyperparameter_tuning/README/index.html +++ b/dev/common_workflows/hyperparameter_tuning/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/hyperparameter_tuning/notebook/24fdd955.svg b/dev/common_workflows/hyperparameter_tuning/notebook/8e367541.svg similarity index 85% rename from dev/common_workflows/hyperparameter_tuning/notebook/24fdd955.svg rename to dev/common_workflows/hyperparameter_tuning/notebook/8e367541.svg index 4c98f16..b5936ea 100644 --- a/dev/common_workflows/hyperparameter_tuning/notebook/24fdd955.svg +++ b/dev/common_workflows/hyperparameter_tuning/notebook/8e367541.svg @@ -1,40 +1,40 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/common_workflows/hyperparameter_tuning/notebook/index.html b/dev/common_workflows/hyperparameter_tuning/notebook/index.html index 4731cbb..0c0d37b 100644 --- a/dev/common_workflows/hyperparameter_tuning/notebook/index.html +++ b/dev/common_workflows/hyperparameter_tuning/notebook/index.html @@ -65,4 +65,4 @@ xlab=curve.parameter_name, xscale=curve.parameter_scale, ylab = "Cross Entropy", -)Example block output

This page was generated using Literate.jl.

+)Example block output

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/incremental_training/README/index.html b/dev/common_workflows/incremental_training/README/index.html index 8d0be80..a6ef53d 100644 --- a/dev/common_workflows/incremental_training/README/index.html +++ b/dev/common_workflows/incremental_training/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/incremental_training/notebook/index.html b/dev/common_workflows/incremental_training/notebook/index.html index 4cdfaba..3f3be2d 100644 --- a/dev/common_workflows/incremental_training/notebook/index.html +++ b/dev/common_workflows/incremental_training/notebook/index.html @@ -34,8 +34,8 @@ fit!(mach)
trained Machine; caches model-specific representations of data
   model: NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …)
   args: 
-    1:	Source @727 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
-    2:	Source @726 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
+    1:	Source @070 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
+    2:	Source @921 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
 

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
0.4392339631006042
val_acc = accuracy(predict_mode(mach, X_test), y_test)
0.9

Poor performance, it seems.

Incremental Training

Now let's train it for another 30 epochs at half the original learning rate. All we need to do is change these hyperparameters and call fit! again. This won't reset the model parameters before training.

clf.optimiser = Optimisers.Adam(clf.optimiser.eta/2)
 clf.epochs = clf.epochs + 30
 fit!(mach, verbosity=2);
[ Info: Updating machine(NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …), …).
@@ -68,4 +68,4 @@
 [ Info: Loss is 0.1353
 [ Info: Loss is 0.1251
 [ Info: Loss is 0.1173
-[ Info: Loss is 0.1102

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
0.10519664737051289
training_acc = accuracy(predict_mode(mach, X_test), y_test)
0.9666666666666667

That's much better. If we would rather reset the model parameters before fitting, we can call fit!(mach, force=true).


This page was generated using Literate.jl.

+[ Info: Loss is 0.1102

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
0.10519664737051289
training_acc = accuracy(predict_mode(mach, X_test), y_test)
0.9666666666666667

That's much better. If we would rather reset the model parameters before fitting, we can call fit!(mach, force=true).
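
For example, to throw away the learned weights and retrain on the same data (a minimal sketch using the machine defined above):

fit!(mach, force=true)   # retrains from scratch, for the full clf.epochs epochs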


This page was generated using Literate.jl.

diff --git a/dev/common_workflows/live_training/README/index.html b/dev/common_workflows/live_training/README/index.html index aae7037..01de518 100644 --- a/dev/common_workflows/live_training/README/index.html +++ b/dev/common_workflows/live_training/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/live_training/notebook/index.html b/dev/common_workflows/live_training/notebook/index.html index 3127e7a..40ae1dc 100644 --- a/dev/common_workflows/live_training/notebook/index.html +++ b/dev/common_workflows/live_training/notebook/index.html @@ -76,6 +76,6 @@ fit!(mach, force=true)
trained Machine; does not cache data
   model: ProbabilisticIteratedModel(model = NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …), …)
   args: 
-    1:	Source @904 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
-    2:	Source @128 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
-

This page was generated using Literate.jl.

+ 1: Source @981 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}} + 2: Source @483 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}} +

This page was generated using Literate.jl.

diff --git a/dev/contributing/index.html b/dev/contributing/index.html index 9f3d271..b64bb64 100644 --- a/dev/contributing/index.html +++ b/dev/contributing/index.html @@ -1,2 +1,2 @@ -Contributing · MLJFlux

Adding new models to MLJFlux

This section assumes familiarity with the MLJ model API.

If one subtypes a new model type as either MLJFlux.MLJFluxProbabilistic or MLJFlux.MLJFluxDeterministic, then instead of defining new methods for MLJModelInterface.fit and MLJModelInterface.update, one can make use of fallbacks by implementing the lower-level methods shape, build, and fitresult. See the classifier source code for an example.

One still needs to implement a new predict method.

+Contributing · MLJFlux

Adding new models to MLJFlux

This section assumes familiarity with the MLJ model API.

If one subtypes a new model type as either MLJFlux.MLJFluxProbabilistic or MLJFlux.MLJFluxDeterministic, then instead of defining new methods for MLJModelInterface.fit and MLJModelInterface.update, one can make use of fallbacks by implementing the lower-level methods shape, build, and fitresult. See the classifier source code for an example.

One still needs to implement a new predict method.
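
For orientation only, here is a heavily abbreviated sketch of what this looks like. The three method names come from the paragraph above, but the struct fields and the exact signatures shown are assumptions modelled loosely on the classifier source and should be checked against src/classifier.jl:

import MLJFlux, MLJModelInterface, Flux, Tables

mutable struct MyClassifier <: MLJFlux.MLJFluxProbabilistic
    builder      # abbreviated: a real model needs the full set of standard MLJFlux hyper-parameters
    finaliser
    optimiser
    loss
    epochs::Int
    batch_size::Int
end

# dimensions handed on to the builder, inferred from the data:
MLJFlux.shape(model::MyClassifier, X, y) =
    (length(Tables.schema(X).names), length(MLJModelInterface.classes(y[1])))

# how the builder and the shape become a Flux chain:
MLJFlux.build(model::MyClassifier, rng, shape) =
    Flux.Chain(MLJFlux.build(model.builder, rng, shape...), model.finaliser)

# what is stored after training, for later use by predict:
MLJFlux.fitresult(model::MyClassifier, chain, y) =
    (chain, MLJModelInterface.classes(y[1]))

A predict method for MyClassifier still has to be written by hand, as noted above.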

diff --git a/dev/extended_examples/Boston/index.html b/dev/extended_examples/Boston/index.html index 92eb740..9a6f971 100644 --- a/dev/extended_examples/Boston/index.html +++ b/dev/extended_examples/Boston/index.html @@ -1,2 +1,2 @@ -- · MLJFlux
+- · MLJFlux
diff --git a/dev/extended_examples/MNIST/README/index.html b/dev/extended_examples/MNIST/README/index.html index e78326d..7f5fd9d 100644 --- a/dev/extended_examples/MNIST/README/index.html +++ b/dev/extended_examples/MNIST/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/extended_examples/MNIST/notebook/f64fa35a.svg b/dev/extended_examples/MNIST/notebook/becce672.svg similarity index 85% rename from dev/extended_examples/MNIST/notebook/f64fa35a.svg rename to dev/extended_examples/MNIST/notebook/becce672.svg index 0394544..e874b4f 100644 --- a/dev/extended_examples/MNIST/notebook/f64fa35a.svg +++ b/dev/extended_examples/MNIST/notebook/becce672.svg @@ -1,50 +1,50 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/extended_examples/MNIST/notebook/8c12c35a.svg b/dev/extended_examples/MNIST/notebook/f281e18f.svg similarity index 85% rename from dev/extended_examples/MNIST/notebook/8c12c35a.svg rename to dev/extended_examples/MNIST/notebook/f281e18f.svg index b80603a..e1525ab 100644 --- a/dev/extended_examples/MNIST/notebook/8c12c35a.svg +++ b/dev/extended_examples/MNIST/notebook/f281e18f.svg @@ -1,62 +1,62 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/extended_examples/MNIST/notebook/index.html b/dev/extended_examples/MNIST/notebook/index.html index d0f4934..024bb98 100644 --- a/dev/extended_examples/MNIST/notebook/index.html +++ b/dev/extended_examples/MNIST/notebook/index.html @@ -78,7 +78,7 @@ 0.055122323 0.057923194

Adding 20 more epochs:

clf.epochs = clf.epochs + 20
 fit!(mach, rows=train);
[ Info: Updating machine(ImageClassifier(builder = Main.MyConvBuilder(3, 16, 32, 32), …), …).
-
Optimising neural net:  10%[==>                      ]  ETA: 0:00:11
Optimising neural net:  14%[===>                     ]  ETA: 0:00:11
Optimising neural net:  19%[====>                    ]  ETA: 0:00:10
Optimising neural net:  24%[=====>                   ]  ETA: 0:00:10
Optimising neural net:  29%[=======>                 ]  ETA: 0:00:10
Optimising neural net:  33%[========>                ]  ETA: 0:00:09
Optimising neural net:  38%[=========>               ]  ETA: 0:00:08
Optimising neural net:  43%[==========>              ]  ETA: 0:00:08
Optimising neural net:  48%[===========>             ]  ETA: 0:00:07
Optimising neural net:  52%[=============>           ]  ETA: 0:00:07
Optimising neural net:  57%[==============>          ]  ETA: 0:00:06
Optimising neural net:  62%[===============>         ]  ETA: 0:00:05
Optimising neural net:  67%[================>        ]  ETA: 0:00:05
Optimising neural net:  71%[=================>       ]  ETA: 0:00:04
Optimising neural net:  76%[===================>     ]  ETA: 0:00:03
Optimising neural net:  81%[====================>    ]  ETA: 0:00:03
Optimising neural net:  86%[=====================>   ]  ETA: 0:00:02
Optimising neural net:  90%[======================>  ]  ETA: 0:00:01
Optimising neural net:  95%[=======================> ]  ETA: 0:00:01
Optimising neural net: 100%[=========================] Time: 0:00:13

Computing an out-of-sample estimate of the loss:

predicted_labels = predict(mach, rows=test);
+
Optimising neural net:  10%[==>                      ]  ETA: 0:00:07
Optimising neural net:  14%[===>                     ]  ETA: 0:00:08
Optimising neural net:  19%[====>                    ]  ETA: 0:00:08
Optimising neural net:  24%[=====>                   ]  ETA: 0:00:08
Optimising neural net:  29%[=======>                 ]  ETA: 0:00:07
Optimising neural net:  33%[========>                ]  ETA: 0:00:07
Optimising neural net:  38%[=========>               ]  ETA: 0:00:06
Optimising neural net:  43%[==========>              ]  ETA: 0:00:06
Optimising neural net:  48%[===========>             ]  ETA: 0:00:05
Optimising neural net:  52%[=============>           ]  ETA: 0:00:05
Optimising neural net:  57%[==============>          ]  ETA: 0:00:04
Optimising neural net:  62%[===============>         ]  ETA: 0:00:04
Optimising neural net:  67%[================>        ]  ETA: 0:00:04
Optimising neural net:  71%[=================>       ]  ETA: 0:00:03
Optimising neural net:  76%[===================>     ]  ETA: 0:00:03
Optimising neural net:  81%[====================>    ]  ETA: 0:00:02
Optimising neural net:  86%[=====================>   ]  ETA: 0:00:02
Optimising neural net:  90%[======================>  ]  ETA: 0:00:01
Optimising neural net:  95%[=======================> ]  ETA: 0:00:01
Optimising neural net: 100%[=========================] Time: 0:00:10

Computing an out-of-sample estimate of the loss:

predicted_labels = predict(mach, rows=test);
 cross_entropy(predicted_labels, labels[test])
0.4883231265583621

Or to fit and predict, in one line:

evaluate!(mach,
           resampling=Holdout(fraction_train=0.5),
           measure=cross_entropy,
@@ -183,7 +183,7 @@
     parameter_means2,
     title="Flux parameter mean weights",
     xlab = "epoch",
-)
Example block output

Note. The higher the number in the plot legend, the deeper the layer we are weight-averaging.

savefig(joinpath(tempdir(), "weights.png"))
"/tmp/weights.png"

Retrieving a snapshot for a prediction:

mach2 = machine(joinpath(tempdir(), "mnist3.jls"))
+)
Example block output

Note. The higher the number in the plot legend, the deeper the layer we are weight-averaging.

savefig(joinpath(tempdir(), "weights.png"))
"/tmp/weights.png"

Retrieving a snapshot for a prediction:

mach2 = machine(joinpath(tempdir(), "mnist3.jls"))
 predict_mode(mach2, images[501:503])
3-element CategoricalArrays.CategoricalArray{Int64,1,UInt32}:
  7
  9
@@ -197,4 +197,4 @@
     ylab = "cross entropy",
     label="out-of-sample",
 )
-plot!(epochs, training_losses, label="training")
Example block output

This page was generated using Literate.jl.

+plot!(epochs, training_losses, label="training")Example block output

This page was generated using Literate.jl.

diff --git a/dev/extended_examples/spam_detection/README/index.html b/dev/extended_examples/spam_detection/README/index.html index 7a917c6..ad5990e 100644 --- a/dev/extended_examples/spam_detection/README/index.html +++ b/dev/extended_examples/spam_detection/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

filedescription
notebook.ipynbJupyter notebook (executed)
notebook.unexecuted.ipynbJupyter notebook (unexecuted)
notebook.mdstatic markdown (included in MLJFlux.jl docs)
notebook.jlexecutable Julia script annotated with comments
generate.jlmaintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/extended_examples/spam_detection/notebook/index.html b/dev/extended_examples/spam_detection/notebook/index.html index fadc8aa..d9762f4 100644 --- a/dev/extended_examples/spam_detection/notebook/index.html +++ b/dev/extended_examples/spam_detection/notebook/index.html @@ -92,13 +92,13 @@ mach = machine(clf, x_train_processed_equalized_fixed, y_train)
untrained Machine; caches model-specific representations of data
   model: NeuralNetworkClassifier(builder = GenericBuilder(apply = #15), …)
   args: 
-    1:	Source @416 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
-    2:	Source @503 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
+    1:	Source @159 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
+    2:	Source @609 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
 

Train the Model

fit!(mach)
trained Machine; caches model-specific representations of data
   model: NeuralNetworkClassifier(builder = GenericBuilder(apply = #15), …)
   args: 
-    1:	Source @416 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
-    2:	Source @503 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
+    1:	Source @159 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
+    2:	Source @609 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
 

Evaluate the Model

ŷ = predict_mode(mach, x_val_processed_equalized_fixed)
 balanced_accuracy(ŷ, y_val)
0.8840999384477648

Acceptable performance. Let's see some live examples:

using Random: Random;
 Random.seed!(99);
@@ -111,4 +111,4 @@
 z_encoded_equalized_fixed = coerce(z_encoded_equalized_fixed, Continuous)
 z_pred = predict_mode(mach, z_encoded_equalized_fixed)
 
-print("SMS: `$(z)` and the prediction is `$(z_pred)`")
SMS: `Hi elaine, is today's meeting confirmed?` and the prediction is `CategoricalArrays.CategoricalValue{InlineStrings.String7, UInt32}[InlineStrings.String7("ham")]`

This page was generated using Literate.jl.

+print("SMS: `$(z)` and the prediction is `$(z_pred)`")
SMS: `Hi elaine, is today's meeting confirmed?` and the prediction is `CategoricalArrays.CategoricalValue{InlineStrings.String7, UInt32}[InlineStrings.String7("ham")]`

This page was generated using Literate.jl.

diff --git a/dev/index.html b/dev/index.html index e229132..48b8840 100644 --- a/dev/index.html +++ b/dev/index.html @@ -40,4 +40,4 @@ ├─────────────────────────────┼─────────┤ │ [1.0, 1.0, 0.967, 0.9, 1.0] │ 0.0426 │ └─────────────────────────────┴─────────┘ -

As you can see, we are able to use MLJ meta-functionality (i.e., cross-validation) with a Flux deep learning model. All arguments provided have defaults.

Notice that we are also able to define the neural network in a high-level fashion by only specifying the number of neurons in each hidden layer and the activation function. Meanwhile, MLJFlux is able to infer the input and output layers, as well as use suitable defaults for the loss function and output activation, given the classification task. Notice as well that we did not need to manually implement a training or prediction loop.

Basic idea: "builders" for data-dependent architecture

As in the example above, any MLJFlux model has a builder hyperparameter, an object encoding instructions for creating a neural network given the data that the model eventually sees (e.g., the number of classes in a classification problem). While each MLJ model has a simple default builder, users may need to define custom builders to get optimal results (see Defining Custom Builders), and this will require familiarity with the Flux API for defining a neural network chain.

Flux or MLJFlux?

Flux is a deep learning framework in Julia that comes with everything you need to build deep learning models (GPU support, automatic differentiation, layers, activations, losses, optimizers, etc.). MLJFlux wraps models built with Flux, providing a higher-level interface for building and training them. More importantly, it empowers Flux models by extending their support to many common machine learning workflows that are possible via MLJ, such as:

  • Estimating performance of your model using a holdout set or other resampling strategy (e.g., cross-validation) as measured by one or more metrics (e.g., loss functions) that may not have been used in training

  • Optimizing hyper-parameters such as a regularization parameter (e.g., dropout) or the width/height/number of channels of a convolution layer

  • Composing with other models, such as introducing data pre-processing steps (e.g., missing data imputation) into a pipeline. It might make sense to include non-deep-learning models in this pipeline. Other kinds of model composition could include blending the predictions of a deep learner with some other kind of model (as in “model stacking”). Models composed with MLJ can also be tuned as a single unit.

  • Controlling iteration by adding an early stopping criterion based on an out-of-sample estimate of the loss, dynamically changing the learning rate (e.g., cyclic learning rates), periodically saving snapshots of the model, or generating live plots of sample weights to judge training progress (as in TensorBoard)

  • Comparing your model with non-deep-learning models

A comparable project, FastAI/FluxTraining, also provides a high-level interface for interacting with Flux models and supports a set of features that may overlap with (but not include all of) those supported by MLJFlux.

Many of the features mentioned above are showcased in the workflow examples that you can access from the sidebar.

+

As you can see, we are able to use MLJ meta-functionality (i.e., cross-validation) with a Flux deep learning model. All arguments provided have defaults.

Notice that we are also able to define the neural network in a high-level fashion by only specifying the number of neurons in each hidden layer and the activation function. Meanwhile, MLJFlux is able to infer the input and output layers, as well as use suitable defaults for the loss function and output activation, given the classification task. Notice as well that we did not need to manually implement a training or prediction loop.

Basic idea: "builders" for data-dependent architecture

As in the example above, any MLJFlux model has a builder hyperparameter, an object encoding instructions for creating a neural network given the data that the model eventually sees (e.g., the number of classes in a classification problem). While each MLJ model has a simple default builder, users may need to define custom builders to get optimal results (see Defining Custom Builders), and this will require familiarity with the Flux API for defining a neural network chain.
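
For instance, a data-dependent builder can be as short as the following sketch, which uses the @builder macro documented under Builders. The hidden width 64 is an arbitrary illustrative choice; n_in and n_out are filled in from the data at fit! time.

using MLJFlux, Flux
builder = MLJFlux.@builder Chain(Dense(n_in, 64, relu), Dense(64, n_out))
clf = NeuralNetworkClassifier(builder=builder)   # the builder is expanded when the model first sees data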

Flux or MLJFlux?

Flux is a deep learning framework in Julia that comes with everything you need to build deep learning models (GPU support, automatic differentiation, layers, activations, losses, optimizers, etc.). MLJFlux wraps models built with Flux, providing a higher-level interface for building and training them. More importantly, it empowers Flux models by extending their support to many common machine learning workflows that are possible via MLJ, such as:

  • Estimating performance of your model using a holdout set or other resampling strategy (e.g., cross-validation) as measured by one or more metrics (e.g., loss functions) that may not have been used in training

  • Optimizing hyper-parameters such as a regularization parameter (e.g., dropout) or the width/height/number of channels of a convolution layer

  • Composing with other models, such as introducing data pre-processing steps (e.g., missing data imputation) into a pipeline. It might make sense to include non-deep-learning models in this pipeline. Other kinds of model composition could include blending the predictions of a deep learner with some other kind of model (as in “model stacking”). Models composed with MLJ can also be tuned as a single unit (see the sketch following this list).

  • Controlling iteration by adding an early stopping criterion based on an out-of-sample estimate of the loss, dynamically changing the learning rate (e.g., cyclic learning rates), periodically saving snapshots of the model, or generating live plots of sample weights to judge training progress (as in TensorBoard)

  • Comparing your model with non-deep-learning models
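
As a small sketch of the composition and performance-estimation points above, and assuming X is a table of Continuous features and y a categorical target (as in the example at the top of this page):

using MLJ, MLJFlux
pipe = Standardizer() |> NeuralNetworkClassifier(builder=MLJFlux.MLP(hidden=(8, 8)))   # pre-processing and deep learner in one pipeline
evaluate(pipe, X, y, resampling=CV(nfolds=5), measure=cross_entropy)                   # cross-validated estimate of the loss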

A comparable project, FastAI/FluxTraining, also provides a high-level interface for interacting with Flux models and supports a set of features that may overlap with (but not include all of) those supported by MLJFlux.

Many of the features mentioned above are showcased in the workflow examples that you can access from the sidebar.

diff --git a/dev/interface/Builders/index.html b/dev/interface/Builders/index.html index c3b5968..e41edfc 100644 --- a/dev/interface/Builders/index.html +++ b/dev/interface/Builders/index.html @@ -1,5 +1,5 @@ -Builders · MLJFlux
MLJFlux.LinearType
Linear(; σ=Flux.relu)

MLJFlux builder that constructs a fully connected two layer network with activation function σ. The number of input and output nodes is determined from the data. Weights are initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.ShortType
Short(; n_hidden=0, dropout=0.5, σ=Flux.sigmoid)

MLJFlux builder that constructs a fully connected three-layer network using n_hidden nodes in the hidden layer and the specified dropout (defaulting to 0.5). An activation function σ is applied between the hidden and final layers. If n_hidden=0 (the default) then n_hidden is the geometric mean of the number of input and output nodes. The number of input and output nodes is determined from the data.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.MLPType
MLP(; hidden=(100,), σ=Flux.relu)

MLJFlux builder that constructs a Multi-layer perceptron network. The ith element of hidden represents the number of neurons in the ith hidden layer. An activation function σ is applied between each layer.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.@builderMacro
@builder neural_net

Creates a builder for neural_net. The variables rng, n_in, n_out and n_channels can be used to create builders for any random number generator rng, input and output sizes n_in and n_out and number of input channels n_channels.

Examples

julia> import MLJFlux: @builder;
+Builders · MLJFlux
MLJFlux.LinearType
Linear(; σ=Flux.relu)

MLJFlux builder that constructs a fully connected two layer network with activation function σ. The number of input and output nodes is determined from the data. Weights are initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.ShortType
Short(; n_hidden=0, dropout=0.5, σ=Flux.sigmoid)

MLJFlux builder that constructs a fully connected three-layer network using n_hidden nodes in the hidden layer and the specified dropout (defaulting to 0.5). An activation function σ is applied between the hidden and final layers. If n_hidden=0 (the default) then n_hidden is the geometric mean of the number of input and output nodes. The number of input and output nodes is determined from the data.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.MLPType
MLP(; hidden=(100,), σ=Flux.relu)

MLJFlux builder that constructs a Multi-layer perceptron network. The ith element of hidden represents the number of neurons in the ith hidden layer. An activation function σ is applied between each layer.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.@builderMacro
@builder neural_net

Creates a builder for neural_net. The variables rng, n_in, n_out and n_channels can be used to create builders for any random number generator rng, input and output sizes n_in and n_out and number of input channels n_channels.

Examples

julia> import MLJFlux: @builder;
 
 julia> nn = NeuralNetworkRegressor(builder = @builder(Chain(Dense(n_in, 64, relu),
                                                             Dense(64, 32, relu),
@@ -11,4 +11,4 @@
            Chain(front, Dense(d, n_out));
        end
 
-julia> conv_nn = NeuralNetworkRegressor(builder = conv_builder);
source
+julia> conv_nn = NeuralNetworkRegressor(builder = conv_builder);
source
diff --git a/dev/interface/Classification/index.html b/dev/interface/Classification/index.html index 1e148c9..dcb500b 100644 --- a/dev/interface/Classification/index.html +++ b/dev/interface/Classification/index.html @@ -20,7 +20,7 @@ xlab=curve.parameter_name, xscale=curve.parameter_scale, ylab = "Cross Entropy") -

See also ImageClassifier, NeuralNetworkBinaryClassifier.

source
MLJFlux.NeuralNetworkBinaryClassifierType
NeuralNetworkBinaryClassifier

A model type for constructing a neural network binary classifier, based on MLJFlux.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux

Do model = NeuralNetworkBinaryClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkBinaryClassifier(builder=...).

NeuralNetworkBinaryClassifier is for training a data-dependent Flux.jl neural network for making probabilistic predictions of a binary (Multiclass{2} or OrderedFactor{2}) target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

Here:

  • X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.

  • y is the target, which can be any AbstractVector whose element scitype is Multiclass{2} or OrderedFactor{2}; check the scitype with scitype(y)

Train the machine with fit!(mach, rows=...).

Hyper-parameters

  • builder=MLJFlux.Short(): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux.jl documentation for examples of user-defined builders. See also finaliser below.

  • optimiser::Flux.Adam(): A Flux.Optimise optimiser. The optimiser performs the updating of the weights of the network. For further reference, see the Flux optimiser documentation. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.

  • loss=Flux.binarycrossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:

    • Flux.binarycrossentropy: Standard binary classification loss, also known as the log loss.

    • Flux.logitbinarycrossentropy: Mathematically equal to crossentropy, but numerically more stable than finalising the outputs with σ and then calculating crossentropy. You will need to specify finaliser=identity to remove MLJFlux's default sigmoid finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).

    • Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.

    • Flux.binary_focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.

    Currently MLJ measures are not supported values of loss.

  • epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.

  • batch_size::Int=1: the batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.

  • lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞).

  • alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.

  • rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training.

  • optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.

  • acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For Training on GPU, use CUDALibs().

  • finaliser=Flux.σ: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.σ.

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.

  • predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.

Fitted parameters

The fields of fitted_params(mach) are:

  • chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).

Report

The fields of report(mach) are:

  • training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.

Examples

In this example we build a classification model using the Iris dataset. This is a very basic example, using a default builder and no standardization. For a more advanced illustration, see NeuralNetworkRegressor or ImageClassifier, and examples in the MLJFlux.jl documentation.

using MLJ, Flux
+

See also ImageClassifier, NeuralNetworkBinaryClassifier.

source
MLJFlux.NeuralNetworkBinaryClassifierType
NeuralNetworkBinaryClassifier

A model type for constructing a neural network binary classifier, based on MLJFlux.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux

Do model = NeuralNetworkBinaryClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkBinaryClassifier(builder=...).

NeuralNetworkBinaryClassifier is for training a data-dependent Flux.jl neural network for making probabilistic predictions of a binary (Multiclass{2} or OrderedFactor{2}) target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

Here:

  • X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.

  • y is the target, which can be any AbstractVector whose element scitype is Multiclass{2} or OrderedFactor{2}; check the scitype with scitype(y)

Train the machine with fit!(mach, rows=...).

Hyper-parameters

  • builder=MLJFlux.Short(): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux.jl documentation for examples of user-defined builders. See also finaliser below.

  • optimiser::Flux.Adam(): A Flux.Optimise optimiser. The optimiser performs the updating of the weights of the network. For further reference, see the Flux optimiser documentation. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.

  • loss=Flux.binarycrossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:

    • Flux.binarycrossentropy: Standard binary classification loss, also known as the log loss.

    • Flux.logitbinarycrossentropy: Mathematically equal to crossentropy, but numerically more stable than finalising the outputs with σ and then calculating crossentropy. You will need to specify finaliser=identity to remove MLJFlux's default sigmoid finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).

    • Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.

    • Flux.binary_focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.

    Currently MLJ measures are not supported values of loss.

  • epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.

  • batch_size::Int=1: the batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.

  • lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞).

  • alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.

  • rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training.

  • optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.

  • acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For Training on GPU, use CUDALibs().

  • finaliser=Flux.σ: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.σ.

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.

  • predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.

Fitted parameters

The fields of fitted_params(mach) are:

  • chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).

Report

The fields of report(mach) are:

  • training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.

Examples

In this example we build a classification model using the Iris dataset. This is a very basic example, using a default builder and no standardization. For a more advanced illustration, see NeuralNetworkRegressor or ImageClassifier, and examples in the MLJFlux.jl documentation.

using MLJ, Flux
 import Optimisers
 import RDatasets

First, we can load the data:

mtcars = RDatasets.dataset("datasets", "mtcars");
 y, X = unpack(mtcars, ==(:VS), in([:MPG, :Cyl, :Disp, :HP, :WT, :QSec]));

Note that y is a vector and X a table.

y = categorical(y) # classifier takes categorical input
@@ -48,4 +48,4 @@
    xscale=curve.parameter_scale,
    ylab = "Cross Entropy",
 )
-

See also ImageClassifier.

source
+

See also ImageClassifier.

source diff --git a/dev/interface/Custom Builders/index.html b/dev/interface/Custom Builders/index.html index d4968dd..1efdd9d 100644 --- a/dev/interface/Custom Builders/index.html +++ b/dev/interface/Custom Builders/index.html @@ -12,4 +12,4 @@ Dense(nn.n2, n_out, init=init), ) end

Note here that n_in and n_out depend on the size of the data (see Table 1).

For a concrete image classification example, see Using MLJ to classify the MNIST image dataset.

More generally, defining a new builder means defining a new struct sub-typing MLJFlux.Builder and defining a new MLJFlux.build method with one of these signatures:

MLJFlux.build(builder::MyBuilder, rng, n_in, n_out)
-MLJFlux.build(builder::MyBuilder, rng, n_in, n_out, n_channels) # for use with `ImageClassifier`

This method must return a Flux.Chain instance, chain, subject to the following conditions:

  • chain(x) must make sense:

    • for any x <: Array{<:AbstractFloat, 2} of size (n_in, batch_size) where batch_size is any integer (for all models except ImageClassifier); or
    • for any x <: Array{<:Float32, 4} of size (W, H, n_channels, batch_size), where (W, H) = n_in, n_channels is 1 or 3, and batch_size is any integer (for use with ImageClassifier)
  • The object returned by chain(x) must be an AbstractFloat vector of length n_out.

Alternatively, use MLJFlux.@builder(neural_net) to automatically create a builder for any valid Flux chain expression neural_net, where the symbols n_in, n_out, n_channels and rng can appear literally, with the interpretations explained above. For example,

builder = MLJFlux.@builder Chain(Dense(n_in, 128), Dense(128, n_out, tanh))
+MLJFlux.build(builder::MyBuilder, rng, n_in, n_out, n_channels) # for use with `ImageClassifier`

This method must return a Flux.Chain instance, chain, subject to the following conditions:

  • chain(x) must make sense:

    • for any x <: Array{<:AbstractFloat, 2} of size (n_in, batch_size) where batch_size is any integer (for all models except ImageClassifier); or
    • for any x <: Array{<:Float32, 4} of size (W, H, n_channels, batch_size), where (W, H) = n_in, n_channels is 1 or 3, and batch_size is any integer (for use with ImageClassifier)
  • The object returned by chain(x) must be an AbstractFloat vector of length n_out.

Alternatively, use MLJFlux.@builder(neural_net) to automatically create a builder for any valid Flux chain expression neural_net, where the symbols n_in, n_out, n_channels and rng can appear literally, with the interpretations explained above. For example,

builder = MLJFlux.@builder Chain(Dense(n_in, 128), Dense(128, n_out, tanh))
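To put a builder to work, pass it to any MLJFlux model via the builder hyperparameter. Here is a minimal usage sketch (it assumes MLJ, MLJFlux and Flux are in your environment; the layer sizes and dataset are illustrative only):

using MLJ, Flux
import MLJFlux
builder = MLJFlux.@builder Chain(Dense(n_in, 64, relu), Dense(64, n_out))
NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux
clf = NeuralNetworkClassifier(builder=builder, epochs=20)
X, y = @load_iris                   # small demo dataset bundled with MLJ
mach = machine(clf, X, y) |> fit!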
diff --git a/dev/interface/Image Classification/index.html b/dev/interface/Image Classification/index.html index eea63f1..f7bd771 100644 --- a/dev/interface/Image Classification/index.html +++ b/dev/interface/Image Classification/index.html @@ -46,4 +46,4 @@ resampling=Holdout(fraction_train=0.5), measure=cross_entropy, rows=1:1000, - verbosity=0)

See also NeuralNetworkClassifier.

source + verbosity=0)

See also NeuralNetworkClassifier.

source diff --git a/dev/interface/Multitarget Regression/index.html b/dev/interface/Multitarget Regression/index.html index bb5efe1..3cebf2c 100644 --- a/dev/interface/Multitarget Regression/index.html +++ b/dev/interface/Multitarget Regression/index.html @@ -25,4 +25,4 @@ # loss for `(Xtest, test)`: fit!(mach) # trains on all data `(X, y)` yhat = predict(mach, Xtest) -multi_loss(yhat, ytest)

See also NeuralNetworkRegressor

source +multi_loss(yhat, ytest)

See also NeuralNetworkRegressor

source diff --git a/dev/interface/Regression/index.html b/dev/interface/Regression/index.html index e88242a..6761641 100644 --- a/dev/interface/Regression/index.html +++ b/dev/interface/Regression/index.html @@ -43,4 +43,4 @@ # loss for `(Xtest, test)`: fit!(mach) # train on `(X, y)` yhat = predict(mach, Xtest) -l2(yhat, ytest)

These losses, for the pipeline model, refer to the target on the original, unstandardized, scale.

For implementing stopping criterion and other iteration controls, refer to examples linked from the MLJFlux documentation.

See also MultitargetNeuralNetworkRegressor

source +l2(yhat, ytest)

These losses, for the pipeline model, refer to the target on the original, unstandardized, scale.

For implementing stopping criterion and other iteration controls, refer to examples linked from the MLJFlux documentation.

See also MultitargetNeuralNetworkRegressor

source diff --git a/dev/interface/Summary/index.html b/dev/interface/Summary/index.html index 3799901..c392267 100644 --- a/dev/interface/Summary/index.html +++ b/dev/interface/Summary/index.html @@ -2,4 +2,4 @@ Summary · MLJFlux

Models

MLJFlux provides the model types below, for use with input features X and targets y of the scientific type indicated in the table below. The parameters n_in, n_out and n_channels refer to information passed to the builder, as described under Defining Custom Builders.

| Model Type | Prediction type | scitype(X) <: _ | scitype(y) <: _ |
| --- | --- | --- | --- |
| NeuralNetworkRegressor | Deterministic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | AbstractVector{<:Continuous} (n_out = 1) |
| MultitargetNeuralNetworkRegressor | Deterministic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | <: Table(Continuous) with n_out columns |
| NeuralNetworkClassifier | Probabilistic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | AbstractVector{<:Finite} with n_out classes |
| NeuralNetworkBinaryClassifier | Probabilistic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | AbstractVector{<:Finite{2}} (but n_out = 1) |
| ImageClassifier | Probabilistic | AbstractVector{<:Image{W,H}} with n_in = (W, H) | AbstractVector{<:Finite} with n_out classes |
What exactly is a "model"?

In MLJ a model is a mutable struct storing hyper-parameters for some learning algorithm indicated by the model name, and that's all. In particular, an MLJ model does not store learned parameters.

Difference in Definition

In Flux the term "model" has another meaning. However, as all Flux "models" used in MLJFlux are Flux.Chain objects, we call them chains, and restrict use of "model" to models in the MLJ sense.

Are observations rows or columns?

In MLJ the convention for two-dimensional data (tables and matrices) is rows=observations. For matrices, Flux adopts the opposite convention. If your data is a matrix whose columns index the observations, the optimal solution is to present the adjoint or transpose of your matrix to MLJFlux models. Otherwise, you can use the matrix as is, or apply permutedims once and present the adjoint or transpose of the result, which is again optimal for MLJFlux training.
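A sketch of these options (the matrices and model below are illustrative; model is any MLJFlux model and y a compatible target):

X = rand(Float32, 100, 5)          # 100 observations × 5 features: MLJ convention, use as is
Xflux = rand(Float32, 5, 100)      # features × observations: Flux-style layout
mach = machine(model, Xflux', y)   # pass the adjoint; no copy is made
Xt = permutedims(Xflux)            # or materialize the transposed copy once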

Instructions for coercing common image formats into some AbstractVector{<:Image} are here.

Fitting and warm restarts

MLJ machines cache state enabling the "warm restart" of model training, as demonstrated in the incremental training example. In the case of MLJFlux models, fit!(mach) will use a warm restart if:

  • only model.epochs has changed since the last call; or

  • only model.epochs or model.optimiser have changed since the last call and model.optimiser_changes_trigger_retraining == false (the default) (the "state" part of the optimiser is ignored in this comparison). This allows one to dynamically modify learning rates, for example.

Here model=mach.model is the associated MLJ model.
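For example (a sketch, assuming mach = machine(model, X, y) has already been fitted once):

model.epochs = model.epochs + 10   # only epochs has changed ...
fit!(mach)                         # ... so training resumes where it left off, adding 10 epochs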

The warm restart feature makes it possible to externally control iteration. See, for example, Early Stopping with MLJFlux and Using MLJ to classify the MNIST image dataset.
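One way to control iteration externally is MLJ's IteratedModel wrapper, sketched below (the control values are illustrative, and clf stands for any MLJFlux classifier):

using MLJ   # provides IteratedModel and the controls below
iterated_clf = IteratedModel(
    model=clf,
    resampling=Holdout(fraction_train=0.8),
    measure=log_loss,
    controls=[Step(1), Patience(5), NumberLimit(100)],
)
mach = machine(iterated_clf, X, y) |> fit!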

Model Hyperparameters

All models share the following hyper-parameters. See individual model docstrings for a full list.

| Hyper-parameter | Description | Default |
| --- | --- | --- |
| builder | Default builder for models. | MLJFlux.Linear(σ=Flux.relu) (regressors) or MLJFlux.Short(n_hidden=0, dropout=0.5, σ=Flux.σ) (classifiers) |
| optimiser | The optimiser to use for training. | Optimisers.Adam() |
| loss | The loss function used for training. | Flux.mse (regressors) and Flux.crossentropy (classifiers) |
| epochs | Number of epochs to train for. | 10 |
| batch_size | The batch size for the data. | 1 |
| lambda | The regularization strength. Range = [0, ∞). | 0 |
| alpha | The L2/L1 mix of regularization. Range = [0, 1]. | 0 |
| rng | The random number generator (RNG) passed to builders, for weight initialization, for example. Can be any AbstractRNG or the seed (integer) for a Xoshiro that is reset on every cold restart of model (machine) training. | GLOBAL_RNG |
| acceleration | Use CUDALibs() for training on GPU; default is CPU1(). | CPU1() |
| optimiser_changes_trigger_retraining | True if fitting an associated machine should trigger retraining from scratch whenever the optimiser changes. | false |
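For instance (a sketch; it assumes NeuralNetworkClassifier has already been loaded, for example with @load NeuralNetworkClassifier pkg=MLJFlux):

import MLJFlux, Optimisers
clf = NeuralNetworkClassifier(
    builder=MLJFlux.MLP(hidden=(32, 32)),   # see the Builders table below
    optimiser=Optimisers.Adam(0.001),
    epochs=50,
    batch_size=32,
    lambda=0.01,    # regularization strength
    alpha=0.5,      # even L1/L2 mix
    rng=123,        # integer seed, for reproducibility
)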

The classifiers have an additional hyperparameter finaliser (default is Flux.softmax, or Flux.σ in the binary case) which is the operation applied to the unnormalized output of the final layer to obtain probabilities (outputs summing to one). It should return a vector of the same length as its input.
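For example, one could supply a hypothetical temperature-scaled softmax (a sketch only; myfinaliser is not part of the API):

import Flux
myfinaliser(x) = Flux.softmax(x ./ 2)   # still returns output the same length as its input
clf = NeuralNetworkClassifier(finaliser=myfinaliser)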

Loss Functions

Currently, the loss function specified by loss=... is applied internally by Flux and needs to conform to the Flux API. You cannot, for example, supply one of MLJ's probabilistic loss functions, such as MLJ.cross_entropy, to one of the classifier constructors.

That said, MLJ loss functions and metrics can still be used in evaluation meta-algorithms (such as cross-validation), and they will work even if the underlying model comes from MLJFlux.
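For example (a sketch; clf is any MLJFlux classifier and (X, y) appropriately typed data):

evaluate(clf, X, y,
         resampling=CV(nfolds=6),
         measure=[cross_entropy, accuracy])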

More on accelerated training with GPUs

As indicated in the table above, when instantiating a model for training on a GPU, specify acceleration=CUDALibs(), as in

using MLJ
 ImageClassifier = @load ImageClassifier
 model = ImageClassifier(epochs=10, acceleration=CUDALibs())
-mach = machine(model, X, y) |> fit!

In this example, the data X, y is copied onto the GPU under the hood on the call to fit! and cached for use in any warm restart (see above). The Flux chain used in training is always copied back to the CPU at the conclusion of fit!, and made available as fitted_params(mach).

Builders

| Builder | Description |
| --- | --- |
| MLJFlux.MLP(hidden=(10,)) | General multi-layer perceptron |
| MLJFlux.Short(n_hidden=0, dropout=0.5, σ=sigmoid) | Fully connected network with one hidden layer and dropout |
| MLJFlux.Linear(σ=relu) | Vanilla linear network with no hidden layers and activation function σ |
| MLJFlux.@builder | Macro for customized builders |
+mach = machine(model, X, y) |> fit!

In this example, the data X, y is copied onto the GPU under the hood on the call to fit! and cached for use in any warm restart (see above). The Flux chain used in training is always copied back to the CPU at the conclusion of fit!, and made available as fitted_params(mach).

Builders

| Builder | Description |
| --- | --- |
| MLJFlux.MLP(hidden=(10,)) | General multi-layer perceptron |
| MLJFlux.Short(n_hidden=0, dropout=0.5, σ=sigmoid) | Fully connected network with one hidden layer and dropout |
| MLJFlux.Linear(σ=relu) | Vanilla linear network with no hidden layers and activation function σ |
| MLJFlux.@builder | Macro for customized builders |