From 93ea65e26b94198e866f6cd5a50983770a9a7f69 Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Tue, 11 Jun 2024 00:39:35 +0000 Subject: [PATCH] build based on 06282cd --- dev/.documenter-siteinfo.json | 2 +- dev/about/index.html | 2 +- dev/contributing/index.html | 2 +- dev/full tutorials/Boston/index.html | 2 +- dev/full tutorials/MNIST/index.html | 2 +- dev/full tutorials/Spam Detection with RNNs/SMS/index.html | 2 +- dev/index.html | 2 +- dev/interface/Builders/index.html | 4 ++-- dev/interface/Classification/index.html | 4 ++-- dev/interface/Custom Builders/index.html | 2 +- dev/interface/Image Classification/index.html | 2 +- dev/interface/Multitarget Regression/index.html | 2 +- dev/interface/Regression/index.html | 2 +- dev/interface/Summary/index.html | 2 +- .../Basic Neural Architecture Search/tuning/index.html | 2 +- dev/workflow examples/Comparison/comparison/index.html | 2 +- dev/workflow examples/Composition/composition/index.html | 2 +- dev/workflow examples/Early Stopping/iteration/index.html | 2 +- dev/workflow examples/Hyperparameter Tuning/tuning/index.html | 2 +- .../Incremental Training/incremental/index.html | 2 +- dev/workflow examples/Live Training/live-training/index.html | 2 +- 21 files changed, 23 insertions(+), 23 deletions(-) diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 9786404a..d8e62131 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-10T23:23:58","documenter_version":"1.4.1"}} \ No newline at end of file +{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-11T00:39:31","documenter_version":"1.4.1"}} \ No newline at end of file diff --git a/dev/about/index.html b/dev/about/index.html index c66c2b67..642e23be 100644 --- a/dev/about/index.html +++ b/dev/about/index.html @@ -1,2 +1,2 @@ -- · MLJFlux
+- · MLJFlux
diff --git a/dev/contributing/index.html b/dev/contributing/index.html index 7a91ef7e..8f182f03 100644 --- a/dev/contributing/index.html +++ b/dev/contributing/index.html @@ -1,2 +1,2 @@ -Contributing · MLJFlux

Adding new models to MLJFlux

This section assumes familiarity with the MLJ model API

If one subtypes a new model type as either MLJFlux.MLJFluxProbabilistic or MLJFlux.MLJFluxDeterministic, then instead of defining new methods for MLJModelInterface.fit and MLJModelInterface.update, one can make use of fallbacks by implementing the lower-level methods shape, build, and fitresult. See the classifier source code for an example.

One still needs to implement a new predict method.

+Contributing · MLJFlux

Adding new models to MLJFlux

This section assumes familiarity with the MLJ model API

If one subtypes a new model type as either MLJFlux.MLJFluxProbabilistic or MLJFlux.MLJFluxDeterministic, then instead of defining new methods for MLJModelInterface.fit and MLJModelInterface.update, one can make use of fallbacks by implementing the lower-level methods shape, build, and fitresult. See the classifier source code for an example.

One still needs to implement a new predict method.
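To make this concrete, here is a rough sketch of what such an implementation can look like. It is loosely modelled on the classifier source code referred to above; MyModel, its fields, and the method bodies are hypothetical, so consult the MLJFlux source for the authoritative signatures.

import MLJFlux, MLJModelInterface, Flux, Tables

# Hypothetical sketch only: a probabilistic MLJFlux model with the usual
# builder/finaliser fields (remaining standard hyper-parameters omitted).
mutable struct MyModel <: MLJFlux.MLJFluxProbabilistic
    builder
    finaliser
end

# shape: extract from the data whatever the builder will need (here, the
# number of input features and the number of target classes):
MLJFlux.shape(model::MyModel, X, y) =
    (length(Tables.columnnames(Tables.columns(X))),
     length(MLJModelInterface.classes(y[1])))

# build: turn the builder's chain into the full chain used for training:
MLJFlux.build(model::MyModel, rng, shape) =
    Flux.Chain(MLJFlux.build(model.builder, rng, shape...), model.finaliser)

# fitresult: what gets stored after training for later use by predict:
MLJFlux.fitresult(model::MyModel, chain, y) =
    (chain, MLJModelInterface.classes(y[1]))

A predict method for the new type must still be written separately, as noted above.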

diff --git a/dev/full tutorials/Boston/index.html b/dev/full tutorials/Boston/index.html index 9ba6a4ca..a0c7708f 100644 --- a/dev/full tutorials/Boston/index.html +++ b/dev/full tutorials/Boston/index.html @@ -1,2 +1,2 @@ -- · MLJFlux
+- · MLJFlux
diff --git a/dev/full tutorials/MNIST/index.html b/dev/full tutorials/MNIST/index.html index 00f99ee9..1b9020ae 100644 --- a/dev/full tutorials/MNIST/index.html +++ b/dev/full tutorials/MNIST/index.html @@ -52,4 +52,4 @@ mach, resampling=Holdout(rng=123, fraction_train=0.7), measure=misclassification_rate, - ) + ) diff --git a/dev/full tutorials/Spam Detection with RNNs/SMS/index.html b/dev/full tutorials/Spam Detection with RNNs/SMS/index.html index 1ed537f3..a9a02a4f 100644 --- a/dev/full tutorials/Spam Detection with RNNs/SMS/index.html +++ b/dev/full tutorials/Spam Detection with RNNs/SMS/index.html @@ -119,4 +119,4 @@ z_encoded_equalized_fixed = coerce(z_encoded_equalized_fixed, Continuous) z_pred = predict_mode(mach, z_encoded_equalized_fixed) -print("SMS: `$(z)` and the prediction is `$(z_pred)`")
SMS: `Hi elaine, is today's meeting confirmed?` and the prediction is `CategoricalArrays.CategoricalValue{InlineStrings.String7, UInt32}[InlineStrings.String7("ham")]`

This page was generated using Literate.jl.

+print("SMS: `$(z)` and the prediction is `$(z_pred)`")
SMS: `Hi elaine, is today's meeting confirmed?` and the prediction is `CategoricalArrays.CategoricalValue{InlineStrings.String7, UInt32}[InlineStrings.String7("ham")]`

This page was generated using Literate.jl.

diff --git a/dev/index.html b/dev/index.html index fa862c8f..b4d04ccb 100644 --- a/dev/index.html +++ b/dev/index.html @@ -23,4 +23,4 @@ # 4. Evaluate the model cv=CV(nfolds=5) -evaluate!(mach, resampling=cv, measure=accuracy)

As you can see, we were able to use MLJ functionality (e.g., cross validation) with a Flux deep learning model. All arguments provided above also have defaults.

Notice that we were also able to define the neural network in a high-level fashion by only specifying the number of neurons in each hidden layer and the activation function. Meanwhile, MLJFlux was able to infer the input and output layers, as well as choose suitable defaults for the loss function and output activation, given the classification task. Notice as well that we did not need to implement a training or prediction loop, as we would in Flux.

Basic idea

As in the example above, any MLJFlux model has a builder hyperparameter, an object encoding instructions for creating a neural network given the data that the model eventually sees (e.g., the number of classes in a classification problem). While each MLJ model has a simple default builder, users may need to define custom builders to get optimal results, and this will require familiarity with the Flux API for defining a neural network chain.

Flux or MLJFlux?

Flux is a deep learning framework in Julia that comes with everything you need to build deep learning models (e.g., GPU support, automatic differentiation, layers, activations, losses, optimizers, etc.). MLJFlux wraps models built with Flux, providing a higher-level interface for building and training them. More importantly, it empowers Flux models by extending to them many common machine learning workflows made available by MLJ, such as:

A comparable project, FastAI/FluxTraining, also provides a high-level interface for interacting with Flux models and supports a set of features that may overlap with (but not include all of) those supported by MLJFlux.

Many of the features mentioned above are showcased in the workflow examples that you can access from the sidebar.

+evaluate!(mach, resampling=cv, measure=accuracy)

As you can see, we were able to use MLJ functionality (e.g., cross validation) with a Flux deep learning model. All arguments provided above also have defaults.

Notice that we were also able to define the neural network in a high-level fashion by only specifying the number of neurons in each hidden layer and the activation function. Meanwhile, MLJFlux was able to infer the input and output layers, as well as choose suitable defaults for the loss function and output activation, given the classification task. Notice as well that we did not need to implement a training or prediction loop, as we would in Flux.

Basic idea

As in the example above, any MLJFlux model has a builder hyperparameter, an object encoding instructions for creating a neural network given the data that the model eventually sees (e.g., the number of classes in a classification problem). While each MLJ model has a simple default builder, users may need to define custom builders to get optimal results, and this will require familiarity with the Flux API for defining a neural network chain.
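For example, swapping in one of the built-in builders documented later is a one-line change (a sketch; the layer sizes chosen here are arbitrary):

using MLJ, Flux
import MLJFlux

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux

# The builder only records *instructions*; the actual Flux chain is created
# later, once the input/output sizes are known from the data.
clf = NeuralNetworkClassifier(
    builder = MLJFlux.MLP(hidden=(32, 16), σ=Flux.relu),
    epochs = 20,
)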

Flux or MLJFlux?

Flux is a deep learning framework in Julia that comes with everything you need to build deep learning models (e.g., GPU support, automatic differentiation, layers, activations, losses, optimizers, etc.). MLJFlux wraps models built with Flux, providing a higher-level interface for building and training them. More importantly, it empowers Flux models by extending to them many common machine learning workflows made available by MLJ, such as:

A comparable project, FastAI/FluxTraining, also provides a high-level interface for interacting with Flux models and supports a set of features that may overlap with (but not include all of) those supported by MLJFlux.

Many of the features mentioned above are showcased in the workflow examples that you can access from the sidebar.

diff --git a/dev/interface/Builders/index.html b/dev/interface/Builders/index.html index 508ec092..f725e9b6 100644 --- a/dev/interface/Builders/index.html +++ b/dev/interface/Builders/index.html @@ -1,5 +1,5 @@ -Builders · MLJFlux
MLJFlux.LinearType
Linear(; σ=Flux.relu)

MLJFlux builder that constructs a fully connected two layer network with activation function σ. The number of input and output nodes is determined from the data. Weights are initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.ShortType
Short(; n_hidden=0, dropout=0.5, σ=Flux.sigmoid)

MLJFlux builder that constructs a fully connected three-layer network using n_hidden nodes in the hidden layer and the specified dropout (defaulting to 0.5). An activation function σ is applied between the hidden and final layers. If n_hidden=0 (the default) then n_hidden is the geometric mean of the number of input and output nodes. The number of input and output nodes is determined from the data.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.MLPType
MLP(; hidden=(100,), σ=Flux.relu)

MLJFlux builder that constructs a Multi-layer perceptron network. The ith element of hidden represents the number of neurons in the ith hidden layer. An activation function σ is applied between each layer.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.@builderMacro
@builder neural_net

Creates a builder for neural_net. The variables rng, n_in, n_out and n_channels can be used to create builders for any random number generator rng, input and output sizes n_in and n_out and number of input channels n_channels.

Examples

julia> import MLJFlux: @builder;
+Builders · MLJFlux
MLJFlux.LinearType
Linear(; σ=Flux.relu)

MLJFlux builder that constructs a fully connected two layer network with activation function σ. The number of input and output nodes is determined from the data. Weights are initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.ShortType
Short(; n_hidden=0, dropout=0.5, σ=Flux.sigmoid)

MLJFlux builder that constructs a fully connected three-layer network using n_hidden nodes in the hidden layer and the specified dropout (defaulting to 0.5). An activation function σ is applied between the hidden and final layers. If n_hidden=0 (the default) then n_hidden is the geometric mean of the number of input and output nodes. The number of input and output nodes is determined from the data.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.MLPType
MLP(; hidden=(100,), σ=Flux.relu)

MLJFlux builder that constructs a Multi-layer perceptron network. The ith element of hidden represents the number of neurons in the ith hidden layer. An activation function σ is applied between each layer.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
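To see the chain a given builder produces, you can call MLJFlux.build on it directly, using the signature described under Defining a new builder (a quick sketch; the input/output sizes 4 and 3 are arbitrary):

import MLJFlux, Flux
using Random: MersenneTwister

builder = MLJFlux.MLP(hidden=(8, 8), σ=Flux.relu)

# Builders are just recipes; `build` turns one into a concrete Flux.Chain
# for 4 input features and 3 output nodes:
chain = MLJFlux.build(builder, MersenneTwister(123), 4, 3)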
MLJFlux.@builderMacro
@builder neural_net

Creates a builder for neural_net. The variables rng, n_in, n_out and n_channels can be used to create builders for any random number generator rng, input and output sizes n_in and n_out and number of input channels n_channels.

Examples

julia> import MLJFlux: @builder;
 
 julia> nn = NeuralNetworkRegressor(builder = @builder(Chain(Dense(n_in, 64, relu),
                                                             Dense(64, 32, relu),
@@ -11,4 +11,4 @@
            Chain(front, Dense(d, n_out));
        end
 
-julia> conv_nn = NeuralNetworkRegressor(builder = conv_builder);
source
+julia> conv_nn = NeuralNetworkRegressor(builder = conv_builder);
source
diff --git a/dev/interface/Classification/index.html b/dev/interface/Classification/index.html index dc267fdc..74ba3b80 100644 --- a/dev/interface/Classification/index.html +++ b/dev/interface/Classification/index.html @@ -19,7 +19,7 @@ xlab=curve.parameter_name, xscale=curve.parameter_scale, ylab = "Cross Entropy") -

See also ImageClassifier, NeuralNetworkBinaryClassifier.

source
MLJFlux.NeuralNetworkBinaryClassifierType
NeuralNetworkBinaryClassifier

A model type for constructing a neural network binary classifier, based on MLJFlux.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux

Do model = NeuralNetworkBinaryClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkBinaryClassifier(builder=...).

NeuralNetworkBinaryClassifier is for training a data-dependent Flux.jl neural network for making probabilistic predictions of a binary (Multiclass{2} or OrderedFactor{2}) target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

Here:

  • X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.

  • y is the target, which can be any AbstractVector whose element scitype is Multiclass{2} or OrderedFactor{2}; check the scitype with scitype(y)

Train the machine with fit!(mach, rows=...).

Hyper-parameters

  • builder=MLJFlux.Short(): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux.jl documentation for examples of user-defined builders. See also finaliser below.

  • optimiser::Flux.Adam(): A Flux.Optimise optimiser. The optimiser performs the updating of the weights of the network. For further reference, see the Flux optimiser documentation. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.

  • loss=Flux.binarycrossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:

    • Flux.binarycrossentropy: Standard binary classification loss, also known as the log loss.

    • Flux.logitbinarycrossentropy: Mathematically equal to binarycrossentropy, but numerically more stable than finalising the outputs with σ and then calculating binarycrossentropy. You will need to specify finaliser=identity to remove MLJFlux's default sigmoid finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).

    • Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.

    • Flux.binary_focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.

    Currently MLJ measures are not supported values of loss.

  • epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.

  • batch_size::Int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.

  • lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞).

  • alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.

  • rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training.

  • optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.

  • acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For Training on GPU, use CUDALibs().

  • finaliser=Flux.σ: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.σ.

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.

  • predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.

Fitted parameters

The fields of fitted_params(mach) are:

  • chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).

Report

The fields of report(mach) are:

  • training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.

Examples

In this example we build a classification model using the mtcars dataset. This is a very basic example, using a default builder and no standardization. For a more advanced illustration, see NeuralNetworkRegressor or ImageClassifier, and examples in the MLJFlux.jl documentation.

using MLJ, Flux
+

See also ImageClassifier, NeuralNetworkBinaryClassifier.

source
MLJFlux.NeuralNetworkBinaryClassifierType
NeuralNetworkBinaryClassifier

A model type for constructing a neural network binary classifier, based on MLJFlux.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux

Do model = NeuralNetworkBinaryClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkBinaryClassifier(builder=...).

NeuralNetworkBinaryClassifier is for training a data-dependent Flux.jl neural network for making probabilistic predictions of a binary (Multiclass{2} or OrderedFactor{2}) target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

Here:

  • X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.

  • y is the target, which can be any AbstractVector whose element scitype is Multiclass{2} or OrderedFactor{2}; check the scitype with scitype(y)

Train the machine with fit!(mach, rows=...).

Hyper-parameters

  • builder=MLJFlux.Short(): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux.jl documentation for examples of user-defined builders. See also finaliser below.

  • optimiser::Flux.Adam(): A Flux.Optimise optimiser. The optimiser performs the updating of the weights of the network. For further reference, see the Flux optimiser documentation. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.

  • loss=Flux.binarycrossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:

    • Flux.binarycrossentropy: Standard binary classification loss, also known as the log loss.

    • Flux.logitbinarycrossentropy: Mathematically equal to binarycrossentropy, but numerically more stable than finalising the outputs with σ and then calculating binarycrossentropy. You will need to specify finaliser=identity to remove MLJFlux's default sigmoid finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).

    • Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.

    • Flux.binary_focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.

    Currently MLJ measures are not supported values of loss.

  • epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.

  • batch_size::Int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.

  • lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞).

  • alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.

  • rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training.

  • optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.

  • acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For Training on GPU, use CUDALibs().

  • finaliser=Flux.σ: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.σ.

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.

  • predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.

Fitted parameters

The fields of fitted_params(mach) are:

  • chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).

Report

The fields of report(mach) are:

  • training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.

Examples

In this example we build a classification model using the mtcars dataset. This is a very basic example, using a default builder and no standardization. For a more advanced illustration, see NeuralNetworkRegressor or ImageClassifier, and examples in the MLJFlux.jl documentation.

using MLJ, Flux
 import Optimisers
 import RDatasets

First, we can load the data:

mtcars = RDatasets.dataset("datasets", "mtcars");
 y, X = unpack(mtcars, ==(:VS), in([:MPG, :Cyl, :Disp, :HP, :WT, :QSec]));

Note that y is a vector and X a table.

y = categorical(y) # classifier takes categorical input
@@ -47,4 +47,4 @@
    xscale=curve.parameter_scale,
    ylab = "Cross Entropy",
 )
-

See also ImageClassifier.

source
+

See also ImageClassifier.

source diff --git a/dev/interface/Custom Builders/index.html b/dev/interface/Custom Builders/index.html index 37726355..9e379744 100644 --- a/dev/interface/Custom Builders/index.html +++ b/dev/interface/Custom Builders/index.html @@ -12,4 +12,4 @@ Dense(nn.n2, n_out, init=init), ) end

Note here that n_in and n_out depend on the size of the data (see Table 1).

For a concrete image classification example, see the Image Classification Example.

More generally, defining a new builder means defining a new struct sub-typing MLJFlux.Builder and defining a new MLJFlux.build method with one of these signatures:

MLJFlux.build(builder::MyBuilder, rng, n_in, n_out)
-MLJFlux.build(builder::MyBuilder, rng, n_in, n_out, n_channels) # for use with `ImageClassifier`

This method must return a Flux.Chain instance, chain, subject to the following conditions:

Alternatively, use MLJFlux.@builder(neural_net) to automatically create a builder for any valid Flux chain expression neural_net, where the symbols n_in, n_out, n_channels and rng can appear literally, with the interpretations explained above. For example,

builder = MLJFlux.@builder Chain(Dense(n_in, 128), Dense(128, n_out, tanh))
+MLJFlux.build(builder::MyBuilder, rng, n_in, n_out, n_channels) # for use with `ImageClassifier`

This method must return a Flux.Chain instance, chain, subject to the following conditions:

Alternatively, use MLJFlux.@builder(neural_net) to automatically create a builder for any valid Flux chain expression neural_net, where the symbols n_in, n_out, n_channels and rng can appear literally, with the interpretations explained above. For example,

builder = MLJFlux.@builder Chain(Dense(n_in, 128), Dense(128, n_out, tanh))
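Putting the pieces together, a complete struct-based builder might look like the following sketch (the name MyNetworkBuilder and the layer widths are illustrative only):

import MLJFlux
using Flux

mutable struct MyNetworkBuilder <: MLJFlux.Builder
    n1::Int   # width of the first hidden layer
    n2::Int   # width of the second hidden layer
end

function MLJFlux.build(nn::MyNetworkBuilder, rng, n_in, n_out)
    init = Flux.glorot_uniform(rng)
    return Flux.Chain(
        Flux.Dense(n_in, nn.n1, Flux.relu, init=init),
        Flux.Dense(nn.n1, nn.n2, Flux.relu, init=init),
        Flux.Dense(nn.n2, n_out, init=init),
    )
end

builder = MyNetworkBuilder(64, 32)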
diff --git a/dev/interface/Image Classification/index.html b/dev/interface/Image Classification/index.html index 6a256230..690ec2ee 100644 --- a/dev/interface/Image Classification/index.html +++ b/dev/interface/Image Classification/index.html @@ -45,4 +45,4 @@ resampling=Holdout(fraction_train=0.5), measure=cross_entropy, rows=1:1000, - verbosity=0)

See also NeuralNetworkClassifier.

source + verbosity=0)

See also NeuralNetworkClassifier.

source diff --git a/dev/interface/Multitarget Regression/index.html b/dev/interface/Multitarget Regression/index.html index 1a933ff3..17b774c1 100644 --- a/dev/interface/Multitarget Regression/index.html +++ b/dev/interface/Multitarget Regression/index.html @@ -24,4 +24,4 @@ # loss for `(Xtest, test)`: fit!(mach) # trains on all data `(X, y)` yhat = predict(mach, Xtest) -multi_loss(yhat, ytest)

See also NeuralNetworkRegressor

source +multi_loss(yhat, ytest)

See also NeuralNetworkRegressor

source diff --git a/dev/interface/Regression/index.html b/dev/interface/Regression/index.html index 727aedca..138508d3 100644 --- a/dev/interface/Regression/index.html +++ b/dev/interface/Regression/index.html @@ -42,4 +42,4 @@ # loss for `(Xtest, test)`: fit!(mach) # train on `(X, y)` yhat = predict(mach, Xtest) -l2(yhat, ytest)

These losses, for the pipeline model, refer to the target on the original, unstandardized, scale.

For implementing stopping criterion and other iteration controls, refer to examples linked from the MLJFlux documentation.

See also MultitargetNeuralNetworkRegressor

source +l2(yhat, ytest)

These losses, for the pipeline model, refer to the target on the original, unstandardized, scale.

For implementing stopping criterion and other iteration controls, refer to examples linked from the MLJFlux documentation.

See also MultitargetNeuralNetworkRegressor

source diff --git a/dev/interface/Summary/index.html b/dev/interface/Summary/index.html index 3c47cab2..c4812350 100644 --- a/dev/interface/Summary/index.html +++ b/dev/interface/Summary/index.html @@ -2,4 +2,4 @@ Summary · MLJFlux

Models

MLJFlux provides five model types, for use with input features X and targets y of the scientific type indicated in the table below. The parameters n_in, n_out and n_channels refer to information passed to the builder, as described under Defining a new builder below.

| Model Type | Prediction type | scitype(X) <: _ | scitype(y) <: _ |
| --- | --- | --- | --- |
| NeuralNetworkRegressor | Deterministic | Table(Continuous) with n_in columns | AbstractVector{<:Continuous} (n_out = 1) |
| MultitargetNeuralNetworkRegressor | Deterministic | Table(Continuous) with n_in columns | <: Table(Continuous) with n_out columns |
| NeuralNetworkClassifier | Probabilistic | <:Table(Continuous) with n_in columns | AbstractVector{<:Finite} with n_out classes |
| NeuralNetworkBinaryClassifier | Probabilistic | <:Table(Continuous) with n_in columns | AbstractVector{<:Finite{2}} (n_out = 2) |
| ImageClassifier | Probabilistic | AbstractVector(<:Image{W,H}) with n_in = (W, H) | AbstractVector{<:Finite} with n_out classes |
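For example, to check that your data has the required scitypes before binding it to one of these models (a generic sketch using MLJ's schema, scitype and coerce):

using MLJ
import DataFrames: DataFrame

X = DataFrame(x1 = rand(100), x2 = rand(100))    # columns are Continuous
y = categorical(rand(["yes", "no"], 100))        # element scitype Multiclass{2}

schema(X)     # check the column scitypes
scitype(y)    # check the target scitype

# Coerce a column that was read in with the wrong scitype (e.g. Count):
# X = coerce(X, :x1 => Continuous)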
See definition of "model"

In MLJ a model is a mutable struct storing hyper-parameters for some learning algorithm indicated by the model name, and that's all. In particular, an MLJ model does not store learned parameters.

Difference in Definition

In Flux the term "model" has another meaning. However, as all Flux "models" used in MLJFlux are Flux.Chain objects, we call them chains, and restrict use of "model" to models in the MLJ sense.

Dealing with non-tabular input

Any AbstractMatrix{<:AbstractFloat} object Xmat can be forced to have scitype Table(Continuous) by replacing it with X = MLJ.table(Xmat). Furthermore, this wrapping, and subsequent unwrapping under the hood, will compile to a no-op. Sparse matrix data is also supported, but the implementation has not been optimized for it and so should be used with caution.
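For example (a minimal sketch):

using MLJ

Xmat = rand(Float32, 100, 5)   # 100 observations of 5 features
X = MLJ.table(Xmat)            # now has scitype Table(Continuous)
scitype(X)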

Instructions for coercing common image formats into some AbstractVector{<:Image} are here.

Fitting and warm restarts

MLJ machines cache state enabling the "warm restart" of model training, as demonstrated in the incremental training example. In the case of MLJFlux models, fit!(mach) will use a warm restart if:

  • only model.epochs has changed since the last call; or

  • only model.epochs or model.optimiser have changed since the last call and model.optimiser_changes_trigger_retraining == false (the default) (the "state" part of the optimiser is ignored in this comparison). This allows one to dynamically modify learning rates, for example.

Here model=mach.model is the associated MLJ model.

The warm restart feature makes it possible to apply early stopping criteria, as defined in EarlyStopping.jl. For an example, see /examples/mnist/. (Eventually, this will be handled by an MLJ model wrapper for controlling arbitrary iterative models.)
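Concretely, a warm restart looks like this (a sketch; any MLJFlux model behaves the same way):

using MLJ
import MLJFlux

NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux
X, y = make_regression(200, 5)      # synthetic data, for illustration only

model = NeuralNetworkRegressor(epochs=10)
mach = machine(model, X, y)
fit!(mach)        # cold start: trains for 10 epochs

model.epochs = 15
fit!(mach)        # warm restart: trains for only 5 additional epochs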

Model Hyperparameters

All models share the following hyper-parameters:

| Hyper-parameter | Description | Default |
| --- | --- | --- |
| builder | Default builder for models. | MLJFlux.Linear(σ=Flux.relu) (regressors) or MLJFlux.Short(n_hidden=0, dropout=0.5, σ=Flux.σ) (classifiers) |
| optimiser | The optimiser to use for training. | Flux.ADAM() |
| loss | The loss function used for training. | Flux.mse (regressors) and Flux.crossentropy (classifiers) |
| epochs | Number of epochs to train for. | 10 |
| batch_size | The batch size for the data. | 1 |
| lambda | The regularization strength. Range = [0, ∞). | 0 |
| alpha | The L2/L1 mix of regularization. Range = [0, 1]. | 0 |
| rng | The random number generator (RNG) passed to builders, for weight initialization, for example. Can be any AbstractRNG or the seed (integer) for a MersenneTwister that is reset on every cold restart of model (machine) training. | GLOBAL_RNG |
| acceleration | Use CUDALibs() for training on GPU; default is CPU1(). | CPU1() |
| optimiser_changes_trigger_retraining | True if fitting an associated machine should trigger retraining from scratch whenever the optimiser changes. | false |

The classifiers have an additional hyperparameter finaliser (default = Flux.softmax), which is the operation applied to the unnormalized output of the final layer to obtain probabilities (outputs summing to one). It should return a vector of the same length as its input.
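For example, following the advice in the classifier docstrings, one can move the final activation out of the chain and into the loss for numerical stability (a sketch; predict then returns unnormalized scores):

using MLJ, Flux
import MLJFlux

NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux

clf = NeuralNetworkBinaryClassifier(
    loss = Flux.logitbinarycrossentropy,  # expects unnormalized outputs
    finaliser = identity,                 # so remove the default sigmoid finaliser
)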

Loss Functions

Currently, the loss function specified by loss=... is applied internally by Flux and needs to conform to the Flux API. You cannot, for example, supply one of MLJ's probabilistic loss functions, such as MLJ.cross_entropy, to one of the classifier constructors.

That said, MLJ loss functions and metrics can be used in evaluation meta-algorithms such as cross validation (though only there), and they will work even if the underlying model comes from MLJFlux.
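In other words, a pattern like the following is fine (a sketch): the Flux loss drives training, while MLJ measures are used only for evaluation:

using MLJ, Flux
import MLJFlux

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux
X, y = @load_iris

model = NeuralNetworkClassifier(loss=Flux.crossentropy)   # Flux loss: training
mach = machine(model, X, y)

# MLJ measures: evaluation only
evaluate!(mach, resampling=CV(nfolds=5), measure=[cross_entropy, accuracy])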

More on accelerated training with GPUs

As indicated in the table above, when instantiating a model for training on a GPU, specify acceleration=CUDALibs(), as in

using MLJ
 ImageClassifier = @load ImageClassifier
 model = ImageClassifier(epochs=10, acceleration=CUDALibs())
-mach = machine(model, X, y) |> fit!

In this example, the data X, y is copied onto the GPU under the hood on the call to fit! and cached for use in any warm restart (see above). The Flux chain used in training is always copied back to the CPU at the conclusion of fit!, and made available as fitted_params(mach).

Built-in builders

As for the builder argument, the following builders are provided out-of-the-box:

| Builder | Description |
| --- | --- |
| MLJFlux.MLP(hidden=(10,)) | General multi-layer perceptron |
| MLJFlux.Short(n_hidden=0, dropout=0.5, σ=sigmoid) | Fully connected network with one hidden layer and dropout |
| MLJFlux.Linear(σ=relu) | Vanilla linear network with no hidden layers and activation function σ |

See the following sections to learn more about the interface for the builders and models.

+mach = machine(model, X, y) |> fit!

In this example, the data X, y is copied onto the GPU under the hood on the call to fit! and cached for use in any warm restart (see above). The Flux chain used in training is always copied back to the CPU at the conclusion of fit!, and made available as fitted_params(mach).

Built-in builders

As for the builder argument, the following builders are provided out-of-the-box:

| Builder | Description |
| --- | --- |
| MLJFlux.MLP(hidden=(10,)) | General multi-layer perceptron |
| MLJFlux.Short(n_hidden=0, dropout=0.5, σ=sigmoid) | Fully connected network with one hidden layer and dropout |
| MLJFlux.Linear(σ=relu) | Vanilla linear network with no hidden layers and activation function σ |

See the following sections to learn more about the interface for the builders and models.

diff --git a/dev/workflow examples/Basic Neural Architecture Search/tuning/index.html b/dev/workflow examples/Basic Neural Architecture Search/tuning/index.html index 984da412..524767c4 100644 --- a/dev/workflow examples/Basic Neural Architecture Search/tuning/index.html +++ b/dev/workflow examples/Basic Neural Architecture Search/tuning/index.html @@ -70,4 +70,4 @@ mlp = [x[:model].builder for x in history], measurement = [x[:measurement][1] for x in history], ) -first(sort!(history_df, [order(:measurement)]), 10)

This page was generated using Literate.jl.

+first(sort!(history_df, [order(:measurement)]), 10)

This page was generated using Literate.jl.

diff --git a/dev/workflow examples/Comparison/comparison/index.html b/dev/workflow examples/Comparison/comparison/index.html index 3125f083..582a4488 100644 --- a/dev/workflow examples/Comparison/comparison/index.html +++ b/dev/workflow examples/Comparison/comparison/index.html @@ -55,4 +55,4 @@ └ @ Flux ~/.julia/packages/Flux/Wz6D4/src/layers/stateless.jl:60

Now let's see the history for more details on the performance of each of the models.

history = report(mach).history
 history_df = DataFrame(mlp = [x[:model] for x in history], measurement = [x[:measurement][1] for x in history])
-sort!(history_df, [order(:measurement)])
4×2 DataFrame
 Row │ mlp                                                                  measurement
     │ Probabil…                                                            Float64
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │ BayesianLDA(method = gevd, …)                                        0.0610826
   2 │ RandomForestClassifier(max_depth = -1, …)                            0.106565
   3 │ NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …)        0.113266
   4 │ ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …)   0.221056

This is Occam's razor in practice.


This page was generated using Literate.jl.

+sort!(history_df, [order(:measurement)])
4×2 DataFrame
 Row │ mlp                                                                  measurement
     │ Probabil…                                                            Float64
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │ BayesianLDA(method = gevd, …)                                        0.0610826
   2 │ RandomForestClassifier(max_depth = -1, …)                            0.106565
   3 │ NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …)        0.113266
   4 │ ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …)   0.221056

This is Occam's razor in practice.


This page was generated using Literate.jl.

diff --git a/dev/workflow examples/Composition/composition/index.html b/dev/workflow examples/Composition/composition/index.html index 2989e5ab..d4a18a0d 100644 --- a/dev/workflow examples/Composition/composition/index.html +++ b/dev/workflow examples/Composition/composition/index.html @@ -24,4 +24,4 @@ standarizer = Standardizer()

Now let's compose the balanced model with a standardizer.

pipeline = standarizer |> balanced_model

With this, any training data will first be standardized, then oversampled, then passed to the model. At inference time, the standardizer will automatically use the training set's mean and standard deviation, and the oversampler will act as a pass-through.

Training the Composed Model

It's indistinguishable from training a single model.

mach = machine(pipeline, X, y)
 fit!(mach)
 cv=CV(nfolds=5)
-evaluate!(mach, resampling=cv, measure=accuracy)

This page was generated using Literate.jl.

+evaluate!(mach, resampling=cv, measure=accuracy)

This page was generated using Literate.jl.

diff --git a/dev/workflow examples/Early Stopping/iteration/index.html b/dev/workflow examples/Early Stopping/iteration/index.html index e881e9e9..8c639f24 100644 --- a/dev/workflow examples/Early Stopping/iteration/index.html +++ b/dev/workflow examples/Early Stopping/iteration/index.html @@ -34,4 +34,4 @@ # We can get the training losses like so training_losses = report(mach)[:model_report].training_losses; nothing #hide

Results

We can see that the model converged after 100 iterations.

plot(training_losses, label="Training Loss", linewidth=2)
-plot!(validation_losses, label="Validation Loss", linewidth=2, size=(800,400))
using Literate #src

This page was generated using Literate.jl.

+plot!(validation_losses, label="Validation Loss", linewidth=2, size=(800,400))
using Literate #src

This page was generated using Literate.jl.

diff --git a/dev/workflow examples/Hyperparameter Tuning/tuning/index.html b/dev/workflow examples/Hyperparameter Tuning/tuning/index.html index 514c957c..e36762e7 100644 --- a/dev/workflow examples/Hyperparameter Tuning/tuning/index.html +++ b/dev/workflow examples/Hyperparameter Tuning/tuning/index.html @@ -30,4 +30,4 @@ curve.measurements, xlab=curve.parameter_name, xscale=curve.parameter_scale, - ylab = "Cross Entropy")

This page was generated using Literate.jl.

+ ylab = "Cross Entropy")

This page was generated using Literate.jl.

diff --git a/dev/workflow examples/Incremental Training/incremental/index.html b/dev/workflow examples/Incremental Training/incremental/index.html index 695b7da4..7f7e578d 100644 --- a/dev/workflow examples/Incremental Training/incremental/index.html +++ b/dev/workflow examples/Incremental Training/incremental/index.html @@ -19,4 +19,4 @@ fit!(mach)

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
val_acc = accuracy(predict_mode(mach, X_test), y_test)

Poor performance it seems.

Incremental Training

Now let's train it for another 30 epochs at half the original learning rate. All we need to do is change these hyperparameters and call fit! again. It won't reset the model parameters before training.

clf.optimiser.eta = clf.optimiser.eta / 2
 clf.epochs = clf.epochs + 30
 fit!(mach, verbosity=2);
-nothing #hide

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
training_acc = accuracy(predict_mode(mach, X_test), y_test)

That's much better. If we would rather reset the model parameters before fitting, we can call fit!(mach, force=true).


This page was generated using Literate.jl.

+nothing #hide

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
training_acc = accuracy(predict_mode(mach, X_test), y_test)

That's much better. If we would rather reset the model parameters before fitting, we can call fit!(mach, force=true).


This page was generated using Literate.jl.

diff --git a/dev/workflow examples/Live Training/live-training/index.html b/dev/workflow examples/Live Training/live-training/index.html index 5c508fd6..a6d43b67 100644 --- a/dev/workflow examples/Live Training/live-training/index.html +++ b/dev/workflow examples/Live Training/live-training/index.html @@ -35,4 +35,4 @@ controls=vcat(stop_conditions, callbacks), retrain=true # no need to retrain on all data at the end )

Live Training

Simply fitting the model is all we need

mach = machine(iterated_model, X, y)
-fit!(mach, force=true)
using Literate #src

This page was generated using Literate.jl.

+fit!(mach, force=true)
using Literate #src

This page was generated using Literate.jl.