diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index 1a571f18..bb45f33c 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-16T20:04:41","documenter_version":"1.4.1"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-07-03T20:14:37","documenter_version":"1.5.0"}}
\ No newline at end of file
diff --git a/dev/about/index.html b/dev/about/index.html
index f3e3437f..c62d9281 100644
--- a/dev/about/index.html
+++ b/dev/about/index.html
@@ -1,2 +1,2 @@
-
This demonstration is available as a Jupyter notebook or julia script here.
Neural Architecture Search (NAS) is an instance of hyperparameter tuning concerned with tuning model hyperparameters defining the architecture itself. Although it's typically performed with sophisticated search algorithms for efficiency, in this example we will be using a simple random search.
using MLJ # Has MLJFlux models
using Flux # For more flexibility
using RDatasets: RDatasets # Dataset source
using DataFrames # To view tuning results in a table
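To make the idea concrete, here is a minimal sketch of how such a random search over architectures can be set up with MLJ's TunedModel. The candidate hidden-layer tuples, the number of search iterations, and the names X, y (the features and target used in this example) are illustrative assumptions rather than the exact values used below.
NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux verbosity=0
import MLJFlux

clf = NeuralNetworkClassifier(builder=MLJFlux.MLP(hidden=(5, 5), σ=Flux.relu), epochs=10)

# a finite pool of candidate architectures for the random search (illustrative values)
r = range(clf, :(builder.hidden); values=[(5,), (10, 10), (21, 9, 41), (32, 16, 8)])

tuned_clf = TunedModel(model=clf, tuning=RandomSearch(), range=r,
                       resampling=CV(nfolds=3), measure=cross_entropy, n=20)
mach = machine(tuned_clf, X, y)
fit!(mach, verbosity=0)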
@@ -85,7 +85,7 @@
fit!(mach, verbosity = 0);
fitted_params(mach).best_model
NeuralNetworkClassifier(
builder = MLP(
- hidden = (21, 9, 41),
+ hidden = (9, 25, 13),
σ = NNlib.relu),
finaliser = NNlib.softmax,
optimiser = Adam(0.01, (0.9, 0.999), 1.0e-8),
@@ -101,4 +101,4 @@
mlp = [x[:model].builder for x in history],
measurement = [x[:measurement][1] for x in history],
)
-first(sort!(history_df, [order(:measurement)]), 10)
using MLJ # Has MLJFlux models
using Flux # For more flexibility
import RDatasets # Dataset source
using DataFrames # To visualize hyperparameter search results
@@ -52,9 +52,9 @@
│ The input will be converted, but any earlier layers may be very slow.
│ layer = Dense(4 => 5, relu) # 25 parameters
│ summary(x) = "4×8 Matrix{Float64}"
-└ @ Flux ~/.julia/packages/Flux/Wz6D4/src/layers/stateless.jl:60
Now let's see the history for more details on the performance of each of the models
history = report(mach).history
+└ @ Flux ~/.julia/packages/Flux/CUn7U/src/layers/stateless.jl:60
history_df = DataFrame(
mlp = [x[:model] for x in history],
measurement = [x[:measurement][1] for x in history],
)
-sort!(history_df, [order(:measurement)])
This demonstration is available as a Jupyter notebook or julia script here.
In this workflow example, we see how MLJFlux enables composing MLJ models with MLJFlux models. We will assume a class imbalance setting and wrap an oversampler with a deep learning model from MLJFlux.
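Concretely, such a composition can look like the following rough sketch, which assumes an oversampler from Imbalance.jl and the BalancedModel wrapper from MLJBalancing.jl; the keyword names and loading incantations are assumptions to be checked against those packages, and X, y are the (imbalanced) features and target of the example.
using MLJ
import MLJFlux, Optimisers
using MLJBalancing: BalancedModel    # assumption: BalancedModel is MLJBalancing's wrapper type

SMOTE = @load SMOTE pkg=Imbalance verbosity=0
NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux verbosity=0

clf = NeuralNetworkClassifier(epochs=20, optimiser=Optimisers.Adam(0.01))
balanced_clf = BalancedModel(model=clf, balancer1=SMOTE())  # oversample, then train the network
mach = machine(balanced_clf, X, y)
fit!(mach, verbosity=0)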
using MLJ # Has MLJFlux models
using Flux # For more flexibility
import RDatasets # Dataset source
using Plots # To visualize training
@@ -56,4 +56,4 @@
[ Info: final training loss: 0.045833383
[ Info: Stop triggered by EarlyStopping.NumberLimit(100) stopping criterion.
[ Info: Total of 100 iterations.
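Logs like those above are produced by wrapping the model in MLJ's IteratedModel with iteration controls. A minimal sketch of such a wrapper follows; the particular controls, resampling choice, and the name clf for the MLJFlux classifier constructed in this example are illustrative assumptions.
iterated_clf = IteratedModel(
    model=clf,
    resampling=Holdout(fraction_train=0.7),
    measure=log_loss,
    controls=[Step(1),             # train one epoch per iteration
              NumberLimit(100),    # hard cap of 100 iterations, as in the log above
              Patience(5)],        # stop after 5 consecutive out-of-sample loss increases
)
mach = machine(iterated_clf, X, y)
fit!(mach, verbosity=0)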
using MLJ # Has MLJFlux models
using Flux # For more flexibility
import RDatasets # Dataset source
using Plots # To plot tuning results
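The curve object plotted below is the kind of named tuple returned by MLJ's learning_curve. A minimal sketch of how it can be produced follows; the hyper-parameter being varied, its range, and the names clf and mach are illustrative assumptions.
# assuming mach = machine(clf, X, y) for an MLJFlux classifier clf:
r = range(clf, :epochs; lower=1, upper=50, scale=:log10)
curve = learning_curve(mach,
                       range=r,
                       resampling=CV(nfolds=4),
                       measure=cross_entropy)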
@@ -65,4 +65,4 @@
xlab=curve.parameter_name,
xscale=curve.parameter_scale,
ylab = "Cross Entropy",
-)
using MLJ # Has MLJFlux models
using Flux # For more flexibility
import RDatasets # Dataset source
import Optimisers # native Flux.jl optimisers no longer supported
Now let's train it for another 30 epochs at half the original learning rate. All we need to do is change these hyperparameters and call fit! again. It won't reset the model parameters before training.
[ Info: Updating machine(NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …), …).
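In code, that amounts to something like the following sketch (assuming the classifier is bound to clf, its machine to mach, and that the optimiser is an Optimisers.Adam, whose learning rate lives in its eta field):
clf.optimiser = Optimisers.Adam(clf.optimiser.eta / 2)  # halve the learning rate
clf.epochs = clf.epochs + 30                            # ask for 30 more epochs
fit!(mach, verbosity=2)                                 # warm restart: learned weights are kept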
@@ -68,4 +68,4 @@
[ Info: Loss is 0.1353
[ Info: Loss is 0.1251
[ Info: Loss is 0.1173
-[ Info: Loss is 0.1102
Let's evaluate the training loss and validation accuracy
This section assumes familiarity with the MLJ model API
If one subtypes a new model type as either MLJFlux.MLJFluxProbabilistic or MLJFlux.MLJFluxDeterministic, then instead of defining new methods for MLJModelInterface.fit and MLJModelInterface.update one can make use of fallbacks by implementing the lower level methods shape, build, and fitresult. See the classifier source code for an example.
One still needs to implement a new predict method.
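As a rough skeleton only (the method signatures below are assumptions based on the description above and should be checked against the classifier source code):
import MLJFlux
import MLJModelInterface as MMI
using Flux

mutable struct MyNewClassifier <: MLJFlux.MLJFluxProbabilistic
    builder
    finaliser
    optimiser
    epochs::Int
    # ... any further MLJFlux hyper-parameters
end

# dimensions the builder needs, deduced from the data:
MLJFlux.shape(model::MyNewClassifier, X, y) =
    (size(MMI.matrix(X), 2), length(MMI.classes(y[1])))

# the Flux chain, built from the builder and the shape computed above:
MLJFlux.build(model::MyNewClassifier, rng, shape) =
    Flux.Chain(MLJFlux.build(model.builder, rng, shape...), model.finaliser)

# whatever predict will need besides the trained chain (here, the class pool):
MLJFlux.fitresult(model::MyNewClassifier, chain, y) = (chain, MMI.classes(y[1]))

# a new predict method is still required:
function MMI.predict(model::MyNewClassifier, fitresult, Xnew)
    chain, levels = fitresult
    probs = chain(MMI.matrix(Xnew)')          # features × observations
    return MMI.UnivariateFinite(levels, probs')
end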
This demonstration is available as a Jupyter notebook or julia script here.
In this demo we use a custom RNN model from Flux with MLJFlux to classify text messages as spam or ham. We will be using the SMS Collection Dataset from Kaggle.
Warning. This demo includes some non-idiomatic use of MLJ to allow use of the Flux.jl Embedding layer. It is not recommended for MLJ beginners.
Acceptable performance. Let's see some live examples:
using Random: Random;
Random.seed!(99);
@@ -111,4 +111,4 @@
z_encoded_equalized_fixed = coerce(z_encoded_equalized_fixed, Continuous)
z_pred = predict_mode(mach, z_encoded_equalized_fixed)
-print("SMS: `$(z)` and the prediction is `$(z_pred)`")
SMS: `Hi elaine, is today's meeting confirmed?` and the prediction is `CategoricalArrays.CategoricalValue{InlineStrings.String7, UInt32}[InlineStrings.String7("ham")]`
Provide a user-friendly and high-level interface to fundamental Flux deep learning models while still being extensible by supporting custom models written with Flux
Make building deep learning models more convenient to users already familiar with the MLJ workflow
Make it easier to apply machine learning techniques provided by MLJ to deep learning models, including out-of-sample performance evaluation, hyper-parameter optimization, iteration control, and more
MLJFlux Scope
MLJFlux support is focused on fundamental deep learning models for common supervised learning tasks. Sophisticated architectures and approaches, such as online learning, reinforcement learning, and adversarial networks, are currently outside its scope. Also, MLJFlux is limited to tasks where all (batches of) training data fits into memory.
You only need Flux if you need to build a custom architecture, or experiment with different loss or activation functions. Since MLJFlux 0.5, you must use optimisers from Optimisers.jl, as native Flux.jl optimisers are no longer supported.
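For example, a minimal sketch of constructing a classifier with an explicit Optimisers.jl rule:
using MLJ
import MLJFlux, Optimisers

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux verbosity=0

# pass an Optimisers.jl rule, not a Flux.Optimise one:
clf = NeuralNetworkClassifier(optimiser=Optimisers.Adam(0.001), epochs=20)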
As you can see, we are able to use MLJ meta-functionality (e.g., cross-validation) with a Flux deep learning model. All arguments provided have defaults.
Notice that we are also able to define the neural network in a high-level fashion by only specifying the number of neurons in each hidden layer and the activation function. Meanwhile, MLJFlux is able to infer the input and output layers, as well as use a suitable default for the loss function and output activation, given the classification task. Notice as well that we did not need to manually implement a training or prediction loop.
As in the example above, any MLJFlux model has a builder hyperparameter, an object encoding instructions for creating a neural network given the data that the model eventually sees (e.g., the number of classes in a classification problem). While each MLJFlux model has a simple default builder, users may need to define custom builders to get optimal results (see Defining Custom Builders), and this will require familiarity with the Flux API for defining a neural network chain.
Flux is a deep learning framework in Julia that comes with everything you need to build deep learning models (e.g., GPU support, automatic differentiation, layers, activations, losses, optimizers, etc.). MLJFlux wraps models built with Flux, providing a higher-level interface for building and training them. More importantly, it empowers Flux models by extending their support to many common machine learning workflows that are possible via MLJ, such as:
Estimating performance of your model using a holdout set or other resampling strategy (e.g., cross-validation) as measured by one or more metrics (e.g., loss functions) that may not have been used in training
Optimizing hyper-parameters such as a regularization parameter (e.g., dropout) or the width/height/number of channels of a convolution layer
Composing with other models, such as introducing data pre-processing steps (e.g., missing data imputation) into a pipeline. It might make sense to include non-deep learning models in this pipeline. Other kinds of model composition could include blending the predictions of a deep learner with some other kind of model (as in “model stacking”). Models composed with MLJ can also be tuned as a single unit.
Controlling iteration, for example by adding an early stopping criterion based on an out-of-sample estimate of the loss, dynamically changing the learning rate (e.g., cyclic learning rates), periodically saving snapshots of the model, or generating live plots of sample weights to judge training progress (as in TensorBoard)
Comparing your model with non-deep learning models
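For instance, the first workflow in the list above (out-of-sample performance estimation) reduces to a single evaluate call. A minimal sketch, assuming an MLJFlux classifier clf and data X, y:
evaluate(clf, X, y,
         resampling=CV(nfolds=5, rng=123),
         measure=log_loss,
         verbosity=0)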
A comparable project, FastAI/FluxTraining, also provides a high-level interface for interacting with Flux models and supports a set of features that may overlap with (but not include all of) those supported by MLJFlux.
Many of the features mentioned above are showcased in the workflow examples that you can access from the sidebar.
MLJFlux builder that constructs a fully connected two layer network with activation function σ. The number of input and output nodes is determined from the data. Weights are initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.
MLJFlux builder that constructs a fully connected three-layer network using n_hidden nodes in the hidden layer and the specified dropout (defaulting to 0.5). An activation function σ is applied between the hidden and final layers. If n_hidden=0 (the default) then n_hidden is the geometric mean of the number of input and output nodes. The number of input and output nodes is determined from the data.
Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.
MLJFlux builder that constructs a Multi-layer perceptron network. The ith element of hidden represents the number of neurons in the ith hidden layer. An activation function σ is applied between each layer.
Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.
Creates a builder for neural_net. The variables rng, n_in, n_out and n_channels can be used to create builders for any random number generator rng, input and output sizes n_in and n_out and number of input channels n_channels.
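As a sketch of how one of these builders is typically passed to a model (the hidden-layer sizes here are illustrative):
import MLJFlux
using Flux

builder = MLJFlux.MLP(hidden=(32, 16), σ=Flux.relu)
clf = MLJFlux.NeuralNetworkClassifier(builder=builder, epochs=50)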
Do model = NeuralNetworkClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkClassifier(builder=...).
NeuralNetworkClassifier is for training a data-dependent Flux.jl neural network for making probabilistic predictions of a Multiclass or OrderedFactor target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X, y)
Here:
X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.
y is the target, which can be any AbstractVector whose element scitype is Multiclass or OrderedFactor; check the scitype with scitype(y)
Train the machine with fit!(mach, rows=...).
Hyper-parameters
builder=MLJFlux.Short(): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux.jl documentation for examples of user-defined builders. See also finaliser below.
optimiser::Optimisers.Adam(): An Optimisers.jl optimiser. The optimiser performs the updating of the weights of the network. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.
loss=Flux.crossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:
Flux.crossentropy: Standard multiclass classification loss, also known as the log loss.
Flux.logitcrossentropy: Mathematically equal to crossentropy, but numerically more stable than finalising the outputs with softmax and then calculating crossentropy. You will need to specify finaliser=identity to remove MLJFlux's default softmax finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).
Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.
Flux.focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.
Currently MLJ measures are not supported values of loss.
epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.
batch_size::Int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.
lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞). Note the history reports unpenalized losses.
alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.
rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training. The default is Random.default_rng().
optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.
acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For training on a GPU, use CUDALibs().
finaliser=Flux.softmax: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.softmax.
Operations
predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.
predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.
Fitted parameters
The fields of fitted_params(mach) are:
chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).
Report
The fields of report(mach) are:
training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.
Examples
In this example we build a classification model using the Iris dataset. This is a very basic example, using a default builder and no standardization. For a more advanced illustration, see NeuralNetworkRegressor or ImageClassifier, and examples in the MLJFlux.jl documentation.
using MLJ
using Flux
import RDatasets
import Optimisers
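From here the example might continue roughly as follows; this is a sketch, and the hyper-parameter values are illustrative.
NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux verbosity=0

iris = RDatasets.dataset("datasets", "iris")
y, X = unpack(iris, ==(:Species))          # target and Continuous features

clf = NeuralNetworkClassifier(optimiser=Optimisers.Adam(0.01), epochs=30)
mach = machine(clf, X, y)
fit!(mach, verbosity=0)

yhat = predict(mach, X)                    # probabilistic predictions
predict_mode(mach, X)[1:4]                 # point predictions
report(mach).training_losses               # per-epoch (plus pre-training) losses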
Do model = NeuralNetworkBinaryClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkBinaryClassifier(builder=...).
NeuralNetworkBinaryClassifier is for training a data-dependent Flux.jl neural network for making probabilistic predictions of a binary (Multiclass{2} or OrderedFactor{2}) target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X, y)
Here:
X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.
y is the target, which can be any AbstractVector whose element scitype is Multiclass{2} or OrderedFactor{2}; check the scitype with scitype(y)
Train the machine with fit!(mach, rows=...).
Hyper-parameters
builder=MLJFlux.Short(): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux.jl documentation for examples of user-defined builders. See also finaliser below.
optimiser::Optimisers.Adam(): An Optimisers.jl optimiser. The optimiser performs the updating of the weights of the network. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.
loss=Flux.binarycrossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:
Flux.binarycrossentropy: Standard binary classification loss, also known as the log loss.
Flux.logitbinarycrossentropy: Mathematically equal to binarycrossentropy, but numerically more stable than finalising the outputs with σ and then calculating binarycrossentropy. You will need to specify finaliser=identity to remove MLJFlux's default sigmoid finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).
Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.
Flux.binary_focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.
Currently MLJ measures are not supported values of loss.
epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.
batch_size::Int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.
lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞).
alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.
rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training.
optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.
acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For training on a GPU, use CUDALibs().
finaliser=Flux.σ: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.σ.
Operations
predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.
predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.
Fitted parameters
The fields of fitted_params(mach) are:
chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).
Report
The fields of report(mach) are:
training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.
Examples
In this example we build a classification model using the Iris dataset. This is a very basic example, using a default builder and no standardization. For a more advanced illustration, see NeuralNetworkRegressor or ImageClassifier, and examples in the MLJFlux.jl documentation.
using MLJ, Flux
import Optimisers
import RDatasets
First, we can load the data:
mtcars = RDatasets.dataset("datasets", "mtcars");
y, X = unpack(mtcars, ==(:VS), in([:MPG, :Cyl, :Disp, :HP, :WT, :QSec]));
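From here the example might continue along these lines (a sketch; the coercions shown and the hyper-parameter values are illustrative):
NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux verbosity=0

y = categorical(y)                        # 0/1 integer column → two-level categorical target
X = coerce(X, Count => Continuous)        # ensure all features have Continuous scitype

bclf = NeuralNetworkBinaryClassifier(epochs=30)
mach = machine(bclf, X, y)
fit!(mach, verbosity=0)
predict(mach, X)[1:3]                     # probabilistic predictions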
Following is an example defining a new builder for creating a simple fully-connected neural network with two hidden layers, with n1 nodes in the first hidden layer, and n2 nodes in the second, for use in any of the first three models in Table 1. The definition includes one mutable struct and one method:
mutable struct MyBuilder <: MLJFlux.Builder
n1 :: Int
n2 :: Int
end
@@ -12,4 +12,4 @@
Dense(nn.n2, n_out, init=init),
)
end
Note here that n_in and n_out depend on the size of the data (see Table 1).
More generally, defining a new builder means defining a new struct sub-typing MLJFlux.Builder and defining a new MLJFlux.build method with one of these signatures:
MLJFlux.build(builder::MyBuilder, rng, n_in, n_out)
-MLJFlux.build(builder::MyBuilder, rng, n_in, n_out, n_channels) # for use with `ImageClassifier`
This method must return a Flux.Chain instance, chain, subject to the following conditions:
chain(x) must make sense:
for any x <: Array{<:AbstractFloat, 2} of size (n_in, batch_size) where batch_size is any integer (for all models except ImageClassifier); or
for any x <: Array{<:Float32, 4} of size (W, H, n_channels, batch_size), where (W, H) = n_in, n_channels is 1 or 3, and batch_size is any integer (for use with ImageClassifier)
The object returned by chain(x) must be an AbstractFloat vector of length n_out.
Alternatively, use MLJFlux.@builder(neural_net) to automatically create a builder for any valid Flux chain expression neural_net, where the symbols n_in, n_out, n_channels and rng can appear literally, with the interpretations explained above. For example,
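the following sketch creates a builder for a network with one hidden layer of 64 nodes, initialized from rng (the layer size is an illustrative assumption):
builder = MLJFlux.@builder Chain(
    Dense(n_in, 64, relu, init=Flux.glorot_uniform(rng)),
    Dense(64, n_out, init=Flux.glorot_uniform(rng)),
)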
Do model = ImageClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in ImageClassifier(builder=...).
ImageClassifier classifies images using a neural network adapted to the type of images provided (color or gray scale). Predictions are probabilistic. Users provide a recipe for constructing the network, based on properties of the image encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X, y)
Here:
X is any AbstractVector of images with ColorImage or GrayImage scitype; check the scitype with scitype(X) and refer to ScientificTypes.jl documentation on coercing typical image formats into an appropriate type.
y is the target, which can be any AbstractVector whose element scitype is Multiclass; check the scitype with scitype(y).
Train the machine with fit!(mach, rows=...).
Hyper-parameters
builder: An MLJFlux builder that constructs the neural network. The fallback builds a depth-16 VGG architecture adapted to the image size and number of target classes, with no batch normalization; see the Metalhead.jl documentation for details. See the example below for a user-specified builder. A convenience macro @builder is also available. See also finaliser below.
optimiser::Optimisers.Adam(): An Optimisers.jl optimiser. The optimiser performs the updating of the weights of the network. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.
loss=Flux.crossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:
Flux.crossentropy: Standard multiclass classification loss, also known as the log loss.
Flux.logitcrossentropy: Mathematically equal to crossentropy, but numerically more stable than finalising the outputs with softmax and then calculating crossentropy. You will need to specify finaliser=identity to remove MLJFlux's default softmax finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).
Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.
Flux.focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.
Currently MLJ measures are not supported values of loss.
epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.
batch_size::Int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.
lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞). Note the history reports unpenalized losses.
alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.
rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training. The default is Random.default_rng().
optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.
acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For training on a GPU, use CUDALibs().
finaliser=Flux.softmax: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.softmax.
Operations
predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.
predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.
Fitted parameters
The fields of fitted_params(mach) are:
chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).
Report
The fields of report(mach) are:
training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.
Examples
In this example we use MLJFlux and a custom builder to classify the MNIST image dataset.
using MLJ
using Flux
import MLJFlux
import Optimisers
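The custom builder for such an example might look roughly as follows; this is a sketch, and the filter size, channel counts, and the use of Flux.outputsize to size the final Dense layer are illustrative assumptions rather than the exact builder used here.
mutable struct MyConvBuilder <: MLJFlux.Builder
    filter_size::Int
    channels1::Int
    channels2::Int
end

function MLJFlux.build(b::MyConvBuilder, rng, n_in, n_out, n_channels)
    k, c1, c2 = b.filter_size, b.channels1, b.channels2
    init = Flux.glorot_uniform(rng)
    front = Chain(
        Conv((k, k), n_channels => c1, relu, pad=SamePad(), init=init),
        MaxPool((2, 2)),
        Conv((k, k), c1 => c2, relu, pad=SamePad(), init=init),
        MaxPool((2, 2)),
        Flux.flatten,
    )
    d = Flux.outputsize(front, (n_in..., n_channels, 1)) |> first   # flattened feature length
    return Chain(front, Dense(d, n_out, init=init))
end

clf = MLJFlux.ImageClassifier(builder=MyConvBuilder(3, 16, 32),
                              batch_size=50, epochs=10, rng=123)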
@@ -46,4 +46,4 @@
resampling=Holdout(fraction_train=0.5),
measure=cross_entropy,
rows=1:1000,
- verbosity=0)
Do model = MultitargetNeuralNetworkRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in MultitargetNeuralNetworkRegressor(builder=...).
MultitargetNeuralNetworkRegressor is for training a data-dependent Flux.jl neural network to predict a multi-valued Continuous target, represented as a table, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X, y)
Here:
X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.
y is the target, which can be any table or matrix of output targets whose element scitype is Continuous; check column scitypes with schema(y). If y is a Matrix, it is assumed to have columns corresponding to variables and rows corresponding to observations.
Hyper-parameters
builder=MLJFlux.Linear(σ=Flux.relu): An MLJFlux builder that constructs a neural network. Possible builders include: Linear, Short, and MLP. See MLJFlux documentation for more on builders, and the example below for using the @builder convenience macro.
optimiser::Optimisers.Adam(): An Optimisers.jl optimiser. The optimiser performs the updating of the weights of the network. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.
loss=Flux.mse: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a regression task, natural loss functions are:
Flux.mse
Flux.mae
Flux.msle
Flux.huber_loss
Currently MLJ measures are not supported as loss functions here.
epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.
batch_size::Int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.
lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞). Note the history reports unpenalized losses.
alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.
rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training. The default is Random.default_rng().
optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.
acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For training on a GPU, use CUDALibs().
Operations
predict(mach, Xnew): return predictions of the target given new features Xnew having the same scitype as X above. Predictions are deterministic.
Fitted parameters
The fields of fitted_params(mach) are:
chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network.
Report
The fields of report(mach) are:
training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.
Examples
In this example we apply a multi-target regression model to synthetic data:
Do model = MultitargetNeuralNetworkRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in MultitargetNeuralNetworkRegressor(builder=...).
MultitargetNeuralNetworkRegressor is for training a data-dependent Flux.jl neural network to predict a multi-valued Continuous target, represented as a table, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X, y)
Here:
X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.
y is the target, which can be any table or matrix of output targets whose element scitype is Continuous; check column scitypes with schema(y). If y is a Matrix, it is assumed to have columns corresponding to variables and rows corresponding to observations.
Hyper-parameters
builder=MLJFlux.Linear(σ=Flux.relu): An MLJFlux builder that constructs a neural network. Possible builders include: Linear, Short, and MLP. See MLJFlux documentation for more on builders, and the example below for using the @builder convenience macro.
optimiser::Optimisers.Adam(): An Optimisers.jl optimiser. The optimiser performs the updating of the weights of the network. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.
loss=Flux.mse: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a regression task, natural loss functions are:
Flux.mse
Flux.mae
Flux.msle
Flux.huber_loss
Currently MLJ measures are not supported as loss functions here.
epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.
batch_size::Int=1: the batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.
lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞). Note the history reports unpenalized losses.
alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.
rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training. The default is Random.default_rng().
optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.
acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For training on GPU, use CUDALibs().
Operations
predict(mach, Xnew): return predictions of the target given new features Xnew having the same scitype as X above. Predictions are deterministic.
Fitted parameters
The fields of fitted_params(mach) are:
chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network.
Report
The fields of report(mach) are:
training_losses: A vector of training losses (penalized if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.
Examples
In this example we apply a multi-target regression model to synthetic data:
using MLJ
import MLJFlux
using Flux
import Optimisers
First, we generate some synthetic data (needs MLJBase 0.20.16 or higher):
X, y = make_regression(100, 9; n_targets = 2) # both tables
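The intermediate steps of the full example are sketched below, under the assumption of a simple fully connected builder; the sketch defines the names Xtest, ytest, mach and multi_loss that the closing lines refer to:
# Split off a test set:
(X, Xtest), (y, ytest) = partition((X, y), 0.7, multi=true);

# An illustrative builder; `n_in` and `n_out` are supplied by MLJFlux at `fit!` time:
builder = MLJFlux.@builder Chain(
    Dense(n_in => 32, relu),
    Dense(32 => n_out),
)

MultitargetNeuralNetworkRegressor = @load MultitargetNeuralNetworkRegressor pkg=MLJFlux
model = MultitargetNeuralNetworkRegressor(builder=builder, epochs=20, rng=123)

# A multi-target loss used only for evaluation (it is not optimized internally):
multi_loss(yhat, y) = Flux.mse(MLJ.matrix(yhat), MLJ.matrix(y))

mach = machine(model, X, y)

# CV estimate of performance, based on `(X, y)`:
evaluate!(mach, resampling=CV(nfolds=5), measure=multi_loss)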
# loss for `(Xtest, ytest)`:
fit!(mach) # trains on all data `(X, y)`
yhat = predict(mach, Xtest)
multi_loss(yhat, ytest)
Do model = NeuralNetworkRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkRegressor(builder=...).
NeuralNetworkRegressor is for training a data-dependent Flux.jl neural network to predict a Continuous target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X, y)
Here:
X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.
y is the target, which can be any AbstractVector whose element scitype is Continuous; check the scitype with scitype(y).
Train the machine with fit!(mach, rows=...).
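For example (the row range below is arbitrary):
schema(X)                  # check that all feature scitypes are Continuous
scitype(y)                 # check the target scitype
mach = machine(model, X, y)
fit!(mach, rows = 1:100)   # train on the first 100 observations only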
Hyper-parameters
builder=MLJFlux.Linear(σ=Flux.relu): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux documentation for more on builders, and the example below for using the @builder convenience macro.
optimiser::Optimisers.Adam(): An Optimisers.jl optimiser. The optimiser performs the updating of the weights of the network. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.
loss=Flux.mse: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a regression task, natural loss functions are:
Flux.mse
Flux.mae
Flux.msle
Flux.huber_loss
Currently MLJ measures are not supported as loss functions here.
epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.
batch_size::Int=1: the batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.
lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞). Note the history reports unpenalized losses.
alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.
rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training. The default is Random.default_rng().
optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.
acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For training on GPU, use CUDALibs().
Operations
predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above.
Fitted parameters
The fields of fitted_params(mach) are:
chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network.
Report
The fields of report(mach) are:
training_losses: A vector of training losses (penalized if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.
Examples
In this example we build a regression model for the Boston house price dataset.
using MLJ
import MLJFlux
using Flux
import Optimisers
First, we load in the data: The :MEDV column becomes the target vector y, and all remaining columns go into a table X, with the exception of :CHAS:
data = OpenML.load(531); # Loads from https://www.openml.org/d/531
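The remainder of the example is sketched below, assuming a simple fully connected builder and a pipeline that standardizes both the features and the target (the later remark about the unstandardized target scale presupposes such a pipeline); the sketch defines the names Xtest, ytest and mach that the closing lines refer to:
import DataFrames

df = DataFrames.DataFrame(data)
y, X = unpack(df, ==(:MEDV), !=(:CHAS))   # target y = :MEDV; features X = the rest, minus :CHAS
schema(X)                                 # check that all feature scitypes are Continuous

# Split off a test set:
(X, Xtest), (y, ytest) = partition((X, y), 0.7, multi=true);

# An illustrative builder; `n_in` and `n_out` are supplied by MLJFlux at `fit!` time:
builder = MLJFlux.@builder Chain(
    Dense(n_in => 32, relu),
    Dense(32 => n_out),
)

NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux
model = NeuralNetworkRegressor(builder=builder, epochs=20, rng=123)

# Standardize features and target; losses are still reported on the original target scale:
pipe = Standardizer() |> TransformedTargetModel(model, transformer=Standardizer())

mach = machine(pipe, X, y)

# CV estimate of performance, based on `(X, y)`:
evaluate!(mach, resampling=CV(nfolds=5), measure=l2)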
# loss for `(Xtest, ytest)`:
fit!(mach) # train on `(X, y)`
yhat = predict(mach, Xtest)
l2(yhat, ytest)
These losses, for the pipeline model, refer to the target on the original, unstandardized, scale.
For implementing stopping criteria and other iteration controls, refer to examples linked from the MLJFlux documentation.
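For instance (a sketch, assuming the model, X and y defined in the example above), the IteratedModel wrapper re-exported by MLJ can add early stopping:
iterated_model = IteratedModel(
    model = model,
    resampling = Holdout(fraction_train = 0.8),   # out-of-sample loss monitored on a holdout set
    measure = l2,
    controls = [Step(1), Patience(5), NumberLimit(100)],
)
mach_iterated = machine(iterated_model, X, y)
fit!(mach_iterated)   # trains until one of the controls halts iteration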
MLJFlux provides the model types below, for use with input features X and targets y of the scientific type indicated in the table below. The parameters n_in, n_out and n_channels refer to information passed to the builder, as described under Defining Custom Builders.
In MLJ a model is a mutable struct storing hyper-parameters for some learning algorithm indicated by the model name, and that's all. In particular, an MLJ model does not store learned parameters.
Difference in Definition
In Flux the term "model" has another meaning. However, as all Flux "models" used in MLJFlux are Flux.Chain objects, we call them chains, and restrict use of "model" to models in the MLJ sense.
Are observations rows or columns?
In MLJ the convention for two-dimensional data (tables and matrices) is rows = observations. For matrices, Flux has the opposite convention. If your data is a matrix whose column index is the observation index, the optimal solution is to present the adjoint or transpose of your matrix to MLJFlux models. Otherwise, you can use the matrix as is, or transform it one time with permutedims and then present the adjoint or transpose of the result.
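For example (a sketch, assuming a model and a target vector y are already defined), a matrix stored in the Flux convention can be presented lazily via its adjoint:
Xflux = rand(Float32, 4, 1000)   # 4 features × 1000 observations (Flux convention)
X = Xflux'                       # lazy adjoint: 1000 observations × 4 features (MLJ convention)
mach = machine(model, X, y)      # the adjoint is a view, so no copy of the data is made here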
Instructions for coercing common image formats into some AbstractVector{<:Image} are here.
Fitting and warm restarts
MLJ machines cache state enabling the "warm restart" of model training, as demonstrated in the incremental training example. In the case of MLJFlux models, fit!(mach) will use a warm restart if:
only model.epochs has changed since the last call; or
only model.epochs or model.optimiser have changed since the last call and model.optimiser_changes_trigger_retraining == false (the default); the "state" part of the optimiser is ignored in this comparison. This allows one to dynamically modify learning rates, for example.
Here model=mach.model is the associated MLJ model.
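For example (a sketch, assuming mach = machine(model, X, y) has already been fitted once):
import Optimisers

model.epochs = model.epochs + 10          # only `epochs` has changed
fit!(mach)                                # warm restart: training continues from the current weights

model.optimiser = Optimisers.Adam(0.001)  # only the optimiser (here, its learning rate) has changed
fit!(mach)                                # still a warm restart, because optimiser_changes_trigger_retraining == false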
All models share the following hyper-parameters. See individual model docstrings for a full list.
builder: Default builder for models. Default: MLJFlux.Linear(σ=Flux.relu) (regressors) or MLJFlux.Short(n_hidden=0, dropout=0.5, σ=Flux.σ) (classifiers).
optimiser: The optimiser to use for training. Default: Optimisers.Adam().
loss: The loss function used for training. Default: Flux.mse (regressors) and Flux.crossentropy (classifiers).
epochs: Number of epochs to train for. Default: 10.
batch_size: The batch size for the data. Default: 1.
lambda: The regularization strength. Range = [0, ∞). Default: 0.
alpha: The L2/L1 mix of regularization. Range = [0, 1]. Default: 0.
rng: The random number generator (RNG) passed to builders, for weight initialization, for example. Can be any AbstractRNG or the seed (integer) for a Xoshiro that is reset on every cold restart of model (machine) training. Default: GLOBAL_RNG.
acceleration: Use CUDALibs() for training on GPU. Default: CPU1().
optimiser_changes_trigger_retraining: True if fitting an associated machine should trigger retraining from scratch whenever the optimiser changes. Default: false.
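For instance (the values below are purely illustrative), these shared hyper-parameters can be overridden when a model is constructed:
using MLJ
import MLJFlux
using Flux
import Optimisers

NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux
model = NeuralNetworkRegressor(
    builder = MLJFlux.MLP(hidden = (32, 32)),   # two hidden layers of 32 neurons each
    optimiser = Optimisers.Adam(0.001),         # learning rate 0.001
    loss = Flux.huber_loss,
    epochs = 50,
    batch_size = 32,
    lambda = 0.01,                              # regularization strength
    alpha = 0.4,                                # 40% L1 / 60% L2 penalty mix
    rng = 123,
    acceleration = CPU1(),
)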
The classifiers have an additional hyperparameter finaliser (default is Flux.softmax, or Flux.σ in the binary case) which is the operation applied to the unnormalized output of the final layer to obtain probabilities (outputs summing to one). It should return a vector of the same length as its input.
Loss Functions
Currently, the loss function specified by loss=... is applied internally by Flux and needs to conform to the Flux API. You cannot, for example, supply one of MLJ's probabilistic loss functions, such as MLJ.cross_entropy, to one of the classifier constructors.
Conversely, in evaluation meta-algorithms (such as cross-validation) only MLJ loss functions and metrics can be used, and these work even if the underlying model comes from MLJFlux.
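For illustration (a sketch, assuming a feature table X and a categorical target vector y are already available): the Flux-style loss is passed at construction, while an MLJ measure is passed to the evaluation meta-algorithm:
using MLJ
import MLJFlux
using Flux

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux
model = NeuralNetworkClassifier(loss = Flux.crossentropy)   # a Flux loss, used for training

mach = machine(model, X, y)
evaluate!(mach, resampling = CV(nfolds = 5), measure = cross_entropy)   # an MLJ measure, used for evaluation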
More on accelerated training with GPUs
As in the table, when instantiating a model for training on a GPU, specify acceleration=CUDALibs(), as in
using MLJ
ImageClassifier = @load ImageClassifier
model = ImageClassifier(epochs=10, acceleration=CUDALibs())
mach = machine(model, X, y) |> fit!
In this example, the data X, y is copied onto the GPU under the hood on the call to fit! and cached for use in any warm restart (see above). The Flux chain used in training is always copied back to the CPU at the conclusion of fit!, and made available as fitted_params(mach).