One of the most important hyperparameters of deep learning models is the learning rate. It is a tuning parameter of the optimization algorithm that determines the step size (how big or small each update is) at every iteration while moving toward a minimum of the loss function.
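To make this concrete, here is a minimal, self-contained sketch of gradient descent on a toy one-dimensional loss (the quadratic loss, starting point, and values are illustrative only, not part of the course code). The learning rate scales every update step:

```python
# Toy example: minimize L(w) = (w - 3)^2 with gradient descent.
# The learning rate controls how far each step moves toward the minimum.

def gradient(w):
    # Derivative of L(w) = (w - 3)^2
    return 2 * (w - 3)

learning_rate = 0.1   # step size
w = 0.0               # initial weight

for step in range(20):
    w = w - learning_rate * gradient(w)  # move against the gradient

print(w)  # approaches 3; a much larger rate can overshoot, a tiny one converges very slowly
```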
Imagine you have a book, and you want to read it. The learning rate represents how fast you can read and absorb its content. If you read the book very quickly, you risk forgetting important parts and struggling to recall key details when you need to apply them. On the other hand, reading slowly allows you to study each concept thoroughly and understand it deeply, ensuring better retention. However, if you read too slowly, you might never finish the book. The goal is to find the right reading pace, or learning rate, that balances comprehension and efficiency. Reading too fast may result in superficial understanding, while reading too slowly might mean not acquiring knowledge quickly enough to meet your goals. By maintaining a moderate, balanced pace, you can maximize understanding and effectively apply what you've learned.
This analogy relates to training machine learning models. Training a model is like reading a book: you're trying to "learn" from the data, and applying that knowledge later corresponds to evaluating the model on validation or test data. If you train the model too quickly (with a high learning rate), it may overfit, memorizing the training data without generalizing well to new data. If you train it too slowly (with a low learning rate), it may underfit, failing to learn enough patterns from the data. A balanced learning rate ensures the model acquires sufficient knowledge and performs well on both training and validation data.
We can experiment with different learning rates to find the value that gives the best results. To try different learning rates, we first define a function that creates the model, for instance:
```python
from tensorflow import keras
from tensorflow.keras.applications.xception import Xception


# Function to create the model with a configurable learning rate
def make_model(learning_rate=0.01):
    base_model = Xception(weights='imagenet',
                          include_top=False,
                          input_shape=(150, 150, 3))

    base_model.trainable = False

    #########################################

    inputs = keras.Input(shape=(150, 150, 3))
    base = base_model(inputs, training=False)
    vectors = keras.layers.GlobalAveragePooling2D()(base)
    outputs = keras.layers.Dense(10)(vectors)
    model = keras.Model(inputs, outputs)

    #########################################

    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    loss = keras.losses.CategoricalCrossentropy(from_logits=True)

    # Compile the model
    model.compile(optimizer=optimizer,
                  loss=loss,
                  metrics=['accuracy'])

    return model
```
Next, we can loop over the list of learning rates:
```python
# Dictionary to store the training history for each learning rate
scores = {}

# List of learning rates to try
lrs = [0.0001, 0.001, 0.01, 0.1]

for lr in lrs:
    print(lr)

    model = make_model(learning_rate=lr)
    history = model.fit(train_ds, epochs=10, validation_data=val_ds)
    scores[lr] = history.history

    print()
    print()
```
Visualizing the training and validation accuracies helps us determine which learning rate value is best for the model. One typical way to choose is to look at the gap between training and validation accuracy: the smaller the gap, the closer the learning rate is to its optimal value.
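For example, one possible way to plot the stored histories (assuming matplotlib is installed; the `accuracy` and `val_accuracy` keys come from compiling the model with `metrics=['accuracy']` and passing `validation_data` to `fit`):

```python
import matplotlib.pyplot as plt

# Compare training and validation accuracy curves for each learning rate
for lr, hist in scores.items():
    plt.plot(hist['accuracy'], label=f'train lr={lr}')
    plt.plot(hist['val_accuracy'], '--', label=f'val lr={lr}')

plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()
```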
Add notes from the video (PRs are welcome)
- learning rate analogy: the speed of reading a book
- reading fast (skimming and missing details) vs. reading slow (little progress and never finishing the book)
- finding the optimal learning rate
The notes are written by the community. If you see an error here, please create a PR with a fix.