From a9c52a7ea3170d26fac1b8a81b1579cbefc217d3 Mon Sep 17 00:00:00 2001 From: Lokesh Baviskar Date: Mon, 17 Apr 2023 12:57:30 +0530 Subject: [PATCH] Created using Colaboratory --- 01_pytorch_workflow.ipynb | 2418 +++++++++++++++++++++++++++++++++++++ 1 file changed, 2418 insertions(+) create mode 100644 01_pytorch_workflow.ipynb diff --git a/01_pytorch_workflow.ipynb b/01_pytorch_workflow.ipynb new file mode 100644 index 0000000..3c1c71a --- /dev/null +++ b/01_pytorch_workflow.ipynb @@ -0,0 +1,2418 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OgYkrRCRec0r" + }, + "source": [ + "# 01. PyTorch Workflow Fundamentals\n", + "\n", + "The essence of machine learning and deep learning is to take some data from the past, build an algorithm (like a neural network) to discover patterns in it and use the discovered patterns to predict the future.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "51Ug7Ug123Ip" + }, + "source": [ + "## What we're going to cover\n", + "\n", + "In this module we're going to cover a standard PyTorch workflow (it can be chopped and changed as necessary but it covers the main outline of steps).\n", + "\n", + "\"a\n", + "\n", + "For now, we'll use this workflow to predict a simple straight line but the workflow steps can be repeated and changed depending on the problem you're working on.\n", + "\n", + "Specifically, we're going to cover:\n", + "\n", + "| **Topic** | **Contents** |\n", + "| ----- | ----- |\n", + "| **1. Getting data ready** | Data can be almost anything but to get started we're going to create a simple straight line |\n", + "| **2. Building a model** | Here we'll create a model to learn patterns in the data, we'll also choose a **loss function**, **optimizer** and build a **training loop**. | \n", + "| **3. 
Fitting the model to data (training)** | We've got data and a model, now let's let the model (try to) find patterns in the (**training**) data. |\n", + "| **4. Making predictions and evaluating a model (inference)** | Our model's found patterns in the data, let's compare its findings to the actual (**testing**) data. |\n", + "| **5. Saving and loading a model** | You may want to use your model elsewhere, or come back to it later, here we'll cover that. |\n", + "| **6. Putting it all together** | Let's take all of the above and combine it. |" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L9EOt5cbod6l" + }, + "source": [ + "And now let's import what we'll need for this module.\n", + "\n", + "We're going to get `torch`, `torch.nn` (`nn` stands for neural network and this package contains the building blocks for creating neural networks in PyTorch) and `matplotlib`." + ] + }, + { + "cell_type": "code", + "source": [ + "import torch\n", + "from torch import nn # nn contains all PyTorch's building blocks for neural networks\n", + "import matplotlib.pyplot as plt\n", + "# Check PyTorch Version\n", + "torch.__version__" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "id": "irwAzjKh40kb", + "outputId": "101d60a8-f1c9-4727-a0f0-795e48c65984" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'1.13.1+cu116'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 1 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ci_-geIdec0w" + }, + "source": [ + "## 1. Data (preparing and loading)\n", + "\n", + "I want to stress that \"data\" in machine learning can be almost anything you can imagine. 
A table of numbers (like a big Excel spreadsheet), images of any kind, videos (YouTube has lots of data!), audio files like songs or podcasts, protein structures, text and more.\n", + "\n", + "\n", + "Let's create our data as a straight line.\n", + "\n", + "We'll use [linear regression](https://en.wikipedia.org/wiki/Linear_regression) to create the data with known **parameters** (things that can be learned by a model) and then we'll use PyTorch to see if we can build a model to estimate these parameters using [**gradient descent**](https://en.wikipedia.org/wiki/Gradient_descent).\n", + "\n", + "Don't worry if the terms above don't mean much now, we'll see them in action and I'll put extra resources below where you can learn more.\n", + "\n" + ] + }, + { + "cell_type": "code", + "source": [ + "# Create known parameters\n", + "weight = 0.7\n", + "bias = 0.3\n", + "\n", + "# Create data\n", + "start = 0\n", + "end = 1\n", + "step = 0.02\n", + "X = torch.arange(start,end,step).unsqueeze(dim = 1)\n", + "y = weight * X + bias\n", + "X[:10], y[:10]" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QqRnkCY955Mi", + "outputId": "55871dea-b708-4efe-9b37-ec23c7ed9961" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(tensor([[0.0000],\n", + " [0.0200],\n", + " [0.0400],\n", + " [0.0600],\n", + " [0.0800],\n", + " [0.1000],\n", + " [0.1200],\n", + " [0.1400],\n", + " [0.1600],\n", + " [0.1800]]), tensor([[0.3000],\n", + " [0.3140],\n", + " [0.3280],\n", + " [0.3420],\n", + " [0.3560],\n", + " [0.3700],\n", + " [0.3840],\n", + " [0.3980],\n", + " [0.4120],\n", + " [0.4260]]))" + ] + }, + "metadata": {}, + "execution_count": 2 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dzNigr8dtW2Y" + }, + "source": [ + "Beautiful! Now we're going to move towards building a model that can learn the relationship between `X` (**features**) and `y` (**labels**). 
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YApM7diprjP0" + }, + "source": [ + "### Split data into training and test sets \n", + "\n", + "One of the most important steps in a machine learning project is creating a training and test set (and when required, a validation set).\n", + "\n", + "Each split of the dataset serves a specific purpose:\n", + "\n", + "| Split | Purpose | Amount of total data | How often is it used? |\n", + "| ----- | ----- | ----- | ----- |\n", + "| **Training set** | The model learns from this data (like the course materials you study during the semester). | ~60-80% | Always |\n", + "| **Validation set** | The model gets tuned on this data (like the practice exam you take before the final exam). | ~10-20% | Often but not always |\n", + "| **Testing set** | The model gets evaluated on this data to test what it has learned (like the final exam you take at the end of the semester). | ~10-20% | Always |\n", + "\n", + "For now, we'll just use a training and test set, this means we'll have a dataset for our model to learn on as well as be evaluated on.\n", + "\n", + "We can create them by splitting our `X` and `y` tensors.\n", + "\n", + "> **Note:** When dealing with real-world data, this step is typically done right at the start of a project (the test set should always be kept separate from all other data). 
We want our model to learn on training data and then evaluate it on test data to get an indication of how well it **generalizes** to unseen examples.\n" + ] + }, + { + "cell_type": "code", + "source": [ + "# Create train/test split\n", + "train_split = int(0.8 * len(X)) # 80% of the data for training, 20% for testing\n", + "X_train, y_train = X[:train_split], y[:train_split]\n", + "X_test, y_test = X[train_split:], y[train_split:]\n", + "\n", + "len(X_train), len(y_train), len(X_test), len(y_test)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "wUq3AiYo86qq", + "outputId": "a19f7c76-87de-4e78-e9b1-6ea0b3afaaa6" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(40, 40, 10, 10)" + ] + }, + "metadata": {}, + "execution_count": 3 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ua1y5hFjtLxC" + }, + "source": [ + "Wonderful, we've got 40 samples for training (`X_train` & `y_train`) and 10 samples for testing (`X_test` & `y_test`).\n", + "\n", + "The model we create is going to try and learn the relationship between `X_train` & `y_train` and then we will evaluate what it learns on `X_test` and `y_test`.\n", + "\n", + "But right now our data is just numbers on a page.\n", + "\n", + "Let's create a function to visualize it." 
+ ] + }, + { + "cell_type": "code", + "source": [ + "def plot_predictions(train_data = X_train,\n", + "                     train_labels = y_train,\n", + "                     test_data = X_test,\n", + "                     test_labels = y_test,\n", + "                     predictions = None):\n", + "    \"\"\"Plot training data, testing data and (optionally) predictions.\n", + "    \"\"\"\n", + "    plt.figure(figsize = (10,7))\n", + "\n", + "    # Plot training data as blue dots\n", + "    plt.scatter(train_data, train_labels, c= 'b', s = 4, label =\"Training data\")\n", + " \n", + "    # Plot test data as green dots\n", + "    plt.scatter(test_data,test_labels, c= 'g', s = 4, label ='Testing data')\n", + "\n", + "    if predictions is not None:\n", + "        # Plot predictions as red dots\n", + "        plt.scatter(test_data, predictions, c= 'r', s = 4, label ='Predictions')\n", + "    # Show the legend\n", + "    plt.legend(prop = {'size':14});" + ], + "metadata": { + "id": "8AnpxtRE-RZB" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 428 + }, + "id": "xTaIwydGec0z", + "outputId": "0f378798-135e-4772-c83a-c677a0b704dc" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": { + "needs_background": "light" + } + } + ], + "source": [ + "plot_predictions();" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0eFsorRHec00" + }, + "source": [ + "## 2. Build model\n", + "\n", + "Now we've got some data, let's build a model to use the blue dots to predict the green dots.\n", + "\n", + "Let's replicate a standard linear regression model using pure PyTorch." + ] + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "twWMxhZaO2cs" + } + }, + { + "cell_type": "code", + "source": [ + "# Create a Linear Regression Model Class\n", + "class LinearRegressionModel(nn.Module): # <- almost everything in PyTorch is a nn.Module (think of this as neural network lego blocks)\n", + "    def __init__(self):\n", + "        super().__init__()\n", + "        self.weights = nn.Parameter(torch.randn(1, # start with random weights\n", + "                                                dtype = torch.float),\n", + "                                    requires_grad= True) # we can update this value with gradient descent\n", + "        self.bias = nn.Parameter(torch.randn(1, dtype= torch.float), requires_grad = True)\n", + "\n", + "    # Forward defines the computation in the model\n", + "    def forward(self, x: torch.Tensor) -> torch.Tensor: # <- \"x\" is the input data (e.g. training/testing features)\n", + "        return self.weights * x + self.bias # linear regression formula (y = m*x + b)" + ], + "metadata": { + "id": "hZvv1DINFaBQ" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xhu5wxVO7s_q" + }, + "source": [ + "> **Resource:** We'll be using Python classes to create bits and pieces for building neural networks. 
If you're unfamiliar with Python class notation, I'd recommend reading [Real Python's Object-Oriented Programming in Python 3 guide](https://realpython.com/python3-object-oriented-programming/) a few times.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iRRq3a0Gvvnl" + }, + "source": [ + "### PyTorch model building essentials\n", + "\n", + "PyTorch has four (give or take) essential modules you can use to create almost any kind of neural network you can imagine.\n", + "\n", + "They are [`torch.nn`](https://pytorch.org/docs/stable/nn.html), [`torch.optim`](https://pytorch.org/docs/stable/optim.html), [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) and [`torch.utils.data.DataLoader`](https://pytorch.org/docs/stable/data.html). For now, we'll focus on the first two and get to the other two later (though you may be able to guess what they do).\n", + "\n", + "| PyTorch module | What does it do? |\n", + "| ----- | ----- |\n", + "| [`torch.nn`](https://pytorch.org/docs/stable/nn.html) | Contains all of the building blocks for computational graphs (essentially a series of computations executed in a particular way). |\n", + "| [`torch.nn.Parameter`](https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html#parameter) | Stores tensors that can be used with `nn.Module`. If `requires_grad=True` gradients (used for updating model parameters via [**gradient descent**](https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html)) are calculated automatically, this is often referred to as \"autograd\". | \n", + "| [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module) | The base class for all neural network modules, all the building blocks for neural networks are subclasses. If you're building a neural network in PyTorch, your models should subclass `nn.Module`. Requires a `forward()` method be implemented. 
| \n", + "| [`torch.optim`](https://pytorch.org/docs/stable/optim.html) | Contains various optimization algorithms (these tell the model parameters stored in `nn.Parameter` how to best change to improve gradient descent and in turn reduce the loss). | \n", + "| `def forward()` | All `nn.Module` subclasses require a `forward()` method, this defines the computation that will take place on the data passed to the particular `nn.Module` (e.g. the linear regression formula above). |\n", + "\n", + "If the above sounds complex, think of it like this, almost everything in a PyTorch neural network comes from `torch.nn`:\n", + "* `nn.Module` contains the larger building blocks (layers)\n", + "* `nn.Parameter` contains the smaller parameters like weights and biases (put these together to make `nn.Module`(s))\n", + "* `forward()` tells the larger blocks how to make calculations on inputs (tensors full of data) within `nn.Module`(s)\n", + "* `torch.optim` contains optimization methods on how to improve the parameters within `nn.Parameter` to better represent input data \n", + "\n", + "![a pytorch linear model with annotations](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/01-pytorch-linear-model-annotated.png)\n", + "*Basic building blocks of creating a PyTorch model by subclassing `nn.Module`. For objects that subclass `nn.Module`, the `forward()` method must be defined.*\n", + "\n", + "> **Resource:** See more of these essential modules and their use cases in the [PyTorch Cheat Sheet](https://pytorch.org/tutorials/beginner/ptcheat.html). \n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HYt5sKsgufG7" + }, + "source": [ + "\n", + "### Checking the contents of a PyTorch model\n", + "Now we've got these out of the way, let's create a model instance with the class we've made and check its parameters using [`.parameters()`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.parameters). 
" + ] + }, + { + "cell_type": "code", + "source": [ + "# Set manual seed since nn.Parameter values are randomly initialized\n", + "torch.manual_seed(42)\n", + "\n", + "# Create an instance of the model\n", + "model_0 = LinearRegressionModel()\n", + "\n", + "# Check the nn.Parameters within the nn.Module subclass we created\n", + "list(model_0.parameters())" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DcRDs0QhUhRj", + "outputId": "058729be-9cca-44b6-fb6c-fae987babb34" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[Parameter containing:\n", + " tensor([0.3367], requires_grad=True), Parameter containing:\n", + " tensor([0.1288], requires_grad=True)]" + ] + }, + "metadata": {}, + "execution_count": 7 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CNOmcQdSq34e" + }, + "source": [ + "We can also get the state (what the model contains) of the model using [`.state_dict()`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.state_dict)." 
+ ] + }, + { + "cell_type": "code", + "source": [ + "# list named parameters\n", + "model_0.state_dict()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "u68u2yXAVjww", + "outputId": "21a344c9-36dd-4efc-b0ef-b55d3752c391" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "OrderedDict([('weights', tensor([0.3367])), ('bias', tensor([0.1288]))])" + ] + }, + "metadata": {}, + "execution_count": 8 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tdTEPSwSec02" + }, + "source": [ + "Notice how the values for `weights` and `bias` from `model_0.state_dict()` come out as random float tensors?\n", + "\n", + "This is because we initialized them above using `torch.randn()`.\n", + "\n", + "Essentially we want to start from random parameters and get the model to update them towards parameters that fit our data best (the hardcoded `weight` and `bias` values we set when creating our straight line data).\n", + "\n", + "> **Exercise:** Try changing the `torch.manual_seed()` value two cells above, see what happens to the weights and bias values. \n", + "\n", + "Because our model starts with random values, right now it'll have poor predictive power.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BDKdLN7nuheb" + }, + "source": [ + "### Making predictions using `torch.inference_mode()` \n", + "To check this we can pass it the test data `X_test` to see how closely it predicts `y_test`.\n", + "\n", + "When we pass data to our model, it'll go through the model's `forward()` method and produce a result using the computation we've defined. \n", + "\n", + "Let's make some predictions. 
" + ] + }, + { + "cell_type": "code", + "source": [ + "# Make predictions with model\n", + "with torch.inference_mode():\n", + "    y_preds = model_0(X_test)\n", + "y_preds" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Hyt6oiYfV7V7", + "outputId": "44d90ef6-5473-4836-b3be-f16aee7e70d7" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "tensor([[0.3982],\n", + " [0.4049],\n", + " [0.4116],\n", + " [0.4184],\n", + " [0.4251],\n", + " [0.4318],\n", + " [0.4386],\n", + " [0.4453],\n", + " [0.4520],\n", + " [0.4588]])" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Note: in older PyTorch code you might also see torch.no_grad(), so let's try that too\n", + "with torch.no_grad():\n", + "    y_preds = model_0(X_test)\n", + "y_preds" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZXgCJU3ZWdMT", + "outputId": "2d64bd29-a9e7-4ee4-95d3-6c571757b636" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "tensor([[0.3982],\n", + " [0.4049],\n", + " [0.4116],\n", + " [0.4184],\n", + " [0.4251],\n", + " [0.4318],\n", + " [0.4386],\n", + " [0.4453],\n", + " [0.4520],\n", + " [0.4588]])" + ] + }, + "metadata": {}, + "execution_count": 10 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L_Bx5I1FsIS0" + }, + "source": [ + "Hmm?\n", + "\n", + "You probably noticed we used [`torch.inference_mode()`](https://pytorch.org/docs/stable/generated/torch.inference_mode.html) as a [context manager](https://realpython.com/python-with-statement/) (that's what the `with torch.inference_mode():` is) to make the predictions.\n", + "\n", + "As the name suggests, `torch.inference_mode()` is used when using a model for inference (making predictions).\n", + "\n", + "`torch.inference_mode()` turns off a bunch of things 
(like gradient tracking, which is necessary for training but not for inference) to make **forward-passes** (data going through the `forward()` method) faster.\n", + "\n", + "> **Note:** In older PyTorch code, you may also see `torch.no_grad()` being used for inference. While `torch.inference_mode()` and `torch.no_grad()` do similar things,\n", + "`torch.inference_mode()` is newer, potentially faster and preferred. See this [Tweet from PyTorch](https://twitter.com/PyTorch/status/1437838231505096708?s=20) for more.\n", + "\n", + "We've made some predictions, let's see what they look like. " + ] + }, + { + "cell_type": "code", + "source": [ + "# Check the predictions\n", + "print(f\"Number of testing samples: {len(X_test)}\")\n", + "print(f\"Number of predictions: {len(y_preds)}\")\n", + "print(f'Predicted values:\\n{y_preds}')" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "vNtUa8RYXk76", + "outputId": "10a8ac53-2457-4651-849e-ffda3d54a50d" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Number of testing samples: 10\n", + "Number of predictions: 10\n", + "Predicted values:\n", + "tensor([[0.3982],\n", + " [0.4049],\n", + " [0.4116],\n", + " [0.4184],\n", + " [0.4251],\n", + " [0.4318],\n", + " [0.4386],\n", + " [0.4453],\n", + " [0.4520],\n", + " [0.4588]])\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FnSwGbQEupZs" + }, + "source": [ + "Notice how there's one prediction value per testing sample.\n", + "\n", + "This is because of the kind of data we're using. For our straight line, one `X` value maps to one `y` value. \n", + "\n", + "However, machine learning models are very flexible. You could have 100 `X` values mapping to one, two, three or 10 `y` values. 
It all depends on what you're working on.\n", + "\n", + "Our predictions are still numbers on a page, let's visualize them with our `plot_predictions()` function we created above." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 428 + }, + "id": "pwjxLWZTec02", + "outputId": "cfe6fff8-38b7-46e2-c70b-c0708c52f0f6" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": { + "needs_background": "light" + } + } + ], + "source": [ + "plot_predictions(predictions=y_preds)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "JLJWVANkhY3-", + "outputId": "1a5a7b40-bb72-4339-e65f-4a2a0705d7a5" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "tensor([[0.4618],\n", + " [0.4691],\n", + " [0.4764],\n", + " [0.4836],\n", + " [0.4909],\n", + " [0.4982],\n", + " [0.5054],\n", + " [0.5127],\n", + " [0.5200],\n", + " [0.5272]])" + ] + }, + "metadata": {}, + "execution_count": 13 + } + ], + "source": [ + "y_test - y_preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lxt8WUzdv1qS" + }, + "source": [ + "Woah! Those predictions look pretty bad...\n", + "\n", + "This makes sense though when you remember our model is just using random parameter values to make predictions.\n", + "\n", + "It hasn't even looked at the blue dots to try to predict the green dots.\n", + "\n", + "Time to change that." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZZpa-fXLec03" + }, + "source": [ + "## 3. 
Train model\n", + "\n", + "Right now our model is making predictions using random parameters to make calculations, it's basically guessing (randomly).\n", + "\n", + "To fix that, we can update its internal parameters (I also refer to *parameters* as patterns), the `weights` and `bias` values we set randomly using `nn.Parameter()` and `torch.randn()` to be something that better represents the data.\n", + "\n", + "We could set those values by hand, but it's much more fun to write code to see if the model can try and figure them out itself.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aD8pnhJUyZUT" + }, + "source": [ + "### Creating a loss function and optimizer in PyTorch\n", + "\n", + "For our model to update its parameters on its own, we'll need to add a few more things to our recipe.\n", + "\n", + "And that's a **loss function** as well as an **optimizer**.\n", + "\n", + "The roles of these are: \n", + "\n", + "| Function | What does it do? | Where does it live in PyTorch? | Common values |\n", + "| ----- | ----- | ----- | ----- |\n", + "| **Loss function** | Measures how wrong your model's predictions (e.g. `y_preds`) are compared to the truth labels (e.g. `y_test`). Lower the better. | PyTorch has plenty of built-in loss functions in [`torch.nn`](https://pytorch.org/docs/stable/nn.html#loss-functions). | Mean absolute error (MAE) for regression problems ([`torch.nn.L1Loss()`](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html)). Binary cross entropy for binary classification problems ([`torch.nn.BCELoss()`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html)). |\n", + "| **Optimizer** | Tells your model how to update its internal parameters to best lower the loss. | You can find various optimization function implementations in [`torch.optim`](https://pytorch.org/docs/stable/optim.html). | Stochastic gradient descent ([`torch.optim.SGD()`](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD)). 
Adam optimizer ([`torch.optim.Adam()`](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam)). | \n", + "\n", + "Let's create a loss function and an optimizer we can use to help improve our model.\n", + "\n", + "Which loss function and which optimizer you use depends on the kind of problem you're working on.\n", + "\n", + "However, there are some common values that are known to work well, such as the SGD (stochastic gradient descent) or Adam optimizer. And the MAE (mean absolute error) loss function for regression problems (predicting a number) or binary cross entropy loss function for classification problems (predicting one thing or another). \n", + "\n", + "For our problem, since we're predicting a number, let's use MAE (which is under `torch.nn.L1Loss()`) in PyTorch as our loss function. \n", + "\n", + "![what MAE loss looks like for our plot data](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/01-mae-loss-annotated.png)\n", + "*Mean absolute error (MAE, in PyTorch: `torch.nn.L1Loss`) measures the absolute difference between two points (predictions and labels) and then takes the mean across all examples.*\n", + "\n", + "And we'll use SGD, `torch.optim.SGD(params, lr)` where:\n", + "\n", + "* `params` is the target model parameters you'd like to optimize (e.g. the `weights` and `bias` values we randomly set before).\n", + "* `lr` is the **learning rate** you'd like the optimizer to update the parameters at, higher means the optimizer will try larger updates (these can sometimes be too large and the optimizer will fail to work), lower means the optimizer will try smaller updates (these can sometimes be too small and the optimizer will take too long to find the ideal values). The learning rate is considered a **hyperparameter** (because it's set by a machine learning engineer). 
Common starting values for the learning rate are `0.01`, `0.001`, `0.0001`, however, these can also be adjusted over time (this is called [learning rate scheduling](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate)). \n", + "\n", + "Woah, that's a lot, let's see it in code." + ] + }, + { + "cell_type": "code", + "source": [ + "# Create the loss function\n", + "loss_fn = nn.L1Loss() # MAE loss is the same as L1Loss\n", + "\n", + "# Create the optimizer\n", + "optimizer = torch.optim.SGD(params = model_0.parameters(), # parameters of target model to optimize\n", + " lr = 0.01) # learning rate" + ], + "metadata": { + "id": "EB25cqATeFq_" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aFcKCsPcRfnA" + }, + "source": [ + "### Creating an optimization loop in PyTorch\n", + "\n", + "Woohoo! Now we've got a loss function and an optimizer, it's now time to create a **training loop** (and **testing loop**).\n", + "\n", + "The training loop involves the model going through the training data and learning the relationships between the `features` and `labels`.\n", + "\n", + "The testing loop involves going through the testing data and evaluating how good the patterns are that the model learned on the training data (the model never sees the testing data during training).\n", + "\n", + "Each of these is called a \"loop\" because we want our model to look (loop through) at each sample in each dataset.\n", + "\n", + "To create these we're going to write a Python `for` loop in the theme of the [unofficial PyTorch optimization loop song](https://twitter.com/mrdbourke/status/1450977868406673410?s=20) (there's a [video version too](https://youtu.be/Nutpusq_AFw)).\n", + "\n", + "![the unofficial pytorch optimization loop song](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/01-pytorch-optimization-loop-song.png)\n", + "*The unofficial PyTorch optimization loops song, a fun way to 
remember the steps in a PyTorch training (and testing) loop.*\n", + "\n", + "There will be a fair bit of code but nothing we can't handle.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "agXn72H-sgyd" + }, + "source": [ + "\n", + "\n", + "### PyTorch training loop\n", + "For the training loop, we'll build the following steps:\n", + "\n", + "| Number | Step name | What does it do? | Code example |\n", + "| ----- | ----- | ----- | ----- |\n", + "| 1 | Forward pass | The model goes through all of the training data once, performing its `forward()` function calculations. | `model(x_train)` |\n", + "| 2 | Calculate the loss | The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are. | `loss = loss_fn(y_pred, y_train)` | \n", + "| 3 | Zero gradients | The optimizer's gradients are set to zero (they are accumulated by default) so they can be recalculated for the specific training step. | `optimizer.zero_grad()` |\n", + "| 4 | Perform backpropagation on the loss | Computes the gradient of the loss with respect to every model parameter to be updated (each parameter with `requires_grad=True`). This is known as **backpropagation**, hence \"backwards\". | `loss.backward()` |\n", + "| 5 | Update the optimizer (**gradient descent**) | Update the parameters with `requires_grad=True` with respect to the loss gradients in order to improve them. | `optimizer.step()` |\n", + "\n", + "![pytorch training loop annotated](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/01-pytorch-training-loop-annotated.png)\n", + "\n", + "> **Note:** The above is just one example of how the steps could be ordered or described. With experience you'll find making PyTorch training loops can be quite flexible.\n", + ">\n", + "> And on the ordering of things, the above is a good default order but you may see slightly different orders. 
Some rules of thumb: \n", + "> * Calculate the loss (`loss = ...`) *before* performing backpropagation on it (`loss.backward()`).\n", + "> * Zero gradients (`optimizer.zero_grad()`) *before* stepping them (`optimizer.step()`).\n", + "> * Step the optimizer (`optimizer.step()`) *after* performing backpropagation on the loss (`loss.backward()`).\n", + "\n", + "For resources to help understand what's happening behind the scenes with backpropagation and gradient descent, see the extra-curriculum section.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OXHDdlfjssDc" + }, + "source": [ + "\n", + "### PyTorch testing loop\n", + "\n", + "As for the testing loop (evaluating our model), the typical steps include:\n", + "\n", + "| Number | Step name | What does it do? | Code example |\n", + "| ----- | ----- | ----- | ----- |\n", + "| 1 | Forward pass | The model goes through all of the testing data once, performing its `forward()` function calculations. | `model(x_test)` |\n", + "| 2 | Calculate the loss | The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are. | `loss = loss_fn(y_pred, y_test)` | \n", + "| 3 | Calculate evaluation metrics (optional) | Alongside the loss value you may want to calculate other evaluation metrics such as accuracy on the test set. | Custom functions |\n", + "\n", + "Notice the testing loop doesn't contain performing backpropagation (`loss.backward()`) or stepping the optimizer (`optimizer.step()`), this is because no parameters in the model are being changed during testing, they've already been calculated. 
For testing, we're only interested in the output of the forward pass through the model.\n", + "\n", + "![pytorch annotated testing loop](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/01-pytorch-testing-loop-annotated.png)\n", + "\n", + "Let's put all of the above together and train our model for 100 **epochs** (full passes through the training data), recording the training and test loss every 10 epochs.\n" + ] + }, + { + "cell_type": "code", + "source": [ + "torch.manual_seed(42)\n", + "# Set the number of epochs (how many times the model will pass over the training data)\n", + "epochs = 100\n", + "\n", + "# Create empty lists to track loss values\n", + "train_loss_values = []\n", + "test_loss_values = []\n", + "epoch_count = []\n", + "\n", + "for epoch in range(epochs):\n", + "    ### Training\n", + "\n", + "    # Put the model in training mode (this is the default state of a model)\n", + "    model_0.train()\n", + "    # 1. Forward pass on the training data using the model's forward() method\n", + "    y_pred = model_0(X_train)\n", + "    # 2. Calculate the loss\n", + "    loss = loss_fn(y_pred, y_train)\n", + "    # 3. Zero the optimizer's gradients\n", + "    optimizer.zero_grad()\n", + "    # 4. Backpropagate the loss\n", + "    loss.backward()\n", + "    # 5. Step the optimizer\n", + "    optimizer.step()\n", + "\n", + "    ### Testing\n", + "\n", + "    # Put the model in evaluation mode\n", + "    model_0.eval()\n", + "\n", + "    with torch.inference_mode():\n", + "        # 1. Forward pass on the test data\n", + "        test_pred = model_0(X_test)\n", + "        # 2. 
Calculate loss on test data\n", + "        test_loss = loss_fn(test_pred, y_test.type(torch.float)) # predictions come out as torch.float, so cast the labels to match\n", + "    # Record and print what's happening every 10 epochs\n", + "    if epoch % 10 == 0:\n", + "        epoch_count.append(epoch)\n", + "        train_loss_values.append(loss.detach().numpy())\n", + "        test_loss_values.append(test_loss.detach().numpy())\n", + "        print(f'Epoch: {epoch} | MAE Train Loss :{loss} | MAE Test Loss: {test_loss}')" + ], + "metadata": { + "id": "BOdTzvIDfkt9", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "dcb22f45-5822-4160-c989-f3b6444ac0ad" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Epoch: 0 | MAE Train Loss :0.31288138031959534 | MAE Test Loss: 0.48106518387794495\n", + "Epoch: 10 | MAE Train Loss :0.1976713240146637 | MAE Test Loss: 0.3463551998138428\n", + "Epoch: 20 | MAE Train Loss :0.08908725529909134 | MAE Test Loss: 0.21729660034179688\n", + "Epoch: 30 | MAE Train Loss :0.053148526698350906 | MAE Test Loss: 0.14464017748832703\n", + "Epoch: 40 | MAE Train Loss :0.04543796554207802 | MAE Test Loss: 0.11360953003168106\n", + "Epoch: 50 | MAE Train Loss :0.04167863354086876 | MAE Test Loss: 0.09919948130846024\n", + "Epoch: 60 | MAE Train Loss :0.03818932920694351 | MAE Test Loss: 0.08886633068323135\n", + "Epoch: 70 | MAE Train Loss :0.03476089984178543 | MAE Test Loss: 0.0805937647819519\n", + "Epoch: 80 | MAE Train Loss :0.03132382780313492 | MAE Test Loss: 0.07232122868299484\n", + "Epoch: 90 | MAE Train Loss :0.02788739837706089 | MAE Test Loss: 0.06473556160926819\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1krgBqXBdYHc" + }, + "source": [ + "Oh would you look at that! Looks like our loss is going down with every epoch, let's plot it to find out." 
+ ] + }, + { + "cell_type": "code", + "source": [ + "# Plot the loss curves\n", + "plt.plot(epoch_count, train_loss_values, label = 'Train loss')\n", + "plt.plot(epoch_count, test_loss_values, label = 'Test loss')\n", + "plt.title('Training and Test loss Curves')\n", + "plt.ylabel('Loss')\n", + "plt.xlabel('Epochs')\n", + "plt.legend();" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 295 + }, + "id": "frw6nfFxqsXm", + "outputId": "fa9084eb-2df5-4d56-e094-0579bfcbf899" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": { + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lmqQE8Kpec04" + }, + "source": [ + "Nice! The **loss curves** show the loss going down over time. Remember, loss is the measure of how *wrong* your model is, so the lower the better.\n", + "\n", + "But why did the loss go down?\n", + "\n", + "Well, thanks to our loss function and optimizer, the model's internal parameters (`weights` and `bias`) were updated to better reflect the underlying patterns in the data.\n", + "\n", + "Let's inspect our model's [`.state_dict()`](https://pytorch.org/tutorials/recipes/recipes/what_is_state_dict.html) to see see how close our model gets to the original values we set for weights and bias.\n", + "\n" + ] + }, + { + "cell_type": "code", + "source": [ + "# Find our model learned parameters\n", + "print('The model learned the following values for weights and bias:')\n", + "print(model_0.state_dict())\n", + "print(\"\\n And the original values for weights and bias are:\")\n", + "print(f\"weights: {weight}, bias: {bias}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "1HTl2FpJ9Pn9", + "outputId": "f59c7501-4f4b-45a1-de75-3cc0de68d82b" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "The model learned the following values for weights and bias:\n", + "OrderedDict([('weights', tensor([0.5784])), ('bias', tensor([0.3513]))])\n", + "\n", + " And the original values for weights and bias are:\n", + "weights: 0.7, bias: 0.3\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BZyBa9rMelBv" + }, + "source": [ + "Wow! 
How cool is that?\n", + "\n", + "Our model got fairly close to the original values for `weight` and `bias` (and it would probably get even closer if we trained it for longer).\n", + "\n", + "> **Exercise:** Try changing the `epochs` value above to 200. What happens to the loss curves and to the model's weights and bias parameter values?\n", + "\n", + "It'd likely never guess them *perfectly* (especially when using more complicated datasets) but that's okay, often you can do very cool things with a close approximation.\n", + "\n", + "This is the whole idea of machine learning and deep learning, **there are some ideal values that describe our data** and rather than figuring them out by hand, **we can train a model to figure them out programmatically**." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c-VBDFd2ec05" + }, + "source": [ + "## 4. Making predictions with a trained PyTorch model (inference)\n", + "\n", + "Once you've trained a model, you'll likely want to make predictions with it.\n", + "\n", + "We've already seen a glimpse of this in the training and testing code above, the steps to do it outside of the training/testing loop are similar.\n", + "\n", + "There are three things to remember when making predictions (also called performing inference) with a PyTorch model:\n", + "\n", + "1. Set the model in evaluation mode (`model.eval()`).\n", + "2. Make the predictions using the inference mode context manager (`with torch.inference_mode(): ...`).\n", + "3. All predictions should be made with objects on the same device (e.g. data and model on GPU only or data and model on CPU only).\n", + "\n", + "The first two items turn off calculations and settings that PyTorch uses behind the scenes during training but that aren't needed for inference (this results in faster computation). And the third ensures that you won't run into cross-device errors." + ] + }, + { + "cell_type": "code", + "source": [ + "# 1. 
Set the model in evaluation mode\n", + "model_0.eval()\n", + "\n", + "# 2. Set up the inference mode context manager\n", + "with torch.inference_mode():\n", + "    # 3. Make sure the calculations are done with the model and data on the same device\n", + "    # in our case, we haven't set up device-agnostic code yet so our data and model are\n", + "    # on the CPU by default.\n", + "    # model_0.to(device)\n", + "    # X_test = X_test.to(device)\n", + "    y_preds = model_0(X_test)\n", + "y_preds" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "zTIpQLrh_5Vy", + "outputId": "7c89a29b-399a-46a9-d358-413d700688ab" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "tensor([[0.8141],\n", + " [0.8256],\n", + " [0.8372],\n", + " [0.8488],\n", + " [0.8603],\n", + " [0.8719],\n", + " [0.8835],\n", + " [0.8950],\n", + " [0.9066],\n", + " [0.9182]])" + ] + }, + "metadata": {}, + "execution_count": 18 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Cn21JvzmjbBO" + }, + "source": [ + "Nice! We've made some predictions with our trained model, now how do they look?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 428 + }, + "id": "b_kBqpCfec05", + "outputId": "d8817fa6-0d27-402c-b86d-d9ee5e957785" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": { + "needs_background": "light" + } + } + ], + "source": [ + "plot_predictions(predictions=y_preds)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fEHGrjLgji6E" + }, + "source": [ + "Woohoo! Those red dots are looking far closer than they were before!\n", + "\n", + "Let's get onto saving an reloading a model in PyTorch." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8NRng9aEec05" + }, + "source": [ + "## 5. Saving and loading a PyTorch model\n", + "\n", + "If you've trained a PyTorch model, chances are you'll want to save it and export it somewhere.\n", + "\n", + "As in, you might train it on Google Colab or your local machine with a GPU but you'd like to now export it to some sort of application where others can use it. \n", + "\n", + "Or maybe you'd like to save your progress on a model and come back and load it back later.\n", + "\n", + "For saving and loading models in PyTorch, there are three main methods you should be aware of (all of below have been taken from the [PyTorch saving and loading models guide](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)):\n", + "\n", + "| PyTorch method | What does it do? | \n", + "| ----- | ----- |\n", + "| [`torch.save`](https://pytorch.org/docs/stable/torch.html?highlight=save#torch.save) | Saves a serialzed object to disk using Python's [`pickle`](https://docs.python.org/3/library/pickle.html) utility. Models, tensors and various other Python objects like dictionaries can be saved using `torch.save`. | \n", + "| [`torch.load`](https://pytorch.org/docs/stable/torch.html?highlight=torch%20load#torch.load) | Uses `pickle`'s unpickling features to deserialize and load pickled Python object files (like models, tensors or dictionaries) into memory. You can also set which device to load the object to (CPU, GPU etc). 
|\n", + "| [`torch.nn.Module.load_state_dict`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html?highlight=load_state_dict#torch.nn.Module.load_state_dict)| Loads a model's parameter dictionary (`model.state_dict()`) using a saved `state_dict()` object. | \n", + "\n", + "> **Note:** As stated in [Python's `pickle` documentation](https://docs.python.org/3/library/pickle.html), the `pickle` module **is not secure**. That means you should only ever unpickle (load) data you trust. That goes for loading PyTorch models as well. Only ever use saved PyTorch models from sources you trust.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SdAGcH2aec05" + }, + "source": [ + "### Saving a PyTorch model's `state_dict()`\n", + "\n", + "The [recommended way](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference) for saving and loading a model for inference (making predictions) is by saving and loading a model's `state_dict()`.\n", + "\n", + "Let's see how we can do that in a few steps:\n", + "\n", + "1. We'll create a directory for saving models to called `models` using Python's `pathlib` module.\n", + "2. We'll create a file path to save the model to.\n", + "3. 
We'll call `torch.save(obj, f)` where `obj` is the target model's `state_dict()` and `f` is the filename of where to save the model.\n", + "\n", + "> **Note:** It's common convention for PyTorch saved models or objects to end with `.pt` or `.pth`, like `saved_model_01.pth`.\n" + ] + }, + { + "cell_type": "code", + "source": [ + "pip install pathlib" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "r_vfgjVqCMVY", + "outputId": "7de4b315-cb90-4e78-85e0-ca5358cf8671" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", + "Requirement already satisfied: pathlib in /usr/local/lib/python3.9/dist-packages (1.0.1)\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "from pathlib import Path\n", + "\n", + "# 1. Create models directory\n", + "MODEL_PATH = Path('models')\n", + "MODEL_PATH.mkdir(parents = True, exist_ok = True)\n", + "\n", + "# 2. Create model save path\n", + "MODEL_NAME = '01_pytorch_workflow_model_0.pth'\n", + "MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME\n", + "\n", + "# 3. 
Save the model state dict\n", + "print(f\"Saving model to: {MODEL_SAVE_PATH}\")\n", + "torch.save(obj = model_0.state_dict(), # saving only the state_dict() saves just the model's learned parameters\n", + "           f=MODEL_SAVE_PATH)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "uVA1USqp4Tvm", + "outputId": "fc4fd94b-3a83-43c4-93ad-63814fd86831" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Saving model to: models/01_pytorch_workflow_model_0.pth\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Check the saved file path\n", + "!ls -l models/01_pytorch_workflow_model_0.pth" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "d9i9Q6rlCt94", + "outputId": "bc3e9365-8b1f-4e43-e79d-09aaea7429a7" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "-rw-r--r-- 1 root root 1207 Mar 25 10:35 models/01_pytorch_workflow_model_0.pth\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jFQpRoH5ec06" + }, + "source": [ + "### Loading a saved PyTorch model's `state_dict()`\n", + "\n", + "Since we've now got a saved model `state_dict()` at `models/01_pytorch_workflow_model_0.pth` we can now load it in using `torch.nn.Module.load_state_dict(torch.load(f))` where `f` is the filepath of our saved model `state_dict()`.\n", + "\n", + "Why call `torch.load()` inside `torch.nn.Module.load_state_dict()`? 
\n", + "\n", + "Because we only saved the model's `state_dict()` which is a dictionary of learned parameters and not the *entire* model, we first have to load the `state_dict()` with `torch.load()` and then pass that `state_dict()` to a new instance of our model (which is a subclass of `nn.Module`).\n", + "\n", + "Why not save the entire model?\n", + "\n", + "[Saving the entire model](https://pytorch.org/tutorials/beginner/saving_loading_models.html#save-load-entire-model) rather than just the `state_dict()` is more intuitive, however, to quote the PyTorch documentation (italics mine):\n", + "\n", + "> The disadvantage of this approach *(saving the whole model)* is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved...\n", + ">\n", + "> Because of this, your code can break in various ways when used in other projects or after refactors.\n", + "\n", + "So instead, we're using the flexible method of saving and loading just the `state_dict()`, which again is basically a dictionary of model parameters.\n", + "\n", + "Let's test it out by creating another instance of `LinearRegressionModel()`, which is a subclass of `torch.nn.Module` and will hence have the in-built method `load_state_dict()`." 
+ ] + }, + { + "cell_type": "code", + "source": [ + "# Instantiate a new instance of our model (this will be instantiated with random weights)\n", + "loaded_model_0 = LinearRegressionModel()\n", + "\n", + "# Load the state_dict of our saved model (this will update the new instance of our model with trained weights)\n", + "loaded_model_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "jZ79T9KWFy9C", + "outputId": "7a23a13d-3f2a-49e9-c817-c759d867b4e0" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 22 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vK8PRtY7Qgpz" + }, + "source": [ + "Excellent! It looks like things matched up.\n", + "\n", + "Now to test our loaded model, let's perform inference with it (make predictions) on the test data.\n", + "\n", + "Remember the rules for performing inference with PyTorch models?\n", + "\n", + "If not, here's a refresher:\n", + "\n", + "
**PyTorch inference rules:**\n", + "\n", + "1. Set the model in evaluation mode (`model.eval()`).\n", + "2. Make the predictions using the inference mode context manager (`with torch.inference_mode(): ...`).\n", + "3. All predictions should be made with objects on the same device (e.g. data and model on GPU only or data and model on CPU only).\n", + "
\n", + "\n" + ] + }, + { + "cell_type": "code", + "source": [ + "# 1. Put the loaded model into evaluation mode\n", + "loaded_model_0.eval()\n", + "\n", + "# 2. Use the inference mode context manager to make predictions\n", + "with torch.inference_mode():\n", + "    loaded_model_preds = loaded_model_0(X_test) # perform a forward pass on the test data with the loaded model\n" + ], + "metadata": { + "id": "Stw1IdaCGQLw" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e81XpN8WSSqn" + }, + "source": [ + "Now we've made some predictions with the loaded model, let's see if they're the same as the previous predictions." + ] + }, + { + "cell_type": "code", + "source": [ + "# Compare previous model predictions with loaded model predictions (these should be the same)\n", + "y_preds == loaded_model_preds" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "bEXGm8IBG3Es", + "outputId": "7d3aff13-d4df-4afe-c6cd-d4cdb03bc9bb" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "tensor([[True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True]])" + ] + }, + "metadata": {}, + "execution_count": 24 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9Y4ZcxxfNcVu" + }, + "source": [ + "Nice! \n", + "\n", + "It looks like the loaded model predictions are the same as the previous model predictions (predictions made prior to saving). This indicates our model is saving and loading as expected.\n", + "\n", + "> **Note:** There are more methods to save and load PyTorch models but I'll leave these for extra-curriculum and further reading. See the [PyTorch guide for saving and loading models](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-and-loading-models) for more. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FeAITvLXec06" + }, + "source": [ + "## 6. Putting it all together \n", + "\n", + "We've covered a fair bit of ground so far. \n", + "\n", + "But once you've had some practice, you'll be performing the above steps like dancing down the street.\n", + "\n", + "Speaking of practice, let's put everything we've done so far together. \n", + "\n", + "Except this time we'll make our code device agnostic (so if there's a GPU available, it'll use it and if not, it will default to the CPU). \n", + "\n", + "There'll be far less commentary in this section than above since what we're going to go through has already been covered.\n", + "\n", + "We'll start by importing the standard libraries we need.\n", + "\n", + "> **Note:** If you're using Google Colab, to setup a GPU, go to Runtime -> Change runtime type -> Hardware acceleration -> GPU. If you do this, it will reset the Colab runtime and you will lose saved variables." + ] + }, + { + "cell_type": "code", + "source": [ + "# Import Pytorch and matplotlib\n", + "import torch\n", + "from torch import nn \n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Check PyTorch Version\n", + "torch.__version__" + ], + "metadata": { + "id": "Val2gvSIHPgo", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "outputId": "c46438a0-60f1-49fb-cc70-5a0ec1736f62" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'1.13.1+cu116'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 25 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bT-krbNMIw0d" + }, + "source": [ + "Now let's start making our code device agnostic by setting `device=\"cuda\"` if it's available, otherwise it'll default to `device=\"cpu\"`.\n", + "\n" + ] + }, + { + "cell_type": "code", + "source": [ + "# Setup device agnostic 
code\n", + "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n", + "print(f\"Using device: {device}\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "m9C5SRsWV_zV", + "outputId": "923bfedd-63a4-47d5-8488-1bca545605c0" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Using device: cuda\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G1t0Ek0GJq6T" + }, + "source": [ + "If you've got access to a GPU, the above should've printed out:\n", + "\n", + "```\n", + "Using device: cuda\n", + "```\n", + "Otherwise, you'll be using a CPU for the following computations. This is fine for our small dataset but it will take longer for larger datasets." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DmilLp3Vec07" + }, + "source": [ + "### 6.1 Data\n", + "\n", + "Let's create some data just like before.\n", + "\n", + "First, we'll hard-code some `weight` and `bias` values.\n", + "\n", + "Then we'll make a range of numbers between 0 and 1, these will be our `X` values.\n", + "\n", + "Finally, we'll use the `X` values, as well as the `weight` and `bias` values to create `y` using the linear regression formula (`y = weight * X + bias`)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "fJqgDWUfec07", + "outputId": "1bf1a19d-971e-4ede-c1f1-3b61dda2bfc7" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(tensor([[0.0000],\n", + " [0.0200],\n", + " [0.0400],\n", + " [0.0600],\n", + " [0.0800],\n", + " [0.1000],\n", + " [0.1200],\n", + " [0.1400],\n", + " [0.1600],\n", + " [0.1800]]), tensor([[0.3000],\n", + " [0.3140],\n", + " [0.3280],\n", + " [0.3420],\n", + " [0.3560],\n", + " [0.3700],\n", + " [0.3840],\n", + " [0.3980],\n", + " [0.4120],\n", + " [0.4260]]))" + ] + }, + "metadata": {}, + "execution_count": 27 + } + ], + "source": [ + "# Create weight and bias\n", + "weight = 0.7\n", + "bias = 0.3\n", + "\n", + "# Create range values\n", + "start = 0\n", + "end = 1\n", + "step = 0.02\n", + "\n", + "# Create X and y (features and labels)\n", + "X = torch.arange(start, end, step).unsqueeze(dim=1) # without unsqueeze, errors will happen later on (shapes within linear layers)\n", + "y = weight * X + bias \n", + "X[:10], y[:10]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Oaar6rDGLGaQ" + }, + "source": [ + "Wonderful!\n", + "\n", + "Now we've got some data, let's split it into training and test sets.\n", + "\n", + "We'll use an 80/20 split with 80% training data and 20% testing data." 
+ ] + }, + { + "cell_type": "code", + "source": [ + "# Split data \n", + "train_split = int(0.8* len(X))\n", + "X_train, y_train = X[:train_split], y[:train_split]\n", + "X_test, y_test = X[train_split:], y[train_split:]\n", + "\n", + "len(X_train), len(y_train), len(X_test), len(y_test)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "vJairLsEWesu", + "outputId": "9f1b0f56-a830-4196-8c9f-64a1df39ec3b" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(40, 40, 10, 10)" + ] + }, + "metadata": {}, + "execution_count": 28 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "INW8-McyLeFE" + }, + "source": [ + "Excellent, let's visualize them to make sure they look okay." + ] + }, + { + "cell_type": "code", + "source": [ + "plot_predictions(X_train, y_train,X_test,y_test)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 428 + }, + "id": "noueFAQ8bCFg", + "outputId": "84ff2926-d0d0-43d7-8f32-25a871932901" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": { + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X0ycBrxIec07" + }, + "source": [ + "### 6.2 Building a PyTorch linear model\n", + "\n", + "We've got some data, now it's time to make a model.\n", + "\n", + "We'll create the same style of model as before except this time, instead of defining the weight and bias parameters of our model manually using `nn.Parameter()`, we'll use [`nn.Linear(in_features, out_features)`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) to do it for us.\n", + "\n", + "Where `in_features` is the number of dimensions your input data has and `out_features` is the number of dimensions you'd like it to be output to.\n", + "\n", + "In our case, both of these are `1` since our data has `1` input feature (`X`) per label (`y`).\n", + "\n", + "![comparison of nn.Parameter Linear Regression model and nn.Linear Linear Regression model](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/01-pytorch-linear-regression-model-with-nn-Parameter-and-nn-Linear-compared.png)\n", + "*Creating a linear regression model using `nn.Parameter` versus using `nn.Linear`. 
There are plenty more examples of where the `torch.nn` module has pre-built computations, including many popular and useful neural network layers.*\n" + ] + }, + { + "cell_type": "code", + "source": [ + "# Subclass nn.Module to make our model\n", + "class LinearRegressionModelV2(nn.Module):\n", + "    def __init__(self):\n", + "        super().__init__()\n", + "        # Use nn.Linear() to create the model parameters\n", + "        self.linear_layer = nn.Linear(in_features = 1,\n", + "                                      out_features = 1)\n", + "\n", + "    # Define the forward computation\n", + "    def forward(self, x: torch.Tensor) -> torch.Tensor:\n", + "        return self.linear_layer(x)\n", + "torch.manual_seed(42)\n", + "model_1 = LinearRegressionModelV2()\n", + "model_1, model_1.state_dict()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "czBIFC_vbodn", + "outputId": "13cd0528-5fcb-455f-e26d-920a70f6a585" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(LinearRegressionModelV2(\n", + " (linear_layer): Linear(in_features=1, out_features=1, bias=True)\n", + " ),\n", + " OrderedDict([('linear_layer.weight', tensor([[0.7645]])),\n", + " ('linear_layer.bias', tensor([0.8300]))]))" + ] + }, + "metadata": {}, + "execution_count": 52 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4vLN2pPXNXUs" + }, + "source": [ + "Notice the outputs of `model_1.state_dict()`, the `nn.Linear()` layer created a random `weight` and `bias` parameter for us.\n", + "\n", + "Now let's put our model on the GPU (if it's available).\n", + "\n", + "We can change the device our PyTorch objects are on using `.to(device)`.\n", + "\n", + "First let's check the model's current device." 
+ ] + }, + { + "cell_type": "code", + "source": [ + "# Check model device\n", + "next(model_1.parameters()).device" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Tldo2A6idi9y", + "outputId": "830bb2c1-8422-4ed2-9392-95e6b88ef5bf" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "device(type='cpu')" + ] + }, + "metadata": {}, + "execution_count": 53 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZqalUGW5N93K" + }, + "source": [ + "Wonderful, looks like the model's on the CPU by default.\n", + "\n", + "Let's change it to be on the GPU (if it's available)." + ] + }, + { + "cell_type": "code", + "source": [ + "# Move the model to the GPU if it's available, otherwise it stays on the CPU\n", + "model_1.to(device) # the device variable was set above to be \"cuda\" if available or \"cpu\" if not\n", + "next(model_1.parameters()).device" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "lg9rJAWTer1q", + "outputId": "fbeeca64-0fd3-407a-aa57-ff932550755c" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "device(type='cuda', index=0)" + ] + }, + "metadata": {}, + "execution_count": 54 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qHs0bL5_Oc1k" + }, + "source": [ + "Nice! 
Because of our device agnostic code, the above cell will work regardless of whether a GPU is available or not.\n", + "\n", + "If you do have access to a CUDA-enabled GPU, you should see an output of something like:\n", + "\n", + "```\n", + "device(type='cuda', index=0)\n", + "```" + ] + }, + { + "cell_type": "code", + "source": [ + "next(model_1.parameters()).device" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "61_sQEHKkQrS", + "outputId": "76e18009-b255-4b8b-a568-40c7aa6b5095" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "device(type='cuda', index=0)" + ] + }, + "metadata": {}, + "execution_count": 55 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jwTeP_vkec08" + }, + "source": [ + "### 6.3 Training" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vPFOV3wUec09" + }, + "source": [ + "Time to build a training and testing loop.\n", + "\n", + "First we'll need a loss function and an optimizer.\n", + "\n", + "Let's use the same functions we used earlier, `nn.L1Loss()` and `torch.optim.SGD()`.\n", + "\n", + "We'll have to pass the new model's parameters (`model.parameters()`) to the optimizer for it to adjust them during training. 
\n", + "\n", + "The learning rate of `0.01` worked well before too so let's use that again.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "source": [ + "# Create loss function\n", + "loss_fn = nn.L1Loss()\n", + "\n", + "# Create Optimizer\n", + "optimizer = torch.optim.SGD(params = model_1.parameters(),\n", + " lr = 0.01)" + ], + "metadata": { + "id": "zFOL0zyYfF6p" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NxuBdoWRP2nU" + }, + "source": [ + "Beautiful, loss function and optimizer ready, now let's train and evaluate our model using a training and testing loop.\n", + "\n", + "The only difference in this step compared to the previous training loop is that we put the data on the target `device`.\n", + "\n", + "If you need a reminder of the PyTorch training loop steps, see below.\n", + "\n", + "\n", + "
\n", + " PyTorch training loop steps\n", + "
    \n", + "
  1. Forward pass - The model goes through all of the training data once, performing its forward() function calculations (model(x_train)).\n", + "
  2. Calculate the loss - The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are (loss = loss_fn(y_pred, y_train)).\n", + "
  3. Zero gradients - The optimizer's gradients are set to zero (they are accumulated by default) so they can be recalculated for the specific training step (optimizer.zero_grad()).\n", + "
  4. Perform backpropagation on the loss - Computes the gradient of the loss with respect to every model parameter to be updated (each parameter with requires_grad=True). This is known as backpropagation, hence \"backwards\" (loss.backward()).\n", + "
  5. Step the optimizer (gradient descent) - Update the parameters with requires_grad=True with respect to the loss gradients in order to improve them (optimizer.step()).\n", + "
\n", + "
" + ] + }, + { + "cell_type": "code", + "source": [ + "torch.manual_seed(42)\n", + "\n", + "# Set the number of epochs\n", + "epochs = 100\n", + "\n", + "# Put data on the available device\n", + "# Without this, error will happen (not all model/data on device)\n", + "X_train = X_train.to(device)\n", + "X_test = X_test.to(device)\n", + "y_train = y_train.to(device)\n", + "y_test = y_test.to(device)\n", + "\n", + "for epoch in range(epochs):\n", + " ### Training\n", + " model_1.train()\n", + "\n", + " # 1. Forward Pass\n", + " y_pred = model_1(X_train)\n", + " #y_pred = model_1(X_train)\n", + "\n", + " #2. Calculate loss\n", + " loss = loss_fn(y_pred,y_train)\n", + "\n", + " # 3. Zero Grad Optimizer\n", + " optimizer.zero_grad()\n", + "\n", + " # 4. Loss Backward\n", + " loss.backward()\n", + "\n", + " # 5. Step the Optimizer \n", + " optimizer.step()\n", + "\n", + " ### Testing\n", + " model_1.eval() # put the model in evaluation mode for testing (inference)\n", + "\n", + " with torch.inference_mode():\n", + "\n", + " # 1. Forward pass\n", + " test_pred = model_1(X_test)\n", + "\n", + " # 2. Calculate the loss\n", + " test_loss = loss_fn(test_pred, y_test)\n", + "\n", + " if epoch % 100 == 0:\n", + " print(f\"Epoch: {epoch} | Train loss: {loss} | Test loss: {test_loss}\")\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "sjWZBSIDf4KQ", + "outputId": "74e46140-51a7-440f-b52c-96818936f2e7" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Epoch: 0 | Train loss: 0.5551779866218567 | Test loss: 0.5739762187004089\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nt-b2Y131flk" + }, + "source": [ + "> **Note:** Due to the random nature of machine learning, you will likely get slightly different results (different loss and prediction values) depending on whether your model was trained on CPU or GPU. 
This is true even if you use the same random seed on either device. If the difference is large, you may want to look for errors; if it is small (ideally it is), you can ignore it.\n", + "\n", + "Nice! That loss looks pretty low.\n", + "\n", + "Let's check the parameters our model has learned and compare them to the original parameters we hard-coded." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "TP_tFn5rec09", + "outputId": "53b6c53a-1bab-4f13-e09a-c9473200af39" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The model learned the following values for weights and bias:\n", + "OrderedDict([('linear_layer.weight', tensor([[0.6968]], device='cuda:0')),\n", + " ('linear_layer.bias', tensor([0.3025], device='cuda:0'))])\n", + "\n", + "And the original values for weights and bias are:\n", + "weights: 0.7, bias: 0.3\n" + ] + } + ], + "source": [ + "# Find our model's learned parameters\n", + "from pprint import pprint # pprint = pretty print, see: https://docs.python.org/3/library/pprint.html \n", + "print(\"The model learned the following values for weights and bias:\")\n", + "pprint(model_1.state_dict())\n", + "print(\"\\nAnd the original values for weights and bias are:\")\n", + "print(f\"weights: {weight}, bias: {bias}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rDZo0vEU1_-1" + }, + "source": [ + "Ho ho! 
Now that's pretty darn close to a perfect model.\n", + "\n", + "Remember though, in practice, it's rare that you'll know the perfect parameters ahead of time.\n", + "\n", + "And if you knew the parameters your model had to learn ahead of time, what would be the fun of machine learning?\n", + "\n", + "Plus, in many real-world machine learning problems, the number of parameters can well exceed tens of millions.\n", + "\n", + "I don't know about you but I'd rather write code for a computer to figure those out than do it by hand." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mBR1qvqhec09" + }, + "source": [ + "### 6.4 Making predictions\n", + "\n", + "Now that we've got a trained model, let's turn on its evaluation mode and make some predictions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ksqG5N5Iec09", + "outputId": "a0d4a51f-e1d9-4038-fd8a-0bbf4386f36a" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "tensor([[0.8600],\n", + " [0.8739],\n", + " [0.8878],\n", + " [0.9018],\n", + " [0.9157],\n", + " [0.9296],\n", + " [0.9436],\n", + " [0.9575],\n", + " [0.9714],\n", + " [0.9854]], device='cuda:0')" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Put the model into evaluation mode\n", + "model_1.eval()\n", + "\n", + "# Make predictions on the test data\n", + "with torch.inference_mode():\n", + "    y_preds = model_1(X_test)\n", + "y_preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NtOoVnbi2ysL" + }, + "source": [ + "If you're making predictions with data on the GPU, you might notice the output of the above has `device='cuda:0'` towards the end. That means the data is on CUDA device 0 (the first GPU your system has access to due to zero-indexing). If you end up using multiple GPUs in the future, this number may be higher. 
\n", + "\n", + "Now let's plot our model's predictions.\n", + "\n", + "> **Note:** Many data science libraries such as pandas, matplotlib and NumPy aren't capable of using data that is stored on GPU. So you might run into some issues when trying to use a function from one of these libraries with tensor data not stored on the CPU. To fix this, you can call [`.cpu()`](https://pytorch.org/docs/stable/generated/torch.Tensor.cpu.html) on your target tensor to return a copy of your target tensor on the CPU." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 428 + }, + "id": "Z4dmfr2bec09", + "outputId": "dd68d5a7-1733-4385-c1cb-7d7b44085813" + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# plot_predictions(predictions=y_preds) # -> won't work... data not on CPU\n", + "\n", + "# Put data on the CPU and plot it\n", + "plot_predictions(predictions=y_preds.cpu())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DxZa-5-Tec0-" + }, + "source": [ + "Woah! Look at those red dots, they line up almost perfectly with the green dots. I guess the extra epochs helped.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K8jCHl1gec0-" + }, + "source": [ + "### 6.5 Saving and loading a model\n", + "\n", + "We're happy with our models predictions, so let's save it to file so it can be used later.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DcQo4JqL7eSU", + "outputId": "e43ada0c-c074-4b50-9207-fa01581b1d5f" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Saving model to: models/01_pytorch_workflow_model_1.pth\n" + ] + } + ], + "source": [ + "from pathlib import Path\n", + "\n", + "# 1. Create models directory \n", + "MODEL_PATH = Path(\"models\")\n", + "MODEL_PATH.mkdir(parents=True, exist_ok=True)\n", + "\n", + "# 2. Create model save path \n", + "MODEL_NAME = \"01_pytorch_workflow_model_1.pth\"\n", + "MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME\n", + "\n", + "# 3. 
Save the model state dict \n", + "print(f\"Saving model to: {MODEL_SAVE_PATH}\")\n", + "torch.save(obj=model_1.state_dict(), # saving only the state_dict() stores just the model's learned parameters\n", + " f=MODEL_SAVE_PATH) " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lk0rvpwV7slc" + }, + "source": [ + "And just to make sure everything worked well, let's load it back in.\n", + "\n", + "We'll:\n", + "* Create a new instance of the `LinearRegressionModelV2()` class\n", + "* Load in the model state dict using `torch.nn.Module.load_state_dict()`\n", + "* Send the new instance of the model to the target device (to ensure our code is device-agnostic)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "jMnVHzf1ec0-", + "outputId": "76f10046-cd42-4b39-a372-aa95227828e8" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Loaded model:\n", + "LinearRegressionModelV2(\n", + " (linear_layer): Linear(in_features=1, out_features=1, bias=True)\n", + ")\n", + "Model on device:\n", + "cuda:0\n" + ] + } + ], + "source": [ + "# Instantiate a fresh instance of LinearRegressionModelV2\n", + "loaded_model_1 = LinearRegressionModelV2()\n", + "\n", + "# Load model state dict \n", + "loaded_model_1.load_state_dict(torch.load(MODEL_SAVE_PATH))\n", + "\n", + "# Put the model on the target device (if your data is on the GPU, the model has to be on the GPU too to make predictions)\n", + "loaded_model_1.to(device)\n", + "\n", + "print(f\"Loaded model:\\n{loaded_model_1}\")\n", + "print(f\"Model on device:\\n{next(loaded_model_1.parameters()).device}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Hv6EMEx99LV2" + }, + "source": [ + "Now we can evaluate the loaded model to see if its predictions line up with the predictions made prior to saving." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "fYODT7ONec0_", + "outputId": "c8184cd1-595a-43e4-8155-89dcecc4d0b0" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "tensor([[True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True],\n", + " [True]], device='cuda:0')" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Evaluate loaded model\n", + "loaded_model_1.eval()\n", + "with torch.inference_mode():\n", + " loaded_model_1_preds = loaded_model_1(X_test)\n", + "y_preds == loaded_model_1_preds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7M_kcRC89YrZ" + }, + "source": [ + "Everything adds up! Nice!\n" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "provenance": [], + "include_colab_link": true + }, + "interpreter": { + "hash": "3fbe1355223f7b2ffc113ba3ade6a2b520cadace5d5ec3e828c83ce02eb221bf" + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file