diff --git a/tutorials/notebooks/mct_features_notebooks/README.md b/tutorials/notebooks/mct_features_notebooks/README.md index 062af4eef..f94a2626b 100644 --- a/tutorials/notebooks/mct_features_notebooks/README.md +++ b/tutorials/notebooks/mct_features_notebooks/README.md @@ -10,12 +10,12 @@ These techniques are essential for further optimizing models and achieving super
Post-Training Quantization (PTQ) - | Tutorial | Included Features | - |------------------------------|-----------------------------------------------------------------------------------------------------| - | [MobileNetV2](../imx500_notebooks/keras/example_keras_mobilenetv2_for_imx500.ipynb) | ✅ PTQ | - | [Mixed-Precision MobileNetV2](keras/example_keras_mobilenet_mixed_precision.ipynb) | ✅ PTQ
✅ Mixed-Precision | - | [Nanodet-Plus](../imx500_notebooks/keras/example_keras_nanodet_plus_for_imx500.ipynb) | ✅ PTQ | - | [YoloV8-nano](keras/example_keras_yolov8n.ipynb) | ✅ PTQ | + | Tutorial | Included Features | + |--------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------| + | [Basic Post-Training Quantization (PTQ)](keras/example_keras_post-training_quantization.ipynb) | ✅ PTQ | + | [MobileNetV2](../imx500_notebooks/keras/example_keras_mobilenetv2_for_imx500.ipynb) | ✅ PTQ | + | [Mixed-Precision MobileNetV2](keras/example_keras_mobilenet_mixed_precision.ipynb) | ✅ PTQ
✅ Mixed-Precision | + | [Nanodet-Plus](../imx500_notebooks/keras/example_keras_nanodet_plus_for_imx500.ipynb) | ✅ PTQ | | [EfficientDetLite0](../imx500_notebooks/keras/example_keras_effdet_lite0_for_imx500.ipynb) | ✅ PTQ
✅ [sony-custom-layers](https://github.com/sony/custom_layers) integration |
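The updated activation-threshold notebook describes how MCT picks each activation threshold. It runs a grid search over power-of-two thresholds and keeps the one that minimizes an error such as MSE between the float and quantized activations. Below is a minimal NumPy sketch of that idea. It is an illustration under assumptions, not MCT's implementation:

* The uniform quantizer uses step $t / 2^{n_b-1}$ for signed data and $t / 2^{n_b}$ for unsigned data.
* The candidate grid is built by repeatedly halving the no-clipping threshold.
* The helper names `quantize`, `no_clipping_threshold`, `lp_error` and `search_threshold` are hypothetical.
* KL is not covered.

```python
import numpy as np


def quantize(x, t, n_bits=8, signed=True):
    # Simplified uniform quantizer (assumption, not MCT's code):
    # 2**n_bits levels covering [-t, t) if signed, or [0, t) if unsigned.
    step = t / 2 ** (n_bits - 1 if signed else n_bits)
    q_min = -(2 ** (n_bits - 1)) if signed else 0
    q_max = 2 ** (n_bits - 1) - 1 if signed else 2 ** n_bits - 1
    # Round to the nearest level, then clip values outside the threshold range.
    return np.clip(np.round(x / step), q_min, q_max) * step


def no_clipping_threshold(x):
    # Smallest power of two >= max |x| (assumes x is not all zeros).
    return float(2.0 ** np.ceil(np.log2(np.max(np.abs(x)))))


def lp_error(x, q, p=2.0):
    # Mean of |Q(x) - x|**p: p=2 gives MSE, p=1 gives MAE, other p an Lp-style error.
    return float(np.mean(np.abs(q - x) ** p))


def search_threshold(x, n_bits=8, p=2.0, n_candidates=10):
    # Treat the tensor as signed only if it contains negative values (assumption).
    signed = bool(np.any(x < 0))
    t_max = no_clipping_threshold(x)
    # Hypothetical power-of-two grid: t_max, t_max/2, t_max/4, ...
    candidates = [t_max / 2 ** k for k in range(n_candidates)]
    errors = [lp_error(x, quantize(x, t, n_bits, signed), p) for t in candidates]
    # Keep the candidate with the lowest error.
    return candidates[int(np.argmin(errors))]


# Usage on synthetic ReLU-like activations with one large outlier.
rng = np.random.default_rng(0)
acts = np.maximum(rng.standard_normal(100_000), 0.0)
acts[0] = 20.0

print("NOCLIPPING:", no_clipping_threshold(acts))
print("MSE:", search_threshold(acts, p=2.0))
print("MAE:", search_threshold(acts, p=1.0))
```

With a single large outlier, the no-clipping threshold is forced up to the next power of two above it. The MSE and MAE searches are free to clip that outlier in exchange for finer resolution on the bulk of the distribution. This is the same trade-off the notebook's distribution plots are meant to illustrate.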
diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb index a419fda36..f558321f9 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb @@ -1,844 +1,763 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "f8194007-6ea7-4e00-8931-a37ca2d0dd20", - "metadata": { - "id": "f8194007-6ea7-4e00-8931-a37ca2d0dd20" - }, - "source": [ - "# Activation Threshold Search Demonstration For Post-Training Quantization\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "9be59ea8-e208-4b64-aede-1dd6270b3540", - "metadata": { - "id": "9be59ea8-e208-4b64-aede-1dd6270b3540" - }, - "source": [ - "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb)" - ] - }, - { - "cell_type": "markdown", - "id": "930e6d6d-4980-4d66-beed-9ff5a494acf9", - "metadata": { - "id": "930e6d6d-4980-4d66-beed-9ff5a494acf9" - }, - "source": [ - "## Overview" - ] - }, - { - "cell_type": "markdown", - "id": "699be4fd-d382-4eec-9d3f-e2e85cfb1762", - "metadata": { - "id": "699be4fd-d382-4eec-9d3f-e2e85cfb1762" - }, - "source": [ - "This tutorial demonstrates the process used to find the activation threshold, a step that MCT uses during post-training quantization.\n", - "\n", - "In this example we will explore 2 metrics for threshold selection. 
We will start by demonstrating how to apply the corresponding MCT configurations, then, we will feed a representative dataset through the model, plot the activation distribution of two layers with their respective MCT calculated thresholds, and finally compare the quantized model accuracy of the two methods.\n" - ] - }, - { - "cell_type": "markdown", - "id": "85199e25-c587-41b1-aaf5-e1d23ce97ca1", - "metadata": { - "id": "85199e25-c587-41b1-aaf5-e1d23ce97ca1" - }, - "source": [ - "## Activation threshold explanation" - ] - }, - { - "cell_type": "markdown", - "id": "a89a17f4-30c9-4caf-a888-424f7a82fbc8", - "metadata": { - "id": "a89a17f4-30c9-4caf-a888-424f7a82fbc8" - }, - "source": [ - "During quantization process, thresholds are used to map a distribution of 32bit float values to their quantized counterparts. Doing this with the least loss of data while maintaining the most representative range is important for final model accuracy.\n", - "\n", - "How it’s done in MCT?\n", - "\n", - "MCT's Post-training quantization uses a representative dataset to evaluate a list of typical output activation values. The challenge comes with how best to match these values to their quantized counterparts. To this end, a grid search for the optimal threshold is performed according to number of possible error metrics. Typically, mean squared error is the best performing metric and used by default.\n", - "\n", - "The error is calculated based on the difference between the float and quantized distribution. The threshold is selected based on the minimum error. 
For the case of MSE;\n", - "\n", - "$$\n", - "ERR(t) = \\frac{1}{n_s} \\sum_{X \\in Fl(D)} (Q(X, t, n_b) - X)^2\n", - "$$\n", - "\n", - "- $ERR(t)$ : The quantization error function dependent on threshold t.\n", - "ns: The size of the representative dataset, indicating normalization over the dataset's size.\n", - "\n", - "- $\\sum$: Summation over all elements X in the flattened dataset $Fl(D)$.\n", - "\n", - "- $F_l(D)$: The collection of activation tensors in the l-th layer, representing the dataset D flattened for processing.\n", - "\n", - "- $Q(X, t, n_b)$: The quantized approximation of X, given a threshold t and bit width nb.\n", - "\n", - "- $X$: The original activation tensor before quantization.\n", - "\n", - "- $t$: The quantization threshold, a critical parameter for controlling the quantization process.\n", - "\n", - "- $n_b$: The number of bits used in the quantization process, affecting the model's precision and size.\n", - "\n", - "\n", - "The quantization thresholds often have limitations, typically for deployment purposes. In MCT, activation thresholds are restricted by default to **Power of Two** values only and can represent signed values within the range of (-T, T) or unsigned values within the range of (0, T). Other restriction settings are available.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "9c0e9543-d356-412f-acf1-c2ecad553e06", - "metadata": { - "id": "9c0e9543-d356-412f-acf1-c2ecad553e06" - }, - "source": [ - "### Error methods supported by MCT:\n", - "\n", - "- NOCLIPPING - Use min/max values as thresholds.\n", - "\n", - "- MSE - Use min square error for minimizing quantizationnoises.\n", - "\n", - "- MAE - Use min absolute error for minimizing quantization nose.\n", - "\n", - "- KL - Use KL-divergen ce tosgnals disb as tas o be similar as posible.\n", - "\n", - "- Lp - Use Lpsingimizing quantization noise." 
- ] - }, - { - "cell_type": "markdown", - "id": "04228b7c-00f1-4ded-bead-722e2a4e89a0", - "metadata": { - "id": "04228b7c-00f1-4ded-bead-722e2a4e89a0", - "tags": [] - }, - "source": [ - "## Setup" - ] - }, - { - "cell_type": "markdown", - "id": "2657cf1a-654d-45a6-b877-8bf42fc26d0d", - "metadata": { - "id": "2657cf1a-654d-45a6-b877-8bf42fc26d0d" - }, - "source": [ - "Install and import the relevant packages:\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "324685b9-5dcc-4d22-80f4-dec9a93d3324", - "metadata": { - "id": "324685b9-5dcc-4d22-80f4-dec9a93d3324", - "tags": [] - }, - "outputs": [], - "source": [ - "TF_VER = '2.14.0'\n", - "\n", - "!pip install -q tensorflow=={TF_VER}\n", - "!pip install -q mct-nightly" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b3f0acc8-281c-4bca-b0b9-3d7677105f19", - "metadata": { - "id": "b3f0acc8-281c-4bca-b0b9-3d7677105f19" - }, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "import keras\n", - "import model_compression_toolkit as mct\n", - "import os" - ] - }, - { - "cell_type": "markdown", - "id": "z8F-avk3azgZ", - "metadata": { - "id": "z8F-avk3azgZ" - }, - "source": [ - "Clone MCT to gain access to tutorial scripts" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e3b675cf-e1b5-4249-a581-ffb9b1c16ba1", - "metadata": { - "id": "e3b675cf-e1b5-4249-a581-ffb9b1c16ba1" - }, - "outputs": [], - "source": [ - "!git clone https://github.com/sony/model_optimization.git local_mct\n", - "!pip install -r ./local_mct/requirements.txt\n", - "import sys\n", - "sys.path.insert(0,\"./local_mct\")\n", - "import tutorials.resources.utils.keras_tutorial_tools as tutorial_tools" - ] - }, - { - "cell_type": "markdown", - "id": "0c7fed0d-cfc8-41ee-adf1-22a98110397b", - "metadata": { - "id": "0c7fed0d-cfc8-41ee-adf1-22a98110397b" - }, - "source": [ - "## Dataset" - ] - }, - { - "cell_type": "markdown", - "id": "aecde59e4c37b1da", - "metadata": { - "collapsed": 
false, - "id": "aecde59e4c37b1da" - }, - "source": [ - "Load ImageNet classification dataset and seperate a small representative subsection of this dataset to use for quantization." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "_ztv72uM6-UT", - "metadata": { - "id": "_ztv72uM6-UT" - }, - "outputs": [], - "source": [ - "if not os.path.isdir('imagenet'):\n", - " !mkdir imagenet\n", - " !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz\n", - " !mv ILSVRC2012_devkit_t12.tar.gz imagenet/\n", - " !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar\n", - " !mv ILSVRC2012_img_val.tar imagenet/" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "YVAoUjK47Zcp", - "metadata": { - "id": "YVAoUjK47Zcp" - }, - "outputs": [], - "source": [ - "import torchvision\n", - "if not os.path.isdir('imagenet/val'):\n", - " ds = torchvision.datasets.ImageNet(root='./imagenet', split='val')" - ] - }, - { - "cell_type": "markdown", - "id": "fcbb3eecae5346a9", - "metadata": { - "collapsed": false, - "id": "fcbb3eecae5346a9" - }, - "source": [ - "Here we create the representative dataset. For detail on this step see [ImageNet tutorial](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/keras/example_keras_mobilenetv2_for_imx500.ipynb). If you are running locally a higher fraction of the dataset can be used." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "eda9ad33-f88c-4178-8f19-bac6b2b2e97b", - "metadata": { - "id": "eda9ad33-f88c-4178-8f19-bac6b2b2e97b" - }, - "outputs": [], - "source": [ - "REPRESENTATIVE_DATASET_FOLDER = './imagenet/val'\n", - "BATCH_SIZE = 20\n", - "fraction =0.001\n", - "model_version = 'MobileNetV2'\n", - "\n", - "preprocessor = tutorial_tools.DatasetPreprocessor(model_version=model_version)\n", - "representative_dataset_gen = preprocessor.get_representative_dataset(fraction, REPRESENTATIVE_DATASET_FOLDER, BATCH_SIZE)" - ] - }, - { - "cell_type": "markdown", - "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5", - "metadata": { - "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5" - }, - "source": [ - "## MCT Quantization" - ] - }, - { - "cell_type": "markdown", - "id": "55edbb99-ab2f-4dde-aa74-4ddee61b2615", - "metadata": { - "id": "55edbb99-ab2f-4dde-aa74-4ddee61b2615" - }, - "source": [ - "This step we load the model and quantize with two methods of threshold error calculation: no clipping and MSE.\n", - "\n", - "No clipping chooses the lowest Power of two threshold that does not loose any data to its threshold.\n", - "\n", - "MSE chooses a Power of two threshold that results in the least difference between the float distribution and the quantized distribution.\n", - "\n", - "This means no clipping will often result in a larger threshold, which we will see later in this tutorial." 
- ] - }, - { - "cell_type": "markdown", - "id": "VMrcPUN6jPlB", - "metadata": { - "id": "VMrcPUN6jPlB" - }, - "source": [ - "First we load mobilenetv2 from the keras library" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c431848f-a5f4-4737-a5c8-f046a8bca840", - "metadata": { - "id": "c431848f-a5f4-4737-a5c8-f046a8bca840" - }, - "outputs": [], - "source": [ - "from keras.applications.mobilenet_v2 import MobileNetV2\n", - "float_model = MobileNetV2()" - ] - }, - { - "cell_type": "markdown", - "id": "Pd8blHyKjWay", - "metadata": { - "id": "Pd8blHyKjWay" - }, - "source": [ - "Quantization perameters are defined. Here we will use default values apart from quantisation method." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ca971297-e00b-44b5-b9e1-e57ba5843e38", - "metadata": { - "id": "ca971297-e00b-44b5-b9e1-e57ba5843e38" - }, - "outputs": [], - "source": [ - "from model_compression_toolkit.core import QuantizationErrorMethod\n", - "\n", - "# Specify the IMX500-v1 target platform capability (TPC)\n", - "tpc = mct.get_target_platform_capabilities(\"tensorflow\", 'imx500', target_platform_version='v1')\n", - "\n", - "# List of error methods to iterate over\n", - "q_configs_dict = {}" - ] - }, - { - "cell_type": "markdown", - "id": "Vot-MCiWjzCE", - "metadata": { - "id": "Vot-MCiWjzCE" - }, - "source": [ - "You can edit the code below to quantize with other error metrics MCT supports." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "jtiZzXmTjxuI", - "metadata": { - "id": "jtiZzXmTjxuI" - }, - "outputs": [], - "source": [ - "# Error methods to iterate over\n", - "error_methods = [\n", - " QuantizationErrorMethod.MSE,\n", - " QuantizationErrorMethod.NOCLIPPING\n", - "]\n", - "\n", - "# If you are curious you can add any of the below quantization methods as well.\n", - "#QuantizationErrorMethod.MAE\n", - "#QuantizationErrorMethod.KL\n", - "#QuantizationErrorMethod.LP\n", - "\n", - "# Iterate and build the QuantizationConfig objects\n", - "for error_method in error_methods:\n", - " q_config = mct.core.QuantizationConfig(\n", - " activation_error_method=error_method,\n", - " )\n", - "\n", - " q_configs_dict[error_method] = q_config" - ] - }, - { - "cell_type": "markdown", - "id": "8W3Dcn0jkJOH", - "metadata": { - "id": "8W3Dcn0jkJOH" - }, - "source": [ - "Finally we quantize the model, this can take some time." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ba0c6e55-d474-4dc3-9a43-44b736635998", - "metadata": { - "id": "ba0c6e55-d474-4dc3-9a43-44b736635998" - }, - "outputs": [], - "source": [ - "quantized_models_dict = {}\n", - "\n", - "for error_method, q_config in q_configs_dict.items():\n", - " # Create a CoreConfig object with the current quantization configuration\n", - " ptq_config = mct.core.CoreConfig(quantization_config=q_config)\n", - "\n", - " # Perform MCT post-training quantization\n", - " quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(\n", - " in_model=float_model,\n", - " representative_data_gen=representative_dataset_gen,\n", - " core_config=ptq_config,\n", - " target_platform_capabilities=tpc\n", - " )\n", - "\n", - " # Update the dictionary to include the quantized model\n", - " quantized_models_dict[error_method] = {\n", - " \"quantization_config\": q_config,\n", - " \"quantized_model\": quantized_model,\n", - " \"quantization_info\": 
quantization_info\n", - " }\n" - ] - }, - { - "cell_type": "markdown", - "id": "A8UHRsh2khM4", - "metadata": { - "id": "A8UHRsh2khM4" - }, - "source": [ - "## Threshold and Distribution Visulisation" - ] - }, - { - "cell_type": "markdown", - "id": "Y-0QLWFJkpFV", - "metadata": { - "id": "Y-0QLWFJkpFV" - }, - "source": [ - "To assist with understanding we will now plot for two of Mobilenet's layers. The thresholds found during quantisation for both MSE error and NoClip, along side each layers activation distribution obtained by feeding the representative dataset through the model. This is useful to help visulise the effect of different thresholds on dataloss vs data resolution during quantisation.\n", - "\n", - "MCT quantization_info stores threshold data per layer. However, to see the distribution of the activations the model needs to be rebuilt upto and including the layer chosen for distribution visulisation.\n", - "\n", - "To do this we first need to list the layer names. With keras this can be done easily for the first 10 layes with the following." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a22e6d68-c40f-40bf-ab74-ff453011aeac", - "metadata": { - "id": "a22e6d68-c40f-40bf-ab74-ff453011aeac" - }, - "outputs": [], - "source": [ - "for index, layer in enumerate(float_model.layers):\n", - " if index < 10:\n", - " print(layer.name)\n", - " else:\n", - " break" - ] - }, - { - "cell_type": "markdown", - "id": "c38d28f3-c947-4c7c-aafa-e96cc3864277", - "metadata": { - "id": "c38d28f3-c947-4c7c-aafa-e96cc3864277" - }, - "source": [ - "First activation layer in model is 'Conv1_relu'.\n", - "\n", - "For this particular model, through testing we found that expanded_conv_project_BN shows differing thresholds for the two error metrics. So, this layer will also be visulised. 
For some context, MobileNetv2 uses an inverted residual structure where the input is expanded in the channel dimension, passed through a depthwise conv, and finally projected back to to a lower dimension. expanded_conv_project_BN layer represents this projection and the BN indicates Batch Normalisation.\n", - "\n", - "Use these layer names to create a pair of models that end in these respective layers." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "1f9dd3f3-6e22-4be9-9beb-29568ff14c9d", - "metadata": { - "id": "1f9dd3f3-6e22-4be9-9beb-29568ff14c9d" - }, - "outputs": [], - "source": [ - "from tensorflow.keras.models import Model\n", - "layer_name1 = 'Conv1_relu'\n", - "layer_name2 = 'expanded_conv_project_BN'\n", - "\n", - "layer_output1 = float_model.get_layer(layer_name1).output\n", - "activation_model_relu = Model(inputs=float_model.input, outputs=layer_output1)\n", - "layer_output2 = float_model.get_layer(layer_name2).output\n", - "activation_model_project = Model(inputs=float_model.input, outputs=layer_output2)" - ] - }, - { - "cell_type": "markdown", - "id": "ccc81508-01e5-421c-9b48-6ed3ce5b7364", - "metadata": { - "id": "ccc81508-01e5-421c-9b48-6ed3ce5b7364" - }, - "source": [ - "Feed the representative dataset through these models and store the output." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "eaeb9888-5d67-4979-af50-80781a811b4b", - "metadata": { - "id": "eaeb9888-5d67-4979-af50-80781a811b4b" - }, - "outputs": [], - "source": [ - "import numpy as np\n", - "activation_batches_relu = []\n", - "activation_batches_project = []\n", - "for images in representative_dataset_gen():\n", - " activations_relu = activation_model_relu.predict(images)\n", - " activation_batches_relu.append(activations_relu)\n", - " activations_project = activation_model_project.predict(images)\n", - " activation_batches_project.append(activations_project)\n", - "\n", - "all_activations_relu = np.concatenate(activation_batches_relu, axis=0).flatten()\n", - "all_activations_project = np.concatenate(activation_batches_project, axis=0).flatten()" - ] - }, - { - "cell_type": "markdown", - "id": "I5W9yY5DvOFr", - "metadata": { - "id": "I5W9yY5DvOFr" - }, - "source": [ - "Thresholds calculated by MCT during quantization can be accessed using the following. The layer number matches the index of the layers named in the previous steps.\n", - "\n", - "As mentioned above we use the first activation relu layer and the batch normalisation layer as they best demonstrate the effect of the two threshold error methods." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "NGnjrPD_uTd5", - "metadata": { - "id": "NGnjrPD_uTd5" - }, - "outputs": [], - "source": [ - "# layer 4 is the first activation layer - Conv1_relu\n", - "layer_name2 = 'expanded_conv_project_BN'\n", - "optimal_thresholds_relu = {\n", - " error_method: data[\"quantized_model\"].layers[4].activation_holder_quantizer.get_config()['threshold'][0]\n", - " for error_method, data in quantized_models_dict.items()\n", - "}\n", - "\n", - "# layer 9 is the batch normalisation projection layer - Expanded_conv_project_BN\n", - "optimal_thresholds_project = {\n", - " error_method: data[\"quantized_model\"].layers[9].activation_holder_quantizer.get_config()['threshold'][0]\n", - " for error_method, data in quantized_models_dict.items()\n", - "}" - ] - }, - { - "cell_type": "markdown", - "id": "XRAr8L5mvuLd", - "metadata": { - "id": "XRAr8L5mvuLd" - }, - "source": [ - "### Distribution Plots\n", - "\n", - "These are the distributions of the two layers firstly, below relu and secondly Project_BN.\n", - "\n", - "The second distribution shows distinctly the difference in the result of the two error metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "VPb8tBNGpJjo", - "metadata": { - "id": "VPb8tBNGpJjo" - }, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "\n", - "# Plotting\n", - "plt.figure(figsize=(10, 6))\n", - "plt.hist(all_activations_relu, bins=100, alpha=0.5, label='Original')\n", - "for method, threshold in optimal_thresholds_relu.items():\n", - " plt.axvline(threshold, linestyle='--', linewidth=2, label=f'{method}: {threshold:.2f}')\n", - "\n", - "plt.title('Activation Distribution with Optimal Quantization Thresholds First Relu Layer')\n", - "plt.xlabel('Activation Value')\n", - "plt.ylabel('Frequency')\n", - "plt.legend()\n", - "plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "Df7eKzh4oj5X", - "metadata": { - "id": "Df7eKzh4oj5X" - }, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "\n", - "# Plotting\n", - "plt.figure(figsize=(10, 6))\n", - "plt.hist(all_activations_project, bins=100, alpha=0.5, label='Original')\n", - "for method, threshold in optimal_thresholds_project.items():\n", - " plt.axvline(threshold, linestyle='--', linewidth=2, label=f'{method}: {threshold:.2f}')\n", - "\n", - "plt.title('Activation Distribution with Optimal Quantization Thresholds Prohject BN layer')\n", - "plt.xlabel('Activation Value')\n", - "plt.ylabel('Frequency')\n", - "plt.legend()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "4c967d41-439d-405b-815f-be641f1768fe", - "metadata": { - "id": "4c967d41-439d-405b-815f-be641f1768fe" - }, - "source": [ - "## Accuracy\n", - "\n", - "Finally we can show the effect of these different thresholds on the models accuracy." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "092d9fd0-8005-4551-b853-3b52840639c2", - "metadata": { - "id": "092d9fd0-8005-4551-b853-3b52840639c2" - }, - "outputs": [], - "source": [ - "test_dataset_folder = './imagenet/val'\n", - "batch_size=50\n", - "evaluation_dataset = preprocessor.get_validation_dataset_fraction(0.005, test_dataset_folder, batch_size)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8ebf7d04-7816-465c-9157-6068c0a4a08a", - "metadata": { - "id": "8ebf7d04-7816-465c-9157-6068c0a4a08a" - }, - "outputs": [], - "source": [ - "float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=[\"accuracy\"])\n", - "results = float_model.evaluate(evaluation_dataset)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "07a22d28-56ff-46de-8ed0-1163c3b7a613", - "metadata": { - "id": "07a22d28-56ff-46de-8ed0-1163c3b7a613" - }, - "outputs": [], - "source": [ - "evaluation_results = {}\n", - "\n", - "for error_method, data in quantized_models_dict.items():\n", - " quantized_model = data[\"quantized_model\"]\n", - "\n", - " quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=[\"accuracy\"])\n", - "\n", - " results = quantized_model.evaluate(evaluation_dataset, verbose=0) # Set verbose=0 to suppress the log messages\n", - "\n", - " evaluation_results[error_method] = results\n", - "\n", - " # Print the results\n", - " print(f\"Results for {error_method}: Loss = {results[0]}, Accuracy = {results[1]}\")" - ] - }, - { - "cell_type": "markdown", - "id": "GpEZ2E1qzWl3", - "metadata": { - "id": "GpEZ2E1qzWl3" - }, - "source": [ - "These results mirror the case for many models hence why MSE has been chosen by default by the MCT team.\n", - "\n", - "Each of MCT's error methods have a different effect on different models so it is always worth including this metric into hyper perameter tuning when trying to improve quantized model accuracy." 
- ] - }, - { - "cell_type": "markdown", - "id": "14877777", - "metadata": { - "id": "14877777" - }, - "source": [ - "## Conclusion" - ] - }, - { - "cell_type": "markdown", - "id": "bb7e1572", - "metadata": { - "id": "bb7e1572" - }, - "source": [ - "In this tutorial, we demonstrated the methods used to find a layers quantization threshold for activation. The process is similar for weight quantization but a representative dataset is not required. Use this code to assist with choosing error methods for your own model.\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "8c0c9b61-8056-4d06-8a2b-6e5fc56325f6", - "metadata": { - "id": "8c0c9b61-8056-4d06-8a2b-6e5fc56325f6" - }, - "source": [ - "## Appendix\n", - "\n", - "Some code to assist with gaining information from each layer in the MCT quanisation output." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "qml4LLmWZLP4", - "metadata": { - "id": "qml4LLmWZLP4" - }, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "import inspect\n", - "\n", - "\n", - "quantized_model = data[\"quantized_model\"]\n", - "quantizer_object = quantized_model.layers[1]\n", - "\n", - "quantized_model = data[\"quantized_model\"]\n", - "\n", - "\n", - "relu_layer_indices = []\n", - "\n", - "\n", - "for i, layer in enumerate(quantized_model.layers):\n", - " # Convert the layer's configuration to a string\n", - " layer_config_str = str(layer.get_config())\n", - "\n", - " layer_class_str = str(layer.__class__.__name__)\n", - "\n", - " # Check if \"relu\" is mentioned in the layer's configuration or class name\n", - " if 'relu' in layer_config_str.lower() or 'relu' in layer_class_str.lower():\n", - " relu_layer_indices.append(i)\n", - "\n", - "print(\"Layer indices potentially using ReLU:\", relu_layer_indices)\n", - "print(\"Number of relu layers \" + str(len(relu_layer_indices)))\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "43f34133-8ed4-429a-a225-6fb6a6f5b207", - 
"metadata": { - "id": "43f34133-8ed4-429a-a225-6fb6a6f5b207" - }, - "outputs": [], - "source": [ - "for error_method, data in quantized_models_dict.items():\n", - " quantized_model = data[\"quantized_model\"]\n", - " print(quantized_model.layers[1])" - ] - }, - { - "cell_type": "markdown", - "id": "01c1645e-205c-4d9a-8af3-e497b3addec1", - "metadata": { - "id": "01c1645e-205c-4d9a-8af3-e497b3addec1" - }, - "source": [ - "\n", - "\n", - "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n", - "\n", - "Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "you may not use this file except in compliance with the License.\n", - "You may obtain a copy of the License at\n", - "\n", - " http://www.apache.org/licenses/LICENSE-2.0\n", - "\n", - "Unless required by applicable law or agreed to in writing, software\n", - "distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "See the License for the specific language governing permissions and\n", - "limitations under the License.\n" - ] - } - ], - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.12" - } - }, - "nbformat": 4, - "nbformat_minor": 5 + "cells": [ + { + "cell_type": "markdown", + "id": "f8194007-6ea7-4e00-8931-a37ca2d0dd20", + "metadata": { + "id": "f8194007-6ea7-4e00-8931-a37ca2d0dd20" + }, + "source": [ + "# A Practical Guide to Activation Threshold Search in Post-Training Quantization\n", + "\n", + "[Run this tutorial in Google 
Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb)\n", + "\n", + "## Overview\n", + "This tutorial demonstrates how MCT finds the optimal activation threshold, a key step in its post-training quantization workflow.\n", + "\n", + "In this example, we will explore two different metrics for threshold selection. We will begin by applying the corresponding MCT configurations and then feed a representative dataset through the model. Next, we will plot the activation distributions of two layers along with their MCT-calculated thresholds. Finally, we will compare the accuracy of the models quantized with each method.\n", + "\n", + "## Activation threshold explanation\n", + "During the quantization process, thresholds are used to map a distribution of 32-bit floating-point values to their quantized equivalents. Choosing a threshold that minimizes data loss while preserving the most representative range is crucial for the final model's accuracy.\n", + "\n", + "### How Is It Done in MCT?\n", + "\n", + "MCT's post-training quantization uses a representative dataset to collect typical output activation values. The challenge lies in determining how best to map these values to their quantized versions. To address this, MCT performs a grid search for the threshold that minimizes a chosen error metric. Mean squared error (MSE) typically performs best and is used as the default metric.\n", + "\n", + "The error is calculated from the difference between the original float distribution and the quantized distribution, and the threshold that yields the minimum error is selected. 
For example, for MSE:\n", + "\n", + "$$\n", + "ERR(t) = \\frac{1}{n_s} \\sum_{X \\in F_l(D)} (Q(X, t, n_b) - X)^2\n", + "$$\n", + "\n", + "- $ERR(t)$: The quantization error function dependent on the threshold $t$.\n", + "\n", + "- $n_s$: The size of the representative dataset.\n", + "\n", + "- $\\sum$: Summation over all elements $X$ in the flattened dataset $F_l(D)$.\n", + "\n", + "- $F_l(D)$: The set of activation tensors in the $l$-th layer, flattened for processing.\n", + "\n", + "- $Q(X, t, n_b)$: The quantized approximation of $X$, given a threshold $t$ and bit width $n_b$.\n", + "\n", + "- $X$: The original activation tensor before quantization.\n", + "\n", + "- $t$: The quantization threshold, a key parameter for controlling the quantization process.\n", + "\n", + "- $n_b$: The number of bits used in quantization, impacting model precision and size.\n", + "\n", + "\n", + "Quantization thresholds are often constrained, typically to meet deployment requirements. In MCT, activation thresholds are restricted by default to power-of-two values and can represent either signed values within the range $[-t, t)$ or unsigned values within $[0, t)$. 
Other restriction settings are also configurable.\n", + "\n", + "### Error methods supported by MCT:\n", + "\n", + "- **NOCLIPPING:** Uses the min/max values as thresholds, so no values are clipped.\n", + "\n", + "- **MSE:** Minimizes quantization noise by using the mean squared error (MSE).\n", + "\n", + "- **MAE:** Minimizes quantization noise by using the mean absolute error (MAE).\n", + "\n", + "- **KL:** Uses Kullback-Leibler (KL) divergence to align the distributions, so that the quantized distribution is as similar as possible to the original.\n", + "\n", + "- **Lp:** Minimizes quantization noise using the Lp norm, where `p` is a configurable parameter that determines the type of distance metric.\n", + "\n", + "## Setup\n", + "Install the relevant packages:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "324685b9-5dcc-4d22-80f4-dec9a93d3324", + "metadata": { + "id": "324685b9-5dcc-4d22-80f4-dec9a93d3324", + "tags": [] + }, + "outputs": [], + "source": [ + "TF_VER = '2.14'\n", + "!pip install -q tensorflow[and-cuda]~={TF_VER}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import importlib.util\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ], + "metadata": { + "collapsed": false + }, + "id": "7837babf2112542b" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b3f0acc8-281c-4bca-b0b9-3d7677105f19", + "metadata": { + "id": "b3f0acc8-281c-4bca-b0b9-3d7677105f19" + }, + "outputs": [], + "source": [ + "import keras\n", + "import tensorflow as tf" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Load a pre-trained MobileNetV2 model from Keras in 32-bit floating-point precision."
+ ], + "metadata": { + "collapsed": false + }, + "id": "4d691159f5bfc53e" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "from keras.applications.mobilenet_v2 import MobileNetV2\n", + "\n", + "float_model = MobileNetV2()" + ], + "metadata": { + "collapsed": false + }, + "id": "468d67cd5f25886e" + }, + { + "cell_type": "markdown", + "source": [ + "## Dataset preparation\n", + "### Download the ImageNet validation set\n", + "Download the ImageNet dataset with only the validation split.\n", + "**Note:** For demonstration purposes we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.\n", + "\n", + "This step may take several minutes..." + ], + "metadata": { + "collapsed": false + }, + "id": "de5a1be0c4fc4847" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "_ztv72uM6-UT", + "metadata": { + "id": "_ztv72uM6-UT" + }, + "outputs": [], + "source": [ + "import os\n", + " \n", + "if not os.path.isdir('imagenet'):\n", + " !mkdir imagenet\n", + " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz\n", + " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar\n", + " \n", + " !cd imagenet && tar -xzf ILSVRC2012_devkit_t12.tar.gz && \\\n", + " mkdir ILSVRC2012_img_val && tar -xf ILSVRC2012_img_val.tar -C ILSVRC2012_img_val" + ] + }, + { + "cell_type": "markdown", + "source": [ + "The following code organizes the extracted data into separate folders for each label, making it compatible with Keras dataset loaders." 
+ ], + "metadata": { + "collapsed": false + }, + "id": "ca398ea7e1551d7" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "YVAoUjK47Zcp", + "metadata": { + "id": "YVAoUjK47Zcp" + }, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "import shutil\n", + "\n", + "root = Path('./imagenet')\n", + "imgs_dir = root / 'ILSVRC2012_img_val'\n", + "target_dir = root / 'val'\n", + "\n", + "def extract_labels():\n", + " !pip install -q scipy\n", + " import scipy.io\n", + " mat = scipy.io.loadmat(root / 'ILSVRC2012_devkit_t12/data/meta.mat', squeeze_me=True)\n", + " cls_to_nid = {s[0]: s[1] for s in mat['synsets'] if s[4] == 0}\n", + " with open(root / 'ILSVRC2012_devkit_t12/data/ILSVRC2012_validation_ground_truth.txt', 'r') as f:\n", + " return [cls_to_nid[int(cls)] for cls in f.readlines()]\n", + "\n", + "if not target_dir.exists():\n", + " labels = extract_labels()\n", + " for lbl in set(labels):\n", + " os.makedirs(target_dir / lbl)\n", + " \n", + " for img_file, lbl in zip(sorted(os.listdir(imgs_dir)), labels):\n", + " shutil.move(imgs_dir / img_file, target_dir / lbl)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "The following functions build a `tf.data.Dataset` from the image files in the directory and apply MobileNetV2 preprocessing."
+ ], + "metadata": { + "collapsed": false + }, + "id": "a0bb1a9df8e1d7fc" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "def imagenet_preprocess_input(images, labels):\n", + " return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels\n", + "\n", + "def get_dataset(batch_size, shuffle):\n", + " dataset = tf.keras.utils.image_dataset_from_directory(\n", + " directory='./imagenet/val',\n", + " batch_size=batch_size,\n", + " image_size=[224, 224],\n", + " shuffle=shuffle,\n", + " crop_to_aspect_ratio=True,\n", + " interpolation='bilinear')\n", + " dataset = dataset.map(imagenet_preprocess_input, num_parallel_calls=tf.data.AUTOTUNE)\n", + " dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)\n", + " return dataset" + ], + "metadata": { + "collapsed": false + }, + "id": "c8acd6413a722c2f" + }, + { + "cell_type": "markdown", + "source": [ + "## Representative Dataset\n", + "MCT's PTQ algorithm requires a representative dataset. It is defined as a generator function that yields a list containing one batch of images per iteration:" + ], + "metadata": { + "collapsed": false + }, + "id": "8aa0bca3e15fba91" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "batch_size = 32\n", + "n_iter = 10\n", + "\n", + "dataset = get_dataset(batch_size, shuffle=True)\n", + "\n", + "def representative_dataset_gen():\n", + " for _ in range(n_iter):\n", + " yield [dataset.take(1).get_single_element()[0].numpy()]" + ], + "metadata": { + "collapsed": false + }, + "id": "1bdb4144e4ce2ab6" + }, + { + "cell_type": "markdown", + "source": [ + "## Target Platform Capabilities\n", + "MCT optimizes the model for dedicated hardware. This is done using a Target Platform Capabilities (TPC) object (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). 
Here, we use the default TensorFlow TPC:" + ], + "metadata": { + "collapsed": false + }, + "id": "98f4bbca00996989" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import model_compression_toolkit as mct\n", + "\n", + "# Get a TargetPlatformCapabilities object that models the target hardware for quantized inference. Here, we use the default TPC for Keras (TensorFlow) models.\n", + "target_platform_cap = mct.get_target_platform_capabilities('tensorflow', 'default')" + ], + "metadata": { + "collapsed": false + }, + "id": "554719effaf90250" + }, + { + "cell_type": "markdown", + "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5", + "metadata": { + "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5" + }, + "source": [ + "## Post-Training Quantization using MCT\n", + "In this step, we apply post-training quantization to the float model using two threshold error methods: **\"No Clipping\"** and **MSE**.\n", + "\n", + "- **\"No Clipping\"** selects the lowest power-of-two threshold that ensures no data is lost (clipped).\n", + "- **MSE** selects the power-of-two threshold that minimizes the mean squared error between the original float distribution and the quantized distribution.\n", + "\n", + "As a result, the \"No Clipping\" method typically yields a larger threshold, which preserves outliers at the cost of coarser resolution, as we will demonstrate later in this tutorial.\n", + "\n", + "All other quantization parameters keep their default values; only the activation error method changes between configurations. Feel free to modify the code below to experiment with other error metrics supported by MCT."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "jtiZzXmTjxuI", + "metadata": { + "id": "jtiZzXmTjxuI" + }, + "outputs": [], + "source": [ + "from model_compression_toolkit.core import QuantizationErrorMethod\n", + "\n", + "q_configs_dict = {}\n", + "# Error methods to iterate over\n", + "error_methods = [\n", + " QuantizationErrorMethod.MSE,\n", + " QuantizationErrorMethod.NOCLIPPING\n", + "]\n", + "\n", + "# To experiment further, add any of the following error methods to the list above:\n", + "# QuantizationErrorMethod.MAE\n", + "# QuantizationErrorMethod.KL\n", + "# QuantizationErrorMethod.LP\n", + "\n", + "# Iterate and build the QuantizationConfig objects\n", + "for error_method in error_methods:\n", + " q_config = mct.core.QuantizationConfig(\n", + " activation_error_method=error_method,\n", + " )\n", + "\n", + " q_configs_dict[error_method] = q_config" + ] + }, + { + "cell_type": "markdown", + "id": "8W3Dcn0jkJOH", + "metadata": { + "id": "8W3Dcn0jkJOH" + }, + "source": [ + "Now we will run post-training quantization for each configuration:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba0c6e55-d474-4dc3-9a43-44b736635998", + "metadata": { + "id": "ba0c6e55-d474-4dc3-9a43-44b736635998" + }, + "outputs": [], + "source": [ + "quantized_models_dict = {}\n", + "\n", + "for error_method, q_config in q_configs_dict.items():\n", + " # Create a CoreConfig object with the current quantization configuration\n", + " ptq_config = mct.core.CoreConfig(quantization_config=q_config)\n", + "\n", + " # Perform MCT post-training quantization\n", + " quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(\n", + " in_model=float_model,\n", + " representative_data_gen=representative_dataset_gen,\n", + " core_config=ptq_config,\n", + " target_platform_capabilities=target_platform_cap\n", + " )\n", + "\n", + " # Update the dictionary to include the quantized model\n", + " quantized_models_dict[error_method] = 
{\n", + " \"quantization_config\": q_config,\n", + " \"quantized_model\": quantized_model,\n", + " \"quantization_info\": quantization_info\n", + " }\n" + ] + }, + { + "cell_type": "markdown", + "id": "A8UHRsh2khM4", + "metadata": { + "id": "A8UHRsh2khM4" + }, + "source": [ + "## Threshold and Distribution Visualization\n", + "To facilitate understanding, we will plot the activation distributions for two layers of MobileNetV2. For each layer, we will show the thresholds determined by both **MSE** and **No Clipping** methods, along with the corresponding activation distributions obtained by passing the representative dataset through the model. This visualization highlights the trade-off between data loss and data resolution under different thresholds during quantization.\n", + "\n", + "MCT’s `quantization_info` stores the threshold values for each layer. However, to view the actual activation distributions, the model needs to be reconstructed up to and including the target layer selected for visualization.\n", + "\n", + "To do this, we first need to identify the layer names. In Keras, the following snippet prints the names of the first 10 layers." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a22e6d68-c40f-40bf-ab74-ff453011aeac", + "metadata": { + "id": "a22e6d68-c40f-40bf-ab74-ff453011aeac" + }, + "outputs": [], + "source": [ + "for index, layer in enumerate(float_model.layers):\n", + " if index < 10:\n", + " print(layer.name)\n", + " else:\n", + " break" + ] + }, + { + "cell_type": "markdown", + "id": "c38d28f3-c947-4c7c-aafa-e96cc3864277", + "metadata": { + "id": "c38d28f3-c947-4c7c-aafa-e96cc3864277" + }, + "source": [ + "The first activation layer in the model is named `Conv1_relu`.\n", + "\n", + "For this particular model, testing has shown that the `expanded_conv_project_BN` layer exhibits different thresholds for the two error metrics. Therefore, we will also include this layer in the visualization. 
For context, MobileNetV2 uses an inverted residual structure, where the input is first expanded in the channel dimension, then passed through a depthwise convolution, and finally projected back to a lower dimension. The `expanded_conv_project_BN` layer is the Batch Normalization layer that follows this projection convolution (hence the BN suffix).\n", + "\n", + "We will use these layer names to create two separate models, each ending at one of these layers." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1f9dd3f3-6e22-4be9-9beb-29568ff14c9d", + "metadata": { + "id": "1f9dd3f3-6e22-4be9-9beb-29568ff14c9d" + }, + "outputs": [], + "source": [ + "from tensorflow.keras.models import Model\n", + "layer_name1 = 'Conv1_relu'\n", + "layer_name2 = 'expanded_conv_project_BN'\n", + "\n", + "layer_output1 = float_model.get_layer(layer_name1).output\n", + "activation_model_relu = Model(inputs=float_model.input, outputs=layer_output1)\n", + "layer_output2 = float_model.get_layer(layer_name2).output\n", + "activation_model_project = Model(inputs=float_model.input, outputs=layer_output2)" + ] + }, + { + "cell_type": "markdown", + "id": "ccc81508-01e5-421c-9b48-6ed3ce5b7364", + "metadata": { + "id": "ccc81508-01e5-421c-9b48-6ed3ce5b7364" + }, + "source": [ + "Run the representative dataset through these models and store the outputs for further analysis."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eaeb9888-5d67-4979-af50-80781a811b4b", + "metadata": { + "id": "eaeb9888-5d67-4979-af50-80781a811b4b" + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "activation_batches_relu = []\n", + "activation_batches_project = []\n", + "for images in representative_dataset_gen():\n", + " activations_relu = activation_model_relu.predict(images)\n", + " activation_batches_relu.append(activations_relu)\n", + " activations_project = activation_model_project.predict(images)\n", + " activation_batches_project.append(activations_project)\n", + "\n", + "all_activations_relu = np.concatenate(activation_batches_relu, axis=0).flatten()\n", + "all_activations_project = np.concatenate(activation_batches_project, axis=0).flatten()" + ] + }, + { + "cell_type": "markdown", + "id": "I5W9yY5DvOFr", + "metadata": { + "id": "I5W9yY5DvOFr" + }, + "source": [ + "Thresholds calculated by MCT during quantization can be accessed using the following approach. The layer indices correspond to the order of the layers listed in the previous steps.\n", + "\n", + "As noted earlier, we focus on the first ReLU activation layer and the Batch Normalization layer (`expanded_conv_project_BN`) since they effectively illustrate the impact of the two threshold error methods." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "NGnjrPD_uTd5", + "metadata": { + "id": "NGnjrPD_uTd5" + }, + "outputs": [], + "source": [ + "# Layer 4 of the quantized model corresponds to the first activation layer, Conv1_relu.\n", + "# Read the activation threshold selected for it under each error method.\n", + "optimal_thresholds_relu = {\n", + " error_method: data[\"quantized_model\"].layers[4].activation_holder_quantizer.get_config()['threshold'][0]\n", + " for error_method, data in quantized_models_dict.items()\n", + "}\n", + "\n", + "# Layer 9 corresponds to the batch-normalized projection layer, expanded_conv_project_BN.\n", + "optimal_thresholds_project = {\n", + " error_method: data[\"quantized_model\"].layers[9].activation_holder_quantizer.get_config()['threshold'][0]\n", + " for error_method, data in quantized_models_dict.items()\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "XRAr8L5mvuLd", + "metadata": { + "id": "XRAr8L5mvuLd" + }, + "source": [ + "### Distribution Plots\n", + "Below are the activation distributions for the two selected layers: first, the ReLU activation layer, `Conv1_relu`, followed by the `expanded_conv_project_BN` layer.\n", + "\n", + "The second distribution clearly highlights the differences between the two error metrics, showing the impact of each on the resulting quantization threshold."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "VPb8tBNGpJjo", + "metadata": { + "id": "VPb8tBNGpJjo" + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "# Plotting\n", + "plt.figure(figsize=(10, 6))\n", + "plt.hist(all_activations_relu, bins=100, alpha=0.5, label='Original')\n", + "for method, threshold in optimal_thresholds_relu.items():\n", + " plt.axvline(threshold, linestyle='--', linewidth=2, label=f'{method}: {threshold:.2f}')\n", + "\n", + "plt.title('Activation Distribution with Optimal Quantization Thresholds First Relu Layer')\n", + "plt.xlabel('Activation Value')\n", + "plt.ylabel('Frequency')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "Df7eKzh4oj5X", + "metadata": { + "id": "Df7eKzh4oj5X" + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "# Plotting\n", + "plt.figure(figsize=(10, 6))\n", + "plt.hist(all_activations_project, bins=100, alpha=0.5, label='Original')\n", + "for method, threshold in optimal_thresholds_project.items():\n", + " plt.axvline(threshold, linestyle='--', linewidth=2, label=f'{method}: {threshold:.2f}')\n", + "\n", + "plt.title('Activation Distribution with Optimal Quantization Thresholds Project BN layer')\n", + "plt.xlabel('Activation Value')\n", + "plt.ylabel('Frequency')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "4c967d41-439d-405b-815f-be641f1768fe", + "metadata": { + "id": "4c967d41-439d-405b-815f-be641f1768fe" + }, + "source": [ + "## Model Evaluation\n", + "Finally, we can demonstrate the impact of these different thresholds on the model's overall accuracy.\n", + "In order to evaluate our models, we first need to load the validation dataset." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "val_dataset = get_dataset(batch_size=50, shuffle=False)" + ], + "metadata": { + "collapsed": false + }, + "id": "9199b59c4f10eca1" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=\"accuracy\")\n", + "float_accuracy = float_model.evaluate(val_dataset)\n", + "print(f\"Float model's Top 1 accuracy on the Imagenet validation set: {(float_accuracy[1] * 100):.2f}%\")" + ], + "metadata": { + "collapsed": false + }, + "id": "631780a79e2cedf0" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07a22d28-56ff-46de-8ed0-1163c3b7a613", + "metadata": { + "id": "07a22d28-56ff-46de-8ed0-1163c3b7a613" + }, + "outputs": [], + "source": [ + "evaluation_results = {}\n", + "\n", + "for error_method, data in quantized_models_dict.items():\n", + " quantized_model = data[\"quantized_model\"]\n", + "\n", + " quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=[\"accuracy\"])\n", + "\n", + " results = quantized_model.evaluate(val_dataset, verbose=0) # Set verbose=0 to suppress the log messages\n", + "\n", + " evaluation_results[error_method] = results\n", + "\n", + " # Print the results\n", + " print(f\"Results for {error_method}: Loss = {results[0]}, Accuracy = {results[1]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "GpEZ2E1qzWl3", + "metadata": { + "id": "GpEZ2E1qzWl3" + }, + "source": [ + "These results are consistent across many models, which is why MSE is set as the default method.\n", + "\n", + "Each of MCT's error methods impacts models differently, so it is recommended to include this metric as part of hyperparameter tuning when optimizing quantized model accuracy.\n", + "\n", + "\n", + "## Conclusion\n", + "In this tutorial, we explored the process of finding optimal activation thresholds using different 
error metrics in MCT’s post-training quantization workflow. By comparing the **MSE** and **No Clipping** methods, we demonstrated how the choice of threshold can significantly affect the activation distributions and, ultimately, the quantized model’s performance. While **MSE** is commonly the best choice and is used by default, it is essential to consider other error metrics during hyperparameter tuning to achieve the best results for different models.\n", + "\n", + "Understanding the impact of these thresholds on data loss and resolution is critical when fine-tuning the quantization process for deployment, making this a valuable step in building high-performance quantized models.\n", + "\n", + "\n", + "## Appendix\n", + "Below is a code snippet that scans the layers of a quantized model produced by MCT and lists the indices of layers that appear to use ReLU. This helps locate layers of interest when analyzing layer-wise quantization details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "qml4LLmWZLP4", + "metadata": { + "id": "qml4LLmWZLP4" + }, + "outputs": [], + "source": [ + "from model_compression_toolkit.core import QuantizationErrorMethod\n", + "\n", + "# Inspect the model quantized with the MSE error method\n", + "quantized_model = quantized_models_dict[QuantizationErrorMethod.MSE][\"quantized_model\"]\n", + "\n", + "# Collect the indices of layers whose class name or configuration mentions ReLU.\n", + "# This is a simple string match, so it can also flag layers whose name or\n", + "# configuration merely contains the string 'relu'.\n", + "relu_layer_indices = []\n", + "\n", + "# Scan every layer of the quantized model\n", + "for i, layer in enumerate(quantized_model.layers):\n", + " # Convert the layer's configuration to a string\n", + " layer_config_str = str(layer.get_config())\n", + "\n", + " layer_class_str = str(layer.__class__.__name__)\n", + "\n", + " # Check if \"relu\" is mentioned in the layer's configuration or class name\n", + " if 'relu' in layer_config_str.lower() or 'relu' in layer_class_str.lower():\n", + " relu_layer_indices.append(i)\n", + "\n", + "print(\"Layer indices potentially using ReLU:\", relu_layer_indices)\n", + "print(\"Number of relu layers \" + str(len(relu_layer_indices)))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"43f34133-8ed4-429a-a225-6fb6a6f5b207", + "metadata": { + "id": "43f34133-8ed4-429a-a225-6fb6a6f5b207" + }, + "outputs": [], + "source": [ + "for error_method, data in quantized_models_dict.items():\n", + " quantized_model = data[\"quantized_model\"]\n", + " print(quantized_model.layers[1])" + ] + }, + { + "cell_type": "markdown", + "id": "01c1645e-205c-4d9a-8af3-e497b3addec1", + "metadata": { + "id": "01c1645e-205c-4d9a-8af3-e497b3addec1" + }, + "source": [ + "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n", + "\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + "\n", + " http://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License.\n" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 } diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb index e57bf7a08..acbf5a1aa 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb +++ 
b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb @@ -1,847 +1,777 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "f8194007-6ea7-4e00-8931-a37ca2d0dd20", - "metadata": { - "id": "f8194007-6ea7-4e00-8931-a37ca2d0dd20" - }, - "source": [ - "# Activation Z-Score Threshold Demonstration For Post-Training Quantization\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "9be59ea8-e208-4b64-aede-1dd6270b3540", - "metadata": { - "id": "9be59ea8-e208-4b64-aede-1dd6270b3540" - }, - "source": [ - "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb)" - ] - }, - { - "cell_type": "markdown", - "id": "930e6d6d-4980-4d66-beed-9ff5a494acf9", - "metadata": { - "id": "930e6d6d-4980-4d66-beed-9ff5a494acf9" - }, - "source": [ - "## Overview" - ] - }, - { - "cell_type": "markdown", - "id": "699be4fd-d382-4eec-9d3f-e2e85cfb1762", - "metadata": { - "id": "699be4fd-d382-4eec-9d3f-e2e85cfb1762" - }, - "source": [ - "This tutorial demonstrates the process used to find the activation z-score threshold, a step that MCT can use during post-training quantization.\n", - "\n", - "In this example we will explore how setting different z scores effects threshold and accuracy. 
We will start by demonstrating how to apply the corresponding MCT configurations, then, we will feed a representative dataset through the model, plot the activation distribution of an activation layer with their respective MCT calculated z-score thresholds, and finally compare the quantized model accuracy of the examples of different z-score.\n" - ] - }, - { - "cell_type": "markdown", - "id": "85199e25-c587-41b1-aaf5-e1d23ce97ca1", - "metadata": { - "id": "85199e25-c587-41b1-aaf5-e1d23ce97ca1" - }, - "source": [ - "## Activation threshold explanation" - ] - }, - { - "cell_type": "markdown", - "id": "a89a17f4-30c9-4caf-a888-424f7a82fbc8", - "metadata": { - "id": "a89a17f4-30c9-4caf-a888-424f7a82fbc8" - }, - "source": [ - "During quantization process, thresholds are used to map a distribution of 32-bit float values to their quantized counterparts. Doing this with the least loss of data while maintaining the most representative range is important for final model accuracy.\n", - "\n", - "Some models exhibit anomolus values when fed a representative dataset. It is in the interest of the models accuracy to remove these values so that the quantization threshold results in a more reliable range mapping.\n", - "\n", - "MCT has the option to remove these using z-score thresholding. Allowing the user to remove data based on standard distributions.\n", - "\n", - "The Z-score of a value is calculated by subtracting the mean of the dataset from the value and then dividing by the standard deviation of the dataset. 
This measures how many standard deviations an element is from the mean.\n", - "\n", - "\n", - "\n", - "To calculate a threshold $t$ for quantization based on a Z-score threshold $Z_t$, you might define $t$ as a function of $Z_t$, $\\mu$, and $\\sigma$, such as:\n", - "\n", - "$$\n", - "t(Z_t) = μ + Z_t \\cdot σ\n", - "$$\n", - "\n", - "\n", - "Where:\n", - "\n", - "- $t(Z_t)$: The quantization threshold calculated based on a Z-score threshold $Z_t$.\n", - "- $Z_t$: The chosen Z-score threshold value, which determines how many standard deviations from the mean an activation needs to be to be considered for special handling (e.g., removal or adjustment) before the main quantization process.\n", - "- $\\mu = \\frac{1}{n_s} \\sum_{X \\in Fl(D)} X$: The mean of activations\n", - "- $\\sigma = \\sqrt{\\frac{1}{n_s} \\sum_{X \\in Fl(D)} (X - \\mu)^2}$: The standard deviation of activations in $Fl(D)$.\n", - "where:\n", - "- $Fl(D)$ is the activation distribution and $X$ is an individual activation.\n", - "\n", - "\n", - "This equation for $t(Z_t)$ allows you to set a threshold based on the statistical distribution of activations, identifying values that are unusually high or low relative to the rest of the data. These identified values can then be removed before applying the main quantization algorithm." 
- ] - }, - { - "cell_type": "markdown", - "id": "04228b7c-00f1-4ded-bead-722e2a4e89a0", - "metadata": { - "id": "04228b7c-00f1-4ded-bead-722e2a4e89a0" - }, - "source": [ - "## Setup" - ] - }, - { - "cell_type": "markdown", - "id": "2657cf1a-654d-45a6-b877-8bf42fc26d0d", - "metadata": { - "id": "2657cf1a-654d-45a6-b877-8bf42fc26d0d" - }, - "source": [ - "Install and import the relevant packages:\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "324685b9-5dcc-4d22-80f4-dec9a93d3324", - "metadata": { - "id": "324685b9-5dcc-4d22-80f4-dec9a93d3324" - }, - "outputs": [], - "source": [ - "TF_VER = '2.14.0'\n", - "\n", - "!pip install -q tensorflow=={TF_VER}\n", - "!pip install -q mct-nightly" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b3f0acc8-281c-4bca-b0b9-3d7677105f19", - "metadata": { - "id": "b3f0acc8-281c-4bca-b0b9-3d7677105f19" - }, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "import keras\n", - "import model_compression_toolkit as mct\n", - "import os" - ] - }, - { - "cell_type": "markdown", - "id": "z8F-avk3azgZ", - "metadata": { - "id": "z8F-avk3azgZ" - }, - "source": [ - "Clone MCT to gain access to tutorial scripts" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e3b675cf-e1b5-4249-a581-ffb9b1c16ba1", - "metadata": { - "id": "e3b675cf-e1b5-4249-a581-ffb9b1c16ba1" - }, - "outputs": [], - "source": [ - "!git clone https://github.com/sony/model_optimization.git local_mct\n", - "!pip install -r ./local_mct/requirements.txt\n", - "import sys\n", - "sys.path.insert(0,\"./local_mct\")\n", - "import tutorials.resources.utils.keras_tutorial_tools as tutorial_tools\n" - ] - }, - { - "cell_type": "markdown", - "id": "0c7fed0d-cfc8-41ee-adf1-22a98110397b", - "metadata": { - "id": "0c7fed0d-cfc8-41ee-adf1-22a98110397b" - }, - "source": [ - "## Dataset" - ] - }, - { - "cell_type": "markdown", - "id": "aecde59e4c37b1da", - "metadata": { - "collapsed": false, - "id": 
"aecde59e4c37b1da" - }, - "source": [ - "Load ImageNet classification dataset and seperate a small representative subsection of this dataset to use for quantization." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "_ztv72uM6-UT", - "metadata": { - "id": "_ztv72uM6-UT" - }, - "outputs": [], - "source": [ - "if not os.path.isdir('imagenet'):\n", - " !mkdir imagenet\n", - " !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz\n", - " !mv ILSVRC2012_devkit_t12.tar.gz imagenet/\n", - " !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar\n", - " !mv ILSVRC2012_img_val.tar imagenet/" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "YVAoUjK47Zcp", - "metadata": { - "id": "YVAoUjK47Zcp" - }, - "outputs": [], - "source": [ - "import torchvision\n", - "if not os.path.isdir('imagenet/val'):\n", - " ds = torchvision.datasets.ImageNet(root='./imagenet', split='val')" - ] - }, - { - "cell_type": "markdown", - "id": "fcbb3eecae5346a9", - "metadata": { - "collapsed": false, - "id": "fcbb3eecae5346a9" - }, - "source": [ - "Here we create the representative dataset. For detail on this step see [ImageNet tutorial](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/keras/example_keras_mobilenetv2_for_imx500.ipynb). If you are running locally a higher fraction of the dataset can be used." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "eda9ad33-f88c-4178-8f19-bac6b2b2e97b", - "metadata": { - "id": "eda9ad33-f88c-4178-8f19-bac6b2b2e97b" - }, - "outputs": [], - "source": [ - "REPRESENTATIVE_DATASET_FOLDER = './imagenet/val'\n", - "BATCH_SIZE = 20\n", - "fraction =0.001\n", - "model_version = 'MobileNet'\n", - "\n", - "preprocessor = tutorial_tools.DatasetPreprocessor(model_version=model_version)\n", - "representative_dataset_gen = preprocessor.get_representative_dataset(fraction, REPRESENTATIVE_DATASET_FOLDER, BATCH_SIZE)" - ] - }, - { - "cell_type": "markdown", - "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5", - "metadata": { - "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5" - }, - "source": [ - "## MCT Quantization" - ] - }, - { - "cell_type": "markdown", - "id": "55edbb99-ab2f-4dde-aa74-4ddee61b2615", - "metadata": { - "id": "55edbb99-ab2f-4dde-aa74-4ddee61b2615" - }, - "source": [ - "This step we load the model and quantize with a few z-score thresholds.\n" - ] - }, - { - "cell_type": "markdown", - "id": "VMrcPUN6jPlB", - "metadata": { - "id": "VMrcPUN6jPlB" - }, - "source": [ - "First we load MobileNet from the keras library." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c431848f-a5f4-4737-a5c8-f046a8bca840", - "metadata": { - "id": "c431848f-a5f4-4737-a5c8-f046a8bca840" - }, - "outputs": [], - "source": [ - "from tensorflow.keras.applications import MobileNet\n", - "float_model = MobileNet(weights='imagenet')" - ] - }, - { - "cell_type": "markdown", - "id": "Pd8blHyKjWay", - "metadata": { - "id": "Pd8blHyKjWay" - }, - "source": [ - "Quantization perameters are defined. Here we will use default values apart from quantization method." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ca971297-e00b-44b5-b9e1-e57ba5843e38", - "metadata": { - "id": "ca971297-e00b-44b5-b9e1-e57ba5843e38" - }, - "outputs": [], - "source": [ - "from model_compression_toolkit.core import QuantizationErrorMethod\n", - "\n", - "# Specify the IMX500-v1 target platform capability (TPC)\n", - "tpc = mct.get_target_platform_capabilities(\"tensorflow\", 'imx500', target_platform_version='v1')\n", - "\n", - "# List of error methods to iterate over\n", - "q_configs_dict = {}" - ] - }, - { - "cell_type": "markdown", - "id": "Vot-MCiWjzCE", - "metadata": { - "id": "Vot-MCiWjzCE" - }, - "source": [ - "You can edit the code below to quantize with other values of z-score." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "jtiZzXmTjxuI", - "metadata": { - "id": "jtiZzXmTjxuI" - }, - "outputs": [], - "source": [ - "# Z-score values to iterate over\n", - "z_score_values = [3,5,9]\n", - "\n", - "# Iterate and build the QuantizationConfig objects\n", - "for z_score in z_score_values:\n", - " q_config = mct.core.QuantizationConfig(\n", - " z_threshold=z_score,\n", - " )\n", - " q_configs_dict[z_score] = q_config\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "8W3Dcn0jkJOH", - "metadata": { - "id": "8W3Dcn0jkJOH" - }, - "source": [ - "Finally we quantize the model, this can take some time. Grab a coffee!" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ba0c6e55-d474-4dc3-9a43-44b736635998", - "metadata": { - "id": "ba0c6e55-d474-4dc3-9a43-44b736635998" - }, - "outputs": [], - "source": [ - "quantized_models_dict = {}\n", - "\n", - "for z_score, q_config in q_configs_dict.items():\n", - " # Create a CoreConfig object with the current quantization configuration\n", - " ptq_config = mct.core.CoreConfig(quantization_config=q_config)\n", - "\n", - " # Perform MCT post-training quantization\n", - " quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(\n", - " in_model=float_model,\n", - " representative_data_gen=representative_dataset_gen,\n", - " core_config=ptq_config,\n", - " target_platform_capabilities=tpc\n", - " )\n", - "\n", - " # Update the dictionary to include the quantized model\n", - " quantized_models_dict[z_score] = {\n", - " \"quantization_config\": q_config,\n", - " \"quantized_model\": quantized_model,\n", - " \"quantization_info\": quantization_info\n", - " }\n" - ] - }, - { - "cell_type": "markdown", - "id": "A8UHRsh2khM4", - "metadata": { - "id": "A8UHRsh2khM4" - }, - "source": [ - "### Z-Score Threshold and Distribution Visulisation" - ] - }, - { - "cell_type": "markdown", - "id": "Y-0QLWFJkpFV", - "metadata": { - "id": "Y-0QLWFJkpFV" - }, - "source": [ - "To assist with understanding we will now plot the activation distribution of Mobilenet's first activation layer.\n", - "\n", - "This will be obtained by feeding the representative dataset through the model.\n", - "To see the distribution of the activations the model needs to be rebuilt upto and including the layer chosen for distribution visulisation.\n", - "\n", - "To see said layers z-score threshold values. we will need to calculate these manually using the equestion stated in the introduction.\n", - "\n", - "To plot the distribution we first need to list the layer names. With keras this can be done easily using the following. 
We established the index of the layer of interest using various checks that can be seen in the appendix section." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a22e6d68-c40f-40bf-ab74-ff453011aeac", - "metadata": { - "id": "a22e6d68-c40f-40bf-ab74-ff453011aeac" - }, - "outputs": [], - "source": [ - "#print layer name\n", - "print(float_model.layers[51].name)" - ] - }, - { - "cell_type": "markdown", - "id": "c38d28f3-c947-4c7c-aafa-e96cc3864277", - "metadata": { - "id": "c38d28f3-c947-4c7c-aafa-e96cc3864277" - }, - "source": [ - "The example activation layer in model is 'conv_dw_8_relu'.\n", - "\n", - "Use this layer name to create a model ending at conv_dw_8_relu" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "1f9dd3f3-6e22-4be9-9beb-29568ff14c9d", - "metadata": { - "id": "1f9dd3f3-6e22-4be9-9beb-29568ff14c9d" - }, - "outputs": [], - "source": [ - "from tensorflow.keras.models import Model\n", - "layer_name1 = 'conv_dw_8_relu'\n", - "\n", - "layer_output1 = float_model.get_layer(layer_name1).output\n", - "activation_model_relu = Model(inputs=float_model.input, outputs=layer_output1)" - ] - }, - { - "cell_type": "markdown", - "id": "ccc81508-01e5-421c-9b48-6ed3ce5b7364", - "metadata": { - "id": "ccc81508-01e5-421c-9b48-6ed3ce5b7364" - }, - "source": [ - "Feed the representative dataset through these models and store the output." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "eaeb9888-5d67-4979-af50-80781a811b4b", - "metadata": { - "id": "eaeb9888-5d67-4979-af50-80781a811b4b" - }, - "outputs": [], - "source": [ - "import numpy as np\n", - "activation_batches_relu = []\n", - "activation_batches_project = []\n", - "for images in representative_dataset_gen():\n", - " activations_relu = activation_model_relu.predict(images)\n", - " activation_batches_relu.append(activations_relu)\n", - "\n", - "all_activations_relu = np.concatenate(activation_batches_relu, axis=0).flatten()" - ] - }, - { - "cell_type": "markdown", - "id": "I5W9yY5DvOFr", - "metadata": { - "id": "I5W9yY5DvOFr" - }, - "source": [ - "We can calculate the z-score for a layer using the equations stated in the introduction." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "WDx-LQSyxpDK", - "metadata": { - "id": "WDx-LQSyxpDK" - }, - "outputs": [], - "source": [ - "optimal_thresholds_relu = {}\n", - "\n", - "# Calculate the mean and standard deviation of the activation data\n", - "mean = np.mean(all_activations_relu)\n", - "std_dev = np.std(all_activations_relu)\n", - "\n", - "# Calculate and store the threshold for each Z-score\n", - "for zscore in z_score_values:\n", - " optimal_threshold = zscore * std_dev + mean\n", - " optimal_thresholds_relu[f'z-score {zscore}'] = optimal_threshold" - ] - }, - { - "cell_type": "markdown", - "id": "XRAr8L5mvuLd", - "metadata": { - "id": "XRAr8L5mvuLd" - }, - "source": [ - "### Distribution Plots\n", - "\n", - "Here we plot the distribution from the resulting model along with its z score thresholds." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "VPb8tBNGpJjo", - "metadata": { - "id": "VPb8tBNGpJjo" - }, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "\n", - "# Plotting\n", - "plt.figure(figsize=(10, 6))\n", - "plt.hist(all_activations_relu, bins=100, alpha=0.5, label='Activations')\n", - "for z_score, threshold in optimal_thresholds_relu.items():\n", - " random_color=np.random.rand(3,)\n", - " plt.axvline(threshold, linestyle='--', linewidth=2, color=random_color, label=f'{z_score}, z-score threshold: {threshold:.2f}')\n", - " z_score_1 = int(z_score.split(' ')[1]) # Splits the string and converts the second element to an integer\n", - " error_value = mse_error_thresholds[z_score_1] # Now using the correct integer key to access the value\n", - " plt.axvline(error_value, linestyle='-', linewidth=2, color=random_color, label=f'{z_score}, MSE error Threshold: {error_value:.2f}')\n", - "\n", - "plt.title('Activation Distribution with Optimal Quantization Thresholds - First ReLU Layer')\n", - "plt.xlabel('Activation Value')\n", - "plt.ylabel('Frequency')\n", - "plt.legend()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "qbA6kFmw0vaf", - "metadata": { - "id": "qbA6kFmw0vaf" - }, - "source": [ - "Here it can plainly be seen the effect of z-score on error threshold. The lowest z-score of 3 reduces the error threshold for that layer." - ] - }, - { - "cell_type": "markdown", - "id": "4c967d41-439d-405b-815f-be641f1768fe", - "metadata": { - "id": "4c967d41-439d-405b-815f-be641f1768fe" - }, - "source": [ - "## Accuracy\n", - "\n", - "Finally we can show the effect of these different z-score thresholds on the models accuracy." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "092d9fd0-8005-4551-b853-3b52840639c2", - "metadata": { - "id": "092d9fd0-8005-4551-b853-3b52840639c2" - }, - "outputs": [], - "source": [ - "REPRESENTATIVE_DATASET_FOLDER = './imagenet/val'\n", - "BATCH_SIZE = 20\n", - "fraction =0.005\n", - "model_version = 'MobileNet'\n", - "\n", - "preprocessor = tutorial_tools.DatasetPreprocessor(model_version=model_version)\n", - "evaluation_dataset = preprocessor.get_validation_dataset_fraction(fraction, REPRESENTATIVE_DATASET_FOLDER, BATCH_SIZE)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8ebf7d04-7816-465c-9157-6068c0a4a08a", - "metadata": { - "id": "8ebf7d04-7816-465c-9157-6068c0a4a08a" - }, - "outputs": [], - "source": [ - "#prepare float model and evaluate\n", - "float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=[\"accuracy\"])\n", - "results = float_model.evaluate(evaluation_dataset)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "07a22d28-56ff-46de-8ed0-1163c3b7a613", - "metadata": { - "id": "07a22d28-56ff-46de-8ed0-1163c3b7a613" - }, - "outputs": [], - "source": [ - "#prepare quantised models and evaluate\n", - "evaluation_results = {}\n", - "\n", - "for z_score, data in quantized_models_dict.items():\n", - " quantized_model = data[\"quantized_model\"]\n", - "\n", - " quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=[\"accuracy\"])\n", - "\n", - " results = quantized_model.evaluate(evaluation_dataset, verbose=0) # Set verbose=0 to suppress the log messages\n", - "\n", - " evaluation_results[z_score] = results\n", - "\n", - " # Print the results\n", - " print(f\"Results for {z_score}: Loss = {results[0]}, Accuracy = {results[1]}\")" - ] - }, - { - "cell_type": "markdown", - "id": "GpEZ2E1qzWl3", - "metadata": { - "id": "GpEZ2E1qzWl3" - }, - "source": [ - "Here we can see very minor gains from adjusting the z-score threshold. 
For the majority of simple models this trend will likely follow. From testing we have found that transformer models have a tendancy to benefit from anomoly removal but it is always worth playing with these perameters if your quantised accuracy is distinctly lower than your float model accuracy.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "14877777", - "metadata": { - "id": "14877777" - }, - "source": [ - "## Conclusion" - ] - }, - { - "cell_type": "markdown", - "id": "bb7e1572", - "metadata": { - "id": "bb7e1572" - }, - "source": [ - "In this tutorial, we demonstrated the z-score thresholding step used during quantization. Please use this code to assist with choosing z-score thresholds for your own model.\n", - "\n", - "We have found a when adjusting z-score the sweet spot tends to be between 8 and 12. with no change above 12 and distribution distruction below 8. This will likely require a study on your part for your specific usecase.\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "BVHmePYJe7he", - "metadata": { - "id": "BVHmePYJe7he" - }, - "source": [ - "## Appendix\n", - "\n", - "Below are a sellection of code samples used to establish the best layers to use for plotting thresholds and distributions.\n", - "\n", - "Firstly of the list of layers that are effected by this z-score adjustment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "cn-Ac9br9Ltz", - "metadata": { - "id": "cn-Ac9br9Ltz" - }, - "outputs": [], - "source": [ - "# Initialize a dictionary to hold threshold values for comparison\n", - "thresholds_by_index = {}\n", - "\n", - "# Try to access each layer for each quantized model and collect threshold values\n", - "for z_score, data in quantized_models_dict.items():\n", - " quantized_model = data[\"quantized_model\"]\n", - " for layer_index in range(len(quantized_model.layers)):\n", - " try:\n", - " # Attempt to access the threshold value for this layer\n", - " threshold = 
quantized_model.layers[layer_index].activation_holder_quantizer.get_config()['threshold'][0]\n", - " # Store the threshold value for comparison\n", - " if layer_index not in thresholds_by_index:\n", - " thresholds_by_index[layer_index] = set()\n", - " thresholds_by_index[layer_index].add(threshold)\n", - " except Exception as e:\n", - " pass\n", - "\n", - "# Find indices where threshold values are not consistent\n", - "inconsistent_indices = [index for index, thresholds in thresholds_by_index.items() if len(thresholds) > 1]\n", - "\n", - "print(\"Inconsistent indices:\", inconsistent_indices)\n" - ] - }, - { - "cell_type": "markdown", - "id": "PiNdvojz_FDN", - "metadata": { - "id": "PiNdvojz_FDN" - }, - "source": [ - "Choosing randomly from these we check the thresholds" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "Huv0u6z106lX", - "metadata": { - "id": "Huv0u6z106lX" - }, - "outputs": [], - "source": [ - "mse_error_thresholds = {\n", - " z_score: data[\"quantized_model\"].layers[52].activation_holder_quantizer.get_config()['threshold'][0]\n", - " for z_score, data in quantized_models_dict.items()\n", - "}\n", - "print(mse_error_thresholds)" - ] - }, - { - "cell_type": "markdown", - "id": "0YPqhQOh_N2r", - "metadata": { - "id": "0YPqhQOh_N2r" - }, - "source": [ - "We now want to varify which layers matchup indicies based on layer names of the float model. For the example of 52 there is no matching layer as it is a quantization of the previous layer. Checking 51 we can see that the indicies matches upto the layer name conv_dw_8_relu, we can use this to plot the distribution." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "rWGx5-6uu5H-", - "metadata": { - "id": "rWGx5-6uu5H-" - }, - "outputs": [], - "source": [ - "target_z_score = 9\n", - "\n", - "for index, layer in enumerate(float_model.layers):\n", - " search_string = str(layer.name)\n", - "\n", - " # Check if the target_z_score is in the quantized_models_dict\n", - " if target_z_score in quantized_models_dict:\n", - " data = quantized_models_dict[target_z_score]\n", - " # Iterate over each layer of the target quantized model\n", - " for quantized_index, quantized_layer in enumerate(data[\"quantized_model\"].layers):\n", - " found = search_string in str(quantized_layer.get_config())\n", - " # If found, print details including the indices of the matching layers\n", - " if found:\n", - " print(f\"Float Model Layer Index {index} & Quantized Model Layer Index {quantized_index}: Found match in layer name {search_string}\")\n", - " else:\n", - " print(f\"Z-Score {target_z_score} not found in quantized_models_dict.\")\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "AW_vC22Qw32E", - "metadata": { - "id": "AW_vC22Qw32E" - }, - "outputs": [], - "source": [ - "data[\"quantized_model\"].layers[51].get_config()" - ] - }, - { - "cell_type": "markdown", - "id": "01c1645e-205c-4d9a-8af3-e497b3addec1", - "metadata": { - "id": "01c1645e-205c-4d9a-8af3-e497b3addec1" - }, - "source": [ - "\n", - "\n", - "Copyright 2024 Sony Semiconductor Israel, Inc. 
- All rights reserved.\n", - "\n", - "Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "you may not use this file except in compliance with the License.\n", - "You may obtain a copy of the License at\n", - "\n", - " http://www.apache.org/licenses/LICENSE-2.0\n", - "\n", - "Unless required by applicable law or agreed to in writing, software\n", - "distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "See the License for the specific language governing permissions and\n", - "limitations under the License.\n" - ] - } - ], - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.12" - } - }, - "nbformat": 4, - "nbformat_minor": 5 + "cells": [ + { + "cell_type": "markdown", + "id": "f8194007-6ea7-4e00-8931-a37ca2d0dd20", + "metadata": { + "id": "f8194007-6ea7-4e00-8931-a37ca2d0dd20" + }, + "source": [ + "# Enhancing Post-Training Quantization with Z-Score Outlier Handling\n", + "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb)\n", + "\n", + "## Overview\n", + "This tutorial demonstrates the process used to find the activation Z-score threshold, a step that MCT can use during post-training quantization.\n", + "\n", + "In this example, we will explore how setting different Z-score values affects the thresholds and the model accuracy.
We will start by demonstrating how to apply the corresponding MCT configurations. Then, we will feed a representative dataset through the model, plot the activation distribution of one activation layer together with its Z-score thresholds and the thresholds selected by MCT, and finally compare the accuracy of the models quantized with the different Z-score values.\n", + "\n", + "## Managing Outliers with Activation Z-Score Thresholding\n", + "During the quantization process, thresholds are used to map a distribution of 32-bit floating-point values to their quantized equivalents. Achieving this with minimal data loss while preserving the most representative range is crucial for maintaining the model’s final accuracy.\n", + "\n", + "Some models can exhibit anomalous values when evaluated on a representative dataset. These outliers can negatively impact the range selection, leading to suboptimal quantization. To ensure a more reliable range mapping, it can be beneficial to remove these values.\n", + "\n", + "The **Model Compression Toolkit (MCT)** provides an option to filter out such outliers using **Z-score thresholding**, allowing users to exclude values based on how many standard deviations they lie from the mean of the distribution.\n", + "\n", + "The Z-score of a value $x$ is calculated by subtracting the dataset’s mean from the value and then dividing by the standard deviation, $z = \\frac{x - \\mu}{\\sigma}$. This metric indicates how many standard deviations a particular value is away from the mean.\n", + "\n", + "The quantization threshold, $t$, is defined as a function of $Z_t$, the mean, $\\mu$, and the standard deviation, $\\sigma$, of the activation values:\n", + "\n", + "$$\n", + "t(Z_t) = \\mu + Z_t \\cdot \\sigma\n", + "$$\n", + "\n", + "Where:\n", + "\n", + "- $t(Z_t)$: The calculated quantization threshold based on the Z-score threshold $Z_t$.\n", + "- $Z_t$: The chosen Z-score threshold.
It indicates how many standard deviations an activation value must be from the mean to qualify for removal or special handling prior to quantization.\n", + "- $\\mu = \\frac{1}{n_s} \\sum_{X \\in F_l(D)} X$: The mean of activations in $F_l(D)$.\n", + "- $\\sigma = \\sqrt{\\frac{1}{n_s} \\sum_{X \\in F_l(D)} (X - \\mu)^2}$: The standard deviation of activations in $F_l(D)$.\n", + " where:\n", + " - $F_l(D)$: The distribution of activation values of layer $l$ over the representative dataset $D$.\n", + " - $X$: An individual activation within the distribution.\n", + " - $n_s$: The number of activation values in $F_l(D)$.\n", + "\n", + "This equation for $t(Z_t)$ enables the identification of activation values that deviate significantly from the mean, helping to remove outliers before the main quantization step. This results in a more reliable range for mapping floating-point values to quantized representations, which can improve quantization accuracy.\n", + "\n", + "## Setup\n", + "Install the relevant packages:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "324685b9-5dcc-4d22-80f4-dec9a93d3324", + "metadata": { + "id": "324685b9-5dcc-4d22-80f4-dec9a93d3324" + }, + "outputs": [], + "source": [ + "TF_VER = '2.14'\n", + "!pip install -q tensorflow[and-cuda]~={TF_VER}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import importlib.util\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ], + "metadata": { + "collapsed": false + }, + "id": "bd8e08612add2018" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b3f0acc8-281c-4bca-b0b9-3d7677105f19", + "metadata": { + "id": "b3f0acc8-281c-4bca-b0b9-3d7677105f19" + }, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "import keras" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Load a pre-trained MobileNetV2 model from Keras in 32-bit floating-point precision."
+ ], + "metadata": { + "collapsed": false + }, + "id": "fd5ac404451fd924" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "from keras.applications.mobilenet_v2 import MobileNetV2\n", + "\n", + "float_model = MobileNetV2()" + ], + "metadata": { + "collapsed": false + }, + "id": "69daa2d5d731b157" + }, + { + "cell_type": "markdown", + "source": [ + "## Dataset preparation\n", + "### Download the ImageNet validation set\n", + "Download the ImageNet dataset with only the validation split.\n", + "**Note:** For demonstration purposes we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.\n", + "\n", + "This step may take several minutes..." + ], + "metadata": { + "collapsed": false + }, + "id": "a5584696d8f09653" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import os\n", + " \n", + "if not os.path.isdir('imagenet'):\n", + " !mkdir imagenet\n", + " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz\n", + " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar\n", + " \n", + " !cd imagenet && tar -xzf ILSVRC2012_devkit_t12.tar.gz && \\\n", + " mkdir ILSVRC2012_img_val && tar -xf ILSVRC2012_img_val.tar -C ILSVRC2012_img_val" + ], + "metadata": { + "collapsed": false + }, + "id": "66a1e4f3878aa76b" + }, + { + "cell_type": "markdown", + "source": [ + "The following code organizes the extracted data into separate folders for each label, making it compatible with Keras dataset loaders." 
+ ], + "metadata": { + "collapsed": false + }, + "id": "c343537ba9ba1e6d" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "import shutil\n", + "\n", + "root = Path('./imagenet')\n", + "imgs_dir = root / 'ILSVRC2012_img_val'\n", + "target_dir = root /'val'\n", + "\n", + "def extract_labels():\n", + " !pip install -q scipy\n", + " import scipy\n", + " mat = scipy.io.loadmat(root / 'ILSVRC2012_devkit_t12/data/meta.mat', squeeze_me=True)\n", + " cls_to_nid = {s[0]: s[1] for i, s in enumerate(mat['synsets']) if s[4] == 0} \n", + " with open(root / 'ILSVRC2012_devkit_t12/data/ILSVRC2012_validation_ground_truth.txt', 'r') as f:\n", + " return [cls_to_nid[int(cls)] for cls in f.readlines()]\n", + "\n", + "if not target_dir.exists():\n", + " labels = extract_labels()\n", + " for lbl in set(labels):\n", + " os.makedirs(target_dir / lbl)\n", + " \n", + " for img_file, lbl in zip(sorted(os.listdir(imgs_dir)), labels):\n", + " shutil.move(imgs_dir / img_file, target_dir / lbl)\n" + ], + "metadata": { + "collapsed": false + }, + "id": "bddd52741649281e" + }, + { + "cell_type": "markdown", + "source": [ + "These functions generate a `tf.data.Dataset` from image files in a directory." 
+ ], + "metadata": { + "collapsed": false + }, + "id": "53bff06ed1608b1b" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "def imagenet_preprocess_input(images, labels):\n", + " return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels\n", + "\n", + "def get_dataset(batch_size, shuffle):\n", + " dataset = tf.keras.utils.image_dataset_from_directory(\n", + " directory='./imagenet/val',\n", + " batch_size=batch_size,\n", + " image_size=[224, 224],\n", + " shuffle=shuffle,\n", + " crop_to_aspect_ratio=True,\n", + " interpolation='bilinear')\n", + " dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)), num_parallel_calls=tf.data.AUTOTUNE)\n", + " dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)\n", + " return dataset" + ], + "metadata": { + "collapsed": false + }, + "id": "73ad65d39184ac57" + }, + { + "cell_type": "markdown", + "source": [ + "## Representative Dataset\n", + "For quantization with MCT, we need to define a representative dataset required by the PTQ algorithm. This dataset is a generator that yields, on each iteration, a list containing a single batch of images:" + ], + "metadata": { + "collapsed": false + }, + "id": "fb36537e4308b48e" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "batch_size = 32\n", + "n_iter = 10\n", + "\n", + "dataset = get_dataset(batch_size, shuffle=True)\n", + "\n", + "def representative_dataset_gen():\n", + " for _ in range(n_iter):\n", + " yield [dataset.take(1).get_single_element()[0].numpy()]" + ], + "metadata": { + "collapsed": false + }, + "id": "49f40f3ea3fc8855" + }, + { + "cell_type": "markdown", + "source": [ + "## Target Platform Capabilities\n", + "MCT optimizes the model for dedicated hardware. This is done using a Target Platform Capabilities (TPC) object (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)).
Here, we use the default TensorFlow TPC:" + ], + "metadata": { + "collapsed": false + }, + "id": "e7197e9b332c3bde" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import model_compression_toolkit as mct\n", + "\n", + "# Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. Here, for example, we use the default platform that is attached to a Keras layers representation.\n", + "target_platform_cap = mct.get_target_platform_capabilities('tensorflow', 'default')" + ], + "metadata": { + "collapsed": false + }, + "id": "d1d179f65e3fc09f" + }, + { + "cell_type": "markdown", + "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5", + "metadata": { + "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5" + }, + "source": [ + "## Post-Training Quantization using MCT\n", + "In this step, we quantize the model with several Z-score thresholds.\n", + "We use the default quantization parameters, except for the Z-score threshold (`z_threshold`). Feel free to modify the code below to experiment with other Z-score values."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "jtiZzXmTjxuI", + "metadata": { + "id": "jtiZzXmTjxuI" + }, + "outputs": [], + "source": [ + "# Dictionary mapping each Z-score value to its QuantizationConfig\n", + "q_configs_dict = {}\n", + "\n", + "# Z-score values to iterate over\n", + "z_score_values = [3, 5, 9]\n", + "\n", + "# Iterate and build the QuantizationConfig objects\n", + "for z_score in z_score_values:\n", + " q_config = mct.core.QuantizationConfig(\n", + " z_threshold=z_score,\n", + " )\n", + " q_configs_dict[z_score] = q_config" + ] + }, + { + "cell_type": "markdown", + "id": "8W3Dcn0jkJOH", + "metadata": { + "id": "8W3Dcn0jkJOH" + }, + "source": [ + "Now we will run post-training quantization for each configuration:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba0c6e55-d474-4dc3-9a43-44b736635998", + "metadata": { + "id": "ba0c6e55-d474-4dc3-9a43-44b736635998" + }, + "outputs": [], + "source": [ + "quantized_models_dict = {}\n", + "\n", + "for z_score, q_config in q_configs_dict.items():\n", + " # Create a CoreConfig object with the current quantization configuration\n", + " ptq_config = mct.core.CoreConfig(quantization_config=q_config)\n", + "\n", + " # Perform MCT post-training quantization\n", + " quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(\n", + " in_model=float_model,\n", + " representative_data_gen=representative_dataset_gen,\n", + " core_config=ptq_config,\n", + " target_platform_capabilities=target_platform_cap\n", + " )\n", + "\n", + " # Update the dictionary to include the quantized model\n", + " quantized_models_dict[z_score] = {\n", + " \"quantization_config\": q_config,\n", + " \"quantized_model\": quantized_model,\n", + " \"quantization_info\": quantization_info\n", + " }\n" + ] + }, + { + "cell_type": "markdown", + "id": "A8UHRsh2khM4", + "metadata": { + "id": "A8UHRsh2khM4" + }, + "source": [ + "### Z-Score Threshold and Distribution Visualization\n", + "To aid in
understanding, we will plot the activation distribution of an activation layer in MobileNetV2. This distribution will be generated by running the representative dataset through the model.\n", + "\n", + "To visualize the activations, the model must be rebuilt up to and including the selected layer. Once the activations are extracted, we can calculate their Z-score threshold values manually using the equation provided in the introduction.\n", + "\n", + "Before plotting the distribution, we need the name of the layer of interest. With Keras, it can be read directly from the model's `layers` list, as in the following code. We determined the index of the layer of interest through a series of checks, which are detailed in the appendix section." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a22e6d68-c40f-40bf-ab74-ff453011aeac", + "metadata": { + "id": "a22e6d68-c40f-40bf-ab74-ff453011aeac" + }, + "outputs": [], + "source": [ + "# Get and print the name of the layer at index 51\n", + "layer_name = float_model.layers[51].name\n", + "print(layer_name)" + ] + }, + { + "cell_type": "markdown", + "id": "c38d28f3-c947-4c7c-aafa-e96cc3864277", + "metadata": { + "id": "c38d28f3-c947-4c7c-aafa-e96cc3864277" + }, + "source": [ + "The printed name identifies the example activation layer that we will analyze.\n", + "\n", + "We will use this layer name to build a model that ends at that layer."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1f9dd3f3-6e22-4be9-9beb-29568ff14c9d", + "metadata": { + "id": "1f9dd3f3-6e22-4be9-9beb-29568ff14c9d" + }, + "outputs": [], + "source": [ + "from tensorflow.keras.models import Model\n", + "\n", + "layer_output = float_model.get_layer(layer_name).output\n", + "activation_model_relu = Model(inputs=float_model.input, outputs=layer_output)" + ] + }, + { + "cell_type": "markdown", + "id": "ccc81508-01e5-421c-9b48-6ed3ce5b7364", + "metadata": { + "id": "ccc81508-01e5-421c-9b48-6ed3ce5b7364" + }, + "source": [ + "Run the representative dataset through this model and store the outputs for further analysis." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eaeb9888-5d67-4979-af50-80781a811b4b", + "metadata": { + "id": "eaeb9888-5d67-4979-af50-80781a811b4b" + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "# Collect the layer's activations for every representative batch\n", + "activation_batches_relu = []\n", + "for images in representative_dataset_gen():\n", + " activations_relu = activation_model_relu.predict(images)\n", + " activation_batches_relu.append(activations_relu)\n", + "\n", + "# Flatten all collected activations into a single 1-D array\n", + "all_activations_relu = np.concatenate(activation_batches_relu, axis=0).flatten()" + ] + }, + { + "cell_type": "markdown", + "id": "I5W9yY5DvOFr", + "metadata": { + "id": "I5W9yY5DvOFr" + }, + "source": [ + "We can compute the Z-score thresholds for this layer using the formula provided in the introduction."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "WDx-LQSyxpDK", + "metadata": { + "id": "WDx-LQSyxpDK" + }, + "outputs": [], + "source": [ + "optimal_thresholds_relu = {}\n", + "\n", + "# Calculate the mean and standard deviation of the activation data\n", + "mean = np.mean(all_activations_relu)\n", + "std_dev = np.std(all_activations_relu)\n", + "\n", + "# Calculate and store the threshold for each Z-score\n", + "for zscore in z_score_values:\n", + " optimal_threshold = zscore * std_dev + mean\n", + " optimal_thresholds_relu[f'z-score {zscore}'] = optimal_threshold" + ] + }, + { + "cell_type": "markdown", + "id": "XRAr8L5mvuLd", + "metadata": { + "id": "XRAr8L5mvuLd" + }, + "source": [ + "### Distribution Plots\n", + "In this section, we visualize the activation distribution from the constructed model along with the corresponding Z-score thresholds.\n", + "From this list, we randomly select layers and evaluate their corresponding thresholds." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "mse_error_thresholds = {\n", + " z_score: data[\"quantized_model\"].layers[53].activation_holder_quantizer.get_config()['threshold'][0]\n", + " for z_score, data in quantized_models_dict.items()\n", + "}\n", + "print(mse_error_thresholds)" + ], + "metadata": { + "collapsed": false + }, + "id": "dd8a1bef743d9711" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "VPb8tBNGpJjo", + "metadata": { + "id": "VPb8tBNGpJjo" + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "\n", + "# Plotting\n", + "plt.figure(figsize=(10, 6))\n", + "plt.hist(all_activations_relu, bins=100, alpha=0.5, label='Activations')\n", + "for z_score, threshold in optimal_thresholds_relu.items():\n", + " random_color=np.random.rand(3,)\n", + " plt.axvline(threshold, linestyle='--', linewidth=2, color=random_color, label=f'{z_score}, z-score threshold: {threshold:.2f}')\n", + 
" z_score_1 = int(z_score.split(' ')[1]) # Parse the integer Z-score from a key such as 'z-score 9'\n", + " error_value = mse_error_thresholds[z_score_1] # MSE-selected threshold for the same Z-score\n", + " plt.axvline(error_value, linestyle='-', linewidth=2, color=random_color, label=f'{z_score}, MSE threshold: {error_value:.2f}')\n", + "\n", + "plt.title('Activation Distribution with Optimal Quantization Thresholds - First ReLU Layer')\n", + "plt.xlabel('Activation Value')\n", + "plt.ylabel('Frequency')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "qbA6kFmw0vaf", + "metadata": { + "id": "qbA6kFmw0vaf" + }, + "source": [ + "The impact of the Z-score on the MSE-selected threshold is clearly visible here: a lower Z-score, such as 3, removes more outliers and therefore lowers the threshold selected for this layer." + ] + }, + { + "cell_type": "markdown", + "id": "4c967d41-439d-405b-815f-be641f1768fe", + "metadata": { + "id": "4c967d41-439d-405b-815f-be641f1768fe" + }, + "source": [ + "## Model Evaluation\n", + "Finally, we can demonstrate how varying Z-score thresholds affect the model's accuracy.\n", + "In order to evaluate our models, we first need to load the validation dataset."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "val_dataset = get_dataset(batch_size=50, shuffle=False)" + ], + "metadata": { + "collapsed": false + }, + "id": "edb94bd69d88e1a2" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8ebf7d04-7816-465c-9157-6068c0a4a08a", + "metadata": { + "id": "8ebf7d04-7816-465c-9157-6068c0a4a08a" + }, + "outputs": [], + "source": [ + "float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=\"accuracy\")\n", + "float_accuracy = float_model.evaluate(val_dataset)\n", + "print(f\"Float model's Top-1 accuracy on the ImageNet validation set: {(float_accuracy[1] * 100):.2f}%\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07a22d28-56ff-46de-8ed0-1163c3b7a613", + "metadata": { + "id": "07a22d28-56ff-46de-8ed0-1163c3b7a613" + }, + "outputs": [], + "source": [ + "# Evaluate each quantized model on the validation set\n", + "evaluation_results = {}\n", + "\n", + "for z_score, data in quantized_models_dict.items():\n", + " quantized_model = data[\"quantized_model\"]\n", + "\n", + " quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=[\"accuracy\"])\n", + "\n", + " results = quantized_model.evaluate(val_dataset, verbose=0) # Set verbose=0 to suppress the log messages\n", + "\n", + " evaluation_results[z_score] = results\n", + "\n", + " # Print the results\n", + " print(f\"Results for {z_score}: Loss = {results[0]}, Accuracy = {results[1]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "GpEZ2E1qzWl3", + "metadata": { + "id": "GpEZ2E1qzWl3" + }, + "source": [ + "We observe only minor improvements when adjusting the Z-score threshold. This pattern is common among simpler models. However, our testing shows that transformer models tend to benefit more from outlier removal. It is advisable to experiment with these parameters if the quantized accuracy is noticeably lower than the float model’s accuracy."
+ ] + }, + { + "cell_type": "markdown", + "id": "14877777", + "metadata": { + "id": "14877777" + }, + "source": [ + "## Conclusion\n", + "In this tutorial, we demonstrated the use of Z-score thresholding as a critical step in the quantization process. This technique helps refine activation ranges by removing outliers, ultimately leading to improved quantized model accuracy. You can use the provided code as a starting point to experiment with selecting optimal Z-score thresholds for your own models.\n", + "\n", + "Our testing indicates that the optimal Z-score threshold typically falls between 8 and 12. Setting the threshold above 12 tends to show negligible improvement, while values below 8 may distort the distribution. However, finding the right threshold will require experimentation based on the specific characteristics of your model and use case.\n", + "\n", + "By applying Z-score thresholding thoughtfully, you can mitigate quantization errors and ensure that the quantized model's performance remains as close as possible to that of the original floating-point version." 
+ ] + }, + { + "cell_type": "markdown", + "id": "BVHmePYJe7he", + "metadata": { + "id": "BVHmePYJe7he" + }, + "source": [ + "## Appendix\n", + "Below are selected code samples used to identify the most suitable layers for plotting thresholds and distributions.\n", + "\n", + "**Listing Layers Affected by Z-Score Adjustments**\n", + "The following code snippet provides a list of layers that are impacted by Z-score thresholding, helping to determine which layers to focus on when visualizing distributions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cn-Ac9br9Ltz", + "metadata": { + "id": "cn-Ac9br9Ltz" + }, + "outputs": [], + "source": [ + "# Initialize a dictionary to hold threshold values for comparison\n", + "thresholds_by_index = {}\n", + "\n", + "# Try to access each layer for each quantized model and collect threshold values\n", + "for z_score, data in quantized_models_dict.items():\n", + " quantized_model = data[\"quantized_model\"]\n", + " for layer_index in range(len(quantized_model.layers)):\n", + " try:\n", + " # Attempt to access the threshold value for this layer\n", + " threshold = quantized_model.layers[layer_index].activation_holder_quantizer.get_config()['threshold'][0]\n", + " # Store the threshold value for comparison\n", + " if layer_index not in thresholds_by_index:\n", + " thresholds_by_index[layer_index] = set()\n", + " thresholds_by_index[layer_index].add(threshold)\n", + " except Exception as e:\n", + " pass\n", + "\n", + "# Find indices where threshold values are not consistent\n", + "inconsistent_indices = [index for index, thresholds in thresholds_by_index.items() if len(thresholds) > 1]\n", + "\n", + "print(\"Inconsistent indices:\", inconsistent_indices)\n" + ] + }, + { + "cell_type": "markdown", + "id": "0YPqhQOh_N2r", + "metadata": { + "id": "0YPqhQOh_N2r" + }, + "source": [ + "\n", + "Next, we want to verify which layers correspond to the indices based on the layer names in the original float model. 
For example, index 52 has no matching layer, as it represents a quantized version of the previous layer. However, checking index 51 reveals that it aligns with the layer named `conv_dw_8_relu`, which we can use to plot the distribution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "rWGx5-6uu5H-", + "metadata": { + "id": "rWGx5-6uu5H-" + }, + "outputs": [], + "source": [ + "target_z_score = 9\n", + "\n", + "for index, layer in enumerate(float_model.layers):\n", + " search_string = str(layer.name)\n", + "\n", + " # Check if the target_z_score is in the quantized_models_dict\n", + " if target_z_score in quantized_models_dict:\n", + " data = quantized_models_dict[target_z_score]\n", + " # Iterate over each layer of the target quantized model\n", + " for quantized_index, quantized_layer in enumerate(data[\"quantized_model\"].layers):\n", + " found = search_string in str(quantized_layer.get_config())\n", + " # If found, print details including the indices of the matching layers\n", + " if found:\n", + " print(f\"Float Model Layer Index {index} & Quantized Model Layer Index {quantized_index}: Found match in layer name {search_string}\")\n", + " else:\n", + " print(f\"Z-Score {target_z_score} not found in quantized_models_dict.\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "AW_vC22Qw32E", + "metadata": { + "id": "AW_vC22Qw32E" + }, + "outputs": [], + "source": [ + "data[\"quantized_model\"].layers[51].get_config()" + ] + }, + { + "cell_type": "markdown", + "id": "01c1645e-205c-4d9a-8af3-e497b3addec1", + "metadata": { + "id": "01c1645e-205c-4d9a-8af3-e497b3addec1" + }, + "source": [ + "\n", + "\n", + "Copyright 2024 Sony Semiconductor Israel, Inc. 
All rights reserved.\n", + "\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + "\n", + " http://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License.\n" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 } diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb index 3ec13eea0..0ea59fa7d 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb @@ -1,292 +1,316 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] }, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Export Quantized Keras Model\n", - "\n", - "[Run this tutorial in Google 
Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb)\n", - "\n", - "\n", - "To export a TensorFlow model as a quantized model, it is necessary to first apply quantization\n", - "to the model using MCT:\n", - "\n", - "\n", - "\n" - ], - "metadata": { - "id": "UJDzewEYfSN5" - } - }, - { - "cell_type": "code", - "source": [ - "TF_VER = '2.14.0'\n", - "\n", - "!pip install -q tensorflow=={TF_VER}\n", - "! pip install -q mct-nightly" - ], - "metadata": { - "id": "qNddNV6TEsX0" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "import numpy as np\n", - "from keras.applications import MobileNetV2\n", - "import model_compression_toolkit as mct\n", - "\n", - "# Create a model\n", - "float_model = MobileNetV2()\n", - "# Quantize the model.\n", - "# Notice that here the representative dataset is random for demonstration only.\n", - "quantized_exportable_model, _ = mct.ptq.keras_post_training_quantization(float_model,\n", - " representative_data_gen=lambda: [np.random.random((1, 224, 224, 3))])" - ], - "metadata": { - "id": "eheBYKxRDFgx" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "\n", - "\n", - "### keras\n", - "\n", - "The model will be exported as a tensorflow `.keras` model where weights and activations are quantized but represented using a float32 dtype.\n", - "Two optional quantization formats are available: MCTQ and FAKELY_QUANT.\n", - "\n", - "#### MCTQ\n", - "\n", - "By default, `mct.exporter.keras_export_model` will export the quantized Keras model to\n", - "a .keras model with custom quantizers from mct_quantizers module.\n", - "\n", - "\n" - ], - "metadata": { - "id": "-n70LVe6DQPw" - } - }, - { - "cell_type": "code", - "source": [ - "# Path of exported model\n", - "keras_file_path = 'exported_model_mctq.keras'\n", - "\n", - "# Export a keras model with mctq 
custom quantizers.\n", - "mct.exporter.keras_export_model(model=quantized_exportable_model,\n", - " save_model_path=keras_file_path)" - ], - "metadata": { - "id": "PO-Hh0bzD1VJ" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "Notice that the model has the same size as the quantized exportable model as weights data types are float.\n", - "\n", - "#### MCTQ - Loading Exported Model\n", - "\n", - "To load the exported model with MCTQ quantizers, use `mct.keras_load_quantized_model`:" - ], - "metadata": { - "id": "Bwx5rxXDF_gb" - } - }, - { - "cell_type": "code", - "source": [ - "loaded_model = mct.keras_load_quantized_model(keras_file_path)" - ], - "metadata": { - "id": "q235XNJQmTdd" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "\n", - "#### Fakely-Quantized" - ], - "metadata": { - "id": "sOmDjSehlQba" - } - }, - { - "cell_type": "code", - "source": [ - "# Path of exported model\n", - "keras_file_path = 'exported_model_fakequant.keras'\n", - "\n", - "# Use mode KerasExportSerializationFormat.KERAS for a .keras model\n", - "# and QuantizationFormat.FAKELY_QUANT for fakely-quantized weights\n", - "# and activations.\n", - "mct.exporter.keras_export_model(model=quantized_exportable_model,\n", - " save_model_path=keras_file_path,\n", - " quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)" - ], - "metadata": { - "id": "WLyHEEiwGByT" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "Notice that the fakely-quantized model has the same size as the quantized exportable model as weights data types are\n", - "float.\n", - "\n", - "\n", - "\n", - "### TFLite\n", - "The tflite serialization format export in two qauntization formats: INT8 and FAKELY_QUANT.\n", - "\n", - "#### INT8 TFLite\n", - "\n", - "The model will be exported as a tflite model where weights and activations are represented as 8bit integers." 
- ], - "metadata": { - "id": "-L1aRxFGGFeF" - } - }, - { - "cell_type": "code", - "source": [ - "tflite_file_path = 'exported_model_int8.tflite'\n", - "\n", - "# Use mode KerasExportSerializationFormat.TFLITE for tflite model and quantization_format.INT8.\n", - "mct.exporter.keras_export_model(model=quantized_exportable_model,\n", - " save_model_path=tflite_file_path,\n", - " serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,\n", - " quantization_format=mct.exporter.QuantizationFormat.INT8)" - ], - "metadata": { - "id": "V4I-p1q5GLzs" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "Compare size of float and quantized model:\n" - ], - "metadata": { - "id": "SBqtJV9AGRzN" - } - }, - { - "cell_type": "code", - "source": [ - "import os\n", - "\n", - "# Save float model to measure its size\n", - "float_file_path = 'exported_model_float.keras'\n", - "float_model.save(float_file_path)\n", - "\n", - "print(\"Float model in Mb:\", os.path.getsize(float_file_path) / float(2 ** 20))\n", - "print(\"Quantized model in Mb:\", os.path.getsize(tflite_file_path) / float(2 ** 20))" - ], - "metadata": { - "id": "LInM16OMGUtF" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "\n", - "#### Fakely-Quantized TFLite\n", - "\n", - "The model will be exported as a tflite model where weights and activations are quantized but represented with a float data type.\n", - "\n", - "##### Usage Example" - ], - "metadata": { - "id": "9eVDoIHiGX5-" - } - }, - { - "cell_type": "code", - "source": [ - "# Path of exported model\n", - "tflite_file_path = 'exported_model_fakequant.tflite'\n", - "\n", - "\n", - "# Use mode KerasExportSerializationFormat.TFLITE for tflite model and QuantizationFormat.FAKELY_QUANT for fakely-quantized weights\n", - "# and activations.\n", - "mct.exporter.keras_export_model(model=quantized_exportable_model,\n", - " save_model_path=tflite_file_path,\n", 
- " serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,\n", - " quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)" - ], - "metadata": { - "id": "0OYLAbI8Gawu" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "\n", - "\n", - "\n", - "\n", - "Notice that the fakely-quantized model has the same size as the quantized exportable model as weights data types are\n", - "float.\n" - ], - "metadata": { - "id": "voOrtCroD-HE" - } - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bb7e1572" - }, - "source": [ - "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n", - "\n", - "Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "you may not use this file except in compliance with the License.\n", - "You may obtain a copy of the License at\n", - "\n", - " http://www.apache.org/licenses/LICENSE-2.0\n", - "\n", - "Unless required by applicable law or agreed to in writing, software\n", - "distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "See the License for the specific language governing permissions and\n", - "limitations under the License.\n" - ] - } - ] + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Export a Quantized Keras Model With the Model Compression Toolkit (MCT)\n", + "\n", + "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb)\n", + "\n", + "## Overview\n", + "This tutorial demonstrates how to export a Keras model to `.keras` and TFLite formats using the Model Compression Toolkit (MCT). 
It covers loading a pre-trained Keras model (MobileNetV2), applying post-training quantization (PTQ) using MCT, and then exporting the quantized model to `.keras` and TFLite. The tutorial also shows how to load the exported `.keras` model back with MCT.\n", + "\n", + "## Summary\n", + "In this tutorial, we will cover:\n", + "\n", + "1. Loading a pre-trained Keras model (MobileNetV2) for demonstration purposes.\n", + "2. Applying post-training quantization to the model using the Model Compression Toolkit.\n", + "3. Exporting the quantized model to the `.keras` and `TFLite` formats.\n", + "4. Loading the exported `.keras` model.\n", + "\n", + "## Setup\n", + "Install the relevant packages:" + ], + "metadata": { + "id": "UJDzewEYfSN5" + } + }, + { + "cell_type": "code", + "source": [ + "TF_VER = '2.14.0'\n", + "\n", + "!pip install -q tensorflow=={TF_VER}" + ], + "metadata": { + "id": "qNddNV6TEsX0" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import importlib.util\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "from keras.applications.mobilenet_v2 import MobileNetV2\n", + "\n", + "float_model = MobileNetV2()" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "## Quantize the Model with the Model Compression Toolkit\n", + "Let's begin by applying quantization using MCT. This process will prepare the model for export.\n", + "\n", + "### Representative Dataset\n", + "For post-training quantization with MCT, a representative dataset is required. In this example, we use a random dataset for demonstration purposes.
+ ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import numpy as np\n", + "import model_compression_toolkit as mct\n", + "\n", + "# Quantize the model.\n", + "# Notice that here the representative dataset is random for demonstration only.\n", + "quantized_exportable_model, _ = mct.ptq.keras_post_training_quantization(float_model,\n", + " representative_data_gen=lambda: [np.random.random((1, 224, 224, 3))])" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "## Keras Export\n", + "The model will be exported as a TensorFlow `.keras` model, where weights and activations are quantized but still stored with a float32 dtype.\n", + "Two quantization formats are available for this export: MCTQ and FAKELY_QUANT.\n", + "\n", + "#### MCTQ\n", + "\n", + "By default, `mct.exporter.keras_export_model` exports the quantized Keras model to a `.keras` model using custom quantizers from the `mct_quantizers` module.
" + ], + "metadata": { + "id": "-n70LVe6DQPw" + } + }, + { + "cell_type": "code", + "source": [ + "# Path of exported model\n", + "keras_file_path = 'exported_model_mctq.keras'\n", + "\n", + "# Export a keras model with mctq custom quantizers.\n", + "mct.exporter.keras_export_model(model=quantized_exportable_model,\n", + " save_model_path=keras_file_path)" + ], + "metadata": { + "id": "PO-Hh0bzD1VJ" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Note that the model's size remains unchanged compared to the quantized exportable model, as the weight data types are still represented as floats.\n", + "#### MCTQ - Loading the Exported Model\n", + "\n", + "To load the exported model with MCTQ quantizers, use `mct.keras_load_quantized_model`:" + ], + "metadata": { + "id": "Bwx5rxXDF_gb" + } + }, + { + "cell_type": "code", + "source": [ + "loaded_model = mct.keras_load_quantized_model(keras_file_path)" + ], + "metadata": { + "id": "q235XNJQmTdd" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### Fakely-Quantized Format\n", + "To export a fakely-quantized model, use the `QuantizationFormat.FAKELY_QUANT` option. This format ensures that quantization is simulated but does not alter the data types of the weights and activations during export." 
+ ], + "metadata": { + "id": "sOmDjSehlQba" + } + }, + { + "cell_type": "code", + "source": [ + "# Path of exported model\n", + "keras_file_path = 'exported_model_fakequant.keras'\n", + "\n", + "# Use mode KerasExportSerializationFormat.KERAS for a .keras model\n", + "# and QuantizationFormat.FAKELY_QUANT for fakely-quantized weights\n", + "# and activations.\n", + "mct.exporter.keras_export_model(model=quantized_exportable_model,\n", + " save_model_path=keras_file_path,\n", + " quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)" + ], + "metadata": { + "id": "WLyHEEiwGByT" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Note that the fakely-quantized model has the same size as the quantized exportable model, as the weights are still represented as floats.\n", + "\n", + "### TFLite\n", + "The TFLite serialization format supports two quantization formats: `INT8` and `FAKELY_QUANT`.\n", + "\n", + "#### INT8 TFLite\n", + "\n", + "The model will be exported as a TFLite model where weights and activations are represented as 8-bit integers."
+ ], + "metadata": { + "id": "-L1aRxFGGFeF" + } + }, + { + "cell_type": "code", + "source": [ + "tflite_file_path = 'exported_model_int8.tflite'\n", + "\n", + "# Use KerasExportSerializationFormat.TFLITE for a TFLite model and QuantizationFormat.INT8 for integer weights and activations.\n", + "mct.exporter.keras_export_model(model=quantized_exportable_model,\n", + " save_model_path=tflite_file_path,\n", + " serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,\n", + " quantization_format=mct.exporter.QuantizationFormat.INT8)" + ], + "metadata": { + "id": "V4I-p1q5GLzs" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Compare the sizes of the float model and the INT8 TFLite model:\n" + ], + "metadata": { + "id": "SBqtJV9AGRzN" + } + }, + { + "cell_type": "code", + "source": [ + "import os\n", + "\n", + "# Save float model to measure its size\n", + "float_file_path = 'exported_model_float.keras'\n", + "float_model.save(float_file_path)\n", + "\n", + "print(\"Float model in MB:\", os.path.getsize(float_file_path) / float(2 ** 20))\n", + "print(\"Quantized model in MB:\", os.path.getsize(tflite_file_path) / float(2 ** 20))" + ], + "metadata": { + "id": "LInM16OMGUtF" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### Fakely-Quantized TFLite\n", + "\n", + "The model will be exported as a TFLite model where weights and activations are quantized but represented with a float data type."
+ ], + "metadata": { + "id": "9eVDoIHiGX5-" + } + }, + { + "cell_type": "code", + "source": [ + "# Path of exported model\n", + "tflite_file_path = 'exported_model_fakequant.tflite'\n", + "\n", + "\n", + "# Use mode KerasExportSerializationFormat.TFLITE for tflite model and QuantizationFormat.FAKELY_QUANT for fakely-quantized weights\n", + "# and activations.\n", + "mct.exporter.keras_export_model(model=quantized_exportable_model,\n", + " save_model_path=tflite_file_path,\n", + " serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,\n", + " quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)" + ], + "metadata": { + "id": "0OYLAbI8Gawu" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Note that the fakely-quantized model has the same size as the quantized exportable model, as the weights are still represented as floats." + ], + "metadata": { + "id": "voOrtCroD-HE" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bb7e1572" + }, + "source": [ + "Copyright 2024 Sony Semiconductor Israel, Inc. 
All rights reserved.\n", + "\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + "\n", + " http://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License.\n" + ] + } + ] } diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_gptq.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_gptq.ipynb index 02a78af2e..dba3eb617 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_gptq.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_gptq.ipynb @@ -48,9 +48,23 @@ "source": [ "TF_VER = '2.14.0'\n", "\n", - "!pip install -q tensorflow=={TF_VER}\n" + "!pip install -q tensorflow[and-cuda]~={TF_VER}" ] }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import importlib.util\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ], + "metadata": { + "collapsed": false + }, + "id": "b681426ed5a2e48" + }, { "cell_type": "code", "execution_count": null, @@ -58,16 +72,37 @@ "source": [ "import tensorflow as tf\n", "import keras\n", - "import importlib.util\n", - "\n", - "if not importlib.util.find_spec('model_compression_toolkit'):\n", - " !pip install -q model_compression_toolkit" + "import importlib.util" ], "metadata": { "collapsed": false }, "id": "2c13aff20d208c51" }, + { + "cell_type": "markdown", + "source": [ + "Load a pre-trained MobileNetV2 model from Keras, in 32-bit floating-point
precision format." + ], + "metadata": { + "collapsed": false + }, + "id": "fc105d5cff93e87c" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "from keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input\n", + "\n", + "float_model = MobileNetV2()" + ], + "metadata": { + "collapsed": false + }, + "id": "bcf3c42006a0590c" + }, { "cell_type": "markdown", "id": "0c7fed0d-cfc8-41ee-adf1-22a98110397b", @@ -76,8 +111,9 @@ }, "source": [ "## Dataset preparation\n", - "\n", - "**Note** that for demonstration purposes we use the validation set for the model quantization and GPTQ optimization. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.\n", + "### Download the ImageNet validation set\n", + "Download the ImageNet dataset with only the validation split.\n", + "**Note:** For demonstration purposes we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.\n", "\n", "This step may take several minutes..." ] @@ -105,7 +141,7 @@ { "cell_type": "markdown", "source": [ - "Rearrange the extracted data into folders per label " + "The following code organizes the extracted data into separate folders for each label, making it compatible with Keras dataset loaders." ], "metadata": { "collapsed": false @@ -147,97 +183,85 @@ }, { "cell_type": "markdown", - "id": "028112db-3143-4fcb-96ae-e639e6476c31", - "metadata": { - "id": "028112db-3143-4fcb-96ae-e639e6476c31" - }, "source": [ - "### Representative Dataset\n", - "\n", - "GPTQ is a gradient-based optimization process, which requires representative dataset to perform inference and compute gradients. \n", - "\n", - "Separate representative datasets can be used for the PTQ statistics collection and for GPTQ. 
In this tutorial we use the same representative dataset for both.\n", - "\n", - "A complete pass through the representative dataset generator constitutes an epoch (batch_size x n_iter samples). " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ed56f505-97ff-4acb-8ad8-ef09c53e9d57", + "These functions generate a `tf.data.Dataset` from image files in a directory." + ], "metadata": { - "id": "ed56f505-97ff-4acb-8ad8-ef09c53e9d57" + "collapsed": false }, - "outputs": [], - "source": [ - "def imagenet_preprocess_input(images, labels):\n", - " return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels" - ] + "id": "51636a72dcd7fa3a" }, { "cell_type": "code", "execution_count": null, - "id": "0408f624-ab68-4989-95f8-f9d327882840", - "metadata": { - "id": "0408f624-ab68-4989-95f8-f9d327882840" - }, "outputs": [], "source": [ - "def get_representative_dataset(n_iter=10, batch_size=50):\n", + "def imagenet_preprocess_input(images, labels):\n", + " return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels\n", + "\n", + "def get_dataset(batch_size, shuffle):\n", " dataset = tf.keras.utils.image_dataset_from_directory(\n", " directory='./imagenet/val',\n", " batch_size=batch_size,\n", " image_size=[224, 224],\n", - " shuffle=True,\n", + " shuffle=shuffle,\n", " crop_to_aspect_ratio=True,\n", " interpolation='bilinear')\n", - " dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)))\n", - "\n", - " def representative_dataset():\n", - " for _ in range(n_iter):\n", - " yield [dataset.take(1).get_single_element()[0].numpy()]\n", - "\n", - " return representative_dataset\n", - "\n", - "representative_dataset_gen = get_representative_dataset()" - ] + " dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)), num_parallel_calls=tf.data.AUTOTUNE)\n", + " dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)\n", + " return dataset" + ], + "metadata": { + "collapsed": false + }, + "id": 
"1358c187cfd473f7" }, { "cell_type": "markdown", - "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5", + "id": "028112db-3143-4fcb-96ae-e639e6476c31", "metadata": { - "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5" + "id": "028112db-3143-4fcb-96ae-e639e6476c31" }, "source": [ - "## Model Gradient-Based Post-Training quantization using MCT\n", + "## Representative Dataset\n", "\n", - "This is the main part in which we quantize and our model.\n", + "GPTQ is a gradient-based optimization process, which requires a representative dataset to perform inference and compute gradients. \n", + "\n", + "Separate representative datasets can be used for the PTQ statistics collection and for GPTQ. In this tutorial we use the same representative dataset for both.\n", "\n", - "First, we load a pre-trained MobileNetV2 model from Keras, in 32-bits floating-point precision format." + "A complete pass through the representative dataset generator constitutes an epoch (batch_size x n_iter samples). " ] }, { "cell_type": "code", "execution_count": null, - "id": "80cac59f-ec5e-41ca-b673-96220924a47c", + "id": "0408f624-ab68-4989-95f8-f9d327882840", "metadata": { - "id": "80cac59f-ec5e-41ca-b673-96220924a47c" + "id": "0408f624-ab68-4989-95f8-f9d327882840" }, "outputs": [], "source": [ - "from keras.applications.mobilenet_v2 import MobileNetV2\n", + "batch_size = 16\n", + "n_iter = 5\n", "\n", - "float_model = MobileNetV2()" + "dataset = get_dataset(batch_size, shuffle=True)\n", + "\n", + "def representative_dataset_gen():\n", + " for _ in range(n_iter):\n", + " yield [dataset.take(1).get_single_element()[0].numpy()]" ] }, { "cell_type": "markdown", - "id": "8a8b486a-ca39-45d9-8699-f7116b0414c9", + "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5", "metadata": { - "id": "8a8b486a-ca39-45d9-8699-f7116b0414c9" + "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5" }, "source": [ - "Next, we create a GPTQ configuration with possible GPTQ optimization options (such as the number of epochs for the optimization process).
MCT will quantize the model and start the GPTQ process to optimize the model’s parameters and quantization parameters.\n", + "## Model Gradient-Based Post-Training Quantization using MCT\n", + "\n", + "This is the main part in which we quantize our model.\n", + "First, we create a GPTQ configuration with possible GPTQ optimization options (such as the number of epochs for the optimization process). MCT will quantize the model and start the GPTQ process to optimize the model’s parameters and quantization parameters.\n", "\n", "In addition, we need to define a TargetPlatformCapability object, representing the HW specifications on which we wish to eventually deploy our quantized model." ] @@ -306,9 +330,9 @@ "id": "5a7a5150-3b92-49b5-abb2-06e6c5c91d6b" }, "source": [ - "## Models evaluation\n", + "## Model Evaluation\n", "\n", - "In order to evaluate our models, we first need to load the validation dataset. As before, let's assume we downloaded the ImageNet validation dataset to a folder with the path below:" + "In order to evaluate our models, we first need to load the validation dataset." ] }, { @@ -321,18 +345,7 @@ }, "outputs": [], "source": [ - "def get_validation_dataset():\n", - " dataset = tf.keras.utils.image_dataset_from_directory(\n", - " directory='./imagenet/val',\n", - " batch_size=50,\n", - " image_size=[224, 224],\n", - " shuffle=False,\n", - " crop_to_aspect_ratio=True,\n", - " interpolation='bilinear')\n", - " dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)))\n", - " return dataset\n", - "\n", - "evaluation_dataset = get_validation_dataset()" + "val_dataset = get_dataset(batch_size=50, shuffle=False)" ] }, { @@ -342,9 +355,7 @@ "id": "9889d217-90a6-4615-8569-38dc9cdd5999" }, "source": [ - "Let's start with the floating-point model evaluation.\n", - "\n", - "We need to compile the model before evaluation and set the loss and the evaluation metric:" + "Let's start with the floating-point model evaluation.
We need to compile the model before evaluation and set the loss and the evaluation metric." ] }, { @@ -357,7 +368,8 @@ "outputs": [], "source": [ "float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=\"accuracy\")\n", - "results = float_model.evaluate(evaluation_dataset)" + "float_accuracy = float_model.evaluate(val_dataset)\n", + "print(f\"Float model's Top 1 accuracy on the ImageNet validation set: {(float_accuracy[1] * 100):.2f}%\")" ] }, { @@ -380,17 +392,10 @@ "outputs": [], "source": [ "quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=\"accuracy\")\n", - "results = quantized_model.evaluate(evaluation_dataset)" + "quantized_accuracy = quantized_model.evaluate(val_dataset)\n", + "print(f\"Quantized model's Top 1 accuracy on the ImageNet validation set: {(quantized_accuracy[1] * 100):.2f}%\")" ] }, - { - "cell_type": "markdown", - "source": [], - "metadata": { - "collapsed": false - }, - "id": "e316c34cadd054e7" - }, { "cell_type": "markdown", "id": "ebfbb4de-5b6e-4732-83d3-a21e96cdd866", @@ -398,18 +403,9 @@ "id": "ebfbb4de-5b6e-4732-83d3-a21e96cdd866" }, "source": [ - "You can see that we got a very small degradation with a compression rate of x4 !" - ] - }, - { - "cell_type": "markdown", - "source": [ + "You can see that we got a very small degradation with a compression rate of x4!\n", "Now, we can export the model to Keras and TFLite:" - ], - "metadata": { - "id": "6YjIdiRRjgkL" - }, - "id": "6YjIdiRRjgkL" + ] }, { "cell_type": "code", @@ -433,18 +429,8 @@ "id": "14877777" }, "source": [ - "## Conclusion" - ] - }, - { - "cell_type": "markdown", - "id": "bb7e1572", - "metadata": { - "id": "bb7e1572" - }, - "source": [ - "In this tutorial, we demonstrated how to quantize a pre-trained model using MCT with gradient-based optimization with a few lines of code.
We saw that we can achieve an x4 compression ratio with minimal performance degradation.\n", - "\n" + "## Conclusion\n", + "In this tutorial, we demonstrated how to quantize a pre-trained model with gradient-based optimization using MCT in just a few lines of code. We saw that we can achieve an x4 compression ratio with minimal performance degradation.\n" ] }, { diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_mixed_precision.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_mixed_precision.ipynb index 73563d9c4..6eeec058d 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_mixed_precision.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_mixed_precision.ipynb @@ -7,26 +7,21 @@ "id": "f8194007-6ea7-4e00-8931-a37ca2d0dd20" }, "source": [ - "# Post Training Mixed Precision Quantization using the Model Compression Toolkit - A Quick-Start Guide\n", + "# Mixed-Precision Post-Training Quantization in Keras using the Model Compression Toolkit (MCT)\n", "\n", "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_mixed_precision.ipynb)\n", "\n", "## Overview\n", "\n", - "\n", - "This tutorial demonstrates a pre-trained model quantization using the **Model Compression Toolkit (MCT)** with **Mixed Precision**. \n", - "\n", - "Mixed Precision enables quantization of different layers with different bit-width precisions, to fit the model into a set of hardware restrictions. \n", - "\n", - "As we will see, mixed-precision quantization is a simple yet effective quantization scheme for compressing a model to a desired model size.\n", + "This quick-start guide explains how to use the **Model Compression Toolkit (MCT)** to quantize a Keras model using post-training mixed-precision quantization.
This method assigns different precision levels to various layers based on their impact on the model's output, helping the model meet hardware constraints. Mixed-precision quantization is an effective approach for compressing a model to a desired size while maintaining performance.\n", "\n", "## Summary\n", "\n", "In this tutorial we will cover:\n", "\n", - "1. Post-Training Mixed-Precision Quantization using MCT.\n", - "2. Loading and preprocessing ImageNet's validation dataset.\n", - "3. Constructing an unlabeled representative dataset.\n", + "1. Loading and preprocessing ImageNet’s validation dataset.\n", + "2. Constructing an unlabeled representative dataset.\n", + "3. Applying mixed-precision post-training quantization to the model's weights using MCT.\n", "4. Accuracy evaluation of the floating-point and the quantized models.\n", "\n", "## Setup\n", @@ -44,12 +39,24 @@ }, "outputs": [], "source": [ - "TF_VER = '2.14.0'\n", - "\n", - "!pip install -q tensorflow=={TF_VER}\n", - "!pip install -q mct-nightly" + "TF_VER = '2.14'\n", + "!pip install -q tensorflow[and-cuda]~={TF_VER}" ] }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import importlib.util\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ], + "metadata": { + "collapsed": false + }, + "id": "aa9574240d461e7a" + }, { "cell_type": "code", "execution_count": null, @@ -61,10 +68,33 @@ "source": [ "import tensorflow as tf\n", "import keras\n", - "import model_compression_toolkit as mct\n", - "import os" + "import model_compression_toolkit as mct" ] }, + { + "cell_type": "markdown", + "source": [ + "Load a pre-trained MobileNetV2 model from Keras, in 32-bit floating-point precision format."
+ ], + "metadata": { + "collapsed": false + }, + "id": "366579d0f3dec00a" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "from keras.applications.mobilenet_v2 import MobileNetV2\n", + "\n", + "float_model = MobileNetV2()" + ], + "metadata": { + "collapsed": false + }, + "id": "cc1b963d133bd98d" + }, { "cell_type": "markdown", "id": "0c7fed0d-cfc8-41ee-adf1-22a98110397b", @@ -73,10 +103,9 @@ }, "source": [ "## Dataset preparation\n", - "\n", - "Download ImageNet dataset with only the validation split.\n", - "\n", - "**Note** that for demonstration purposes we use the validation set for the model quantization and mixed precision routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.\n", + "### Download the ImageNet validation set\n", + "Download the ImageNet dataset with only the validation split.\n", + "**Note:** For demonstration purposes we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.\n", "\n", "This step may take several minutes..." 
] @@ -90,12 +119,15 @@ }, "outputs": [], "source": [ + "import os\n", + " \n", "if not os.path.isdir('imagenet'):\n", " !mkdir imagenet\n", - " !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz\n", - " !mv ILSVRC2012_devkit_t12.tar.gz imagenet/\n", - " !wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar\n", - " !mv ILSVRC2012_img_val.tar imagenet/" + " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz\n", + " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar\n", + " \n", + " !cd imagenet && tar -xzf ILSVRC2012_devkit_t12.tar.gz && \\\n", + " mkdir ILSVRC2012_img_val && tar -xf ILSVRC2012_img_val.tar -C ILSVRC2012_img_val" ] }, { @@ -105,7 +137,7 @@ "collapsed": false }, "source": [ - "Extract ImageNet validation dataset using torchvision \"datasets\" module" + "The following code organizes the extracted data into separate folders for each label, making it compatible with Keras dataset loaders." 
] }, { @@ -117,111 +149,119 @@ }, "outputs": [], "source": [ - "import torchvision\n", - "if not os.path.isdir('imagenet/val'):\n", - " torchvision.datasets.ImageNet(root='./imagenet', split='val')" + "from pathlib import Path\n", + "import shutil\n", + "\n", + "root = Path('./imagenet')\n", + "imgs_dir = root / 'ILSVRC2012_img_val'\n", + "target_dir = root /'val'\n", + "\n", + "def extract_labels():\n", + " !pip install -q scipy\n", + " import scipy\n", + " mat = scipy.io.loadmat(root / 'ILSVRC2012_devkit_t12/data/meta.mat', squeeze_me=True)\n", + " cls_to_nid = {s[0]: s[1] for i, s in enumerate(mat['synsets']) if s[4] == 0} \n", + " with open(root / 'ILSVRC2012_devkit_t12/data/ILSVRC2012_validation_ground_truth.txt', 'r') as f:\n", + " return [cls_to_nid[int(cls)] for cls in f.readlines()]\n", + "\n", + "if not target_dir.exists():\n", + " labels = extract_labels()\n", + " for lbl in set(labels):\n", + " os.makedirs(target_dir / lbl)\n", + " \n", + " for img_file, lbl in zip(sorted(os.listdir(imgs_dir)), labels):\n", + " shutil.move(imgs_dir / img_file, target_dir / lbl)" ] }, { "cell_type": "markdown", - "id": "028112db-3143-4fcb-96ae-e639e6476c31", - "metadata": { - "id": "028112db-3143-4fcb-96ae-e639e6476c31" - }, "source": [ - "Define the required preprocessing method for the pretrained model,\n", - "and create a generator for the representative dataset, which is required for mixed precision quantization.\n", - "\n", - "The representative dataset is used for collecting statistics on the inference outputs of all layers in the model.\n", - " \n", - "In order to decide on the size of the representative dataset, we configure the batch size and the number of calibration iterations.\n", - "This gives us the total number of samples that will be used during PTQ (batch_size x n_iter).\n", - "In this example we set `batch_size = 50` and `n_iter = 10`, resulting in a total of 500 representative images.\n", - "\n", - "Please ensure that the dataset path has been set 
correctly." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "cf5e859077a736f4", + "These functions generate a `tf.data.Dataset` from image files in a directory." + ], "metadata": { "collapsed": false }, - "outputs": [], - "source": [ - "def imagenet_preprocess_input(images, labels):\n", - " \"\"\"\n", - " Use the keras applications preprocess function.\n", - " Args:\n", - " images: input image batch.\n", - " labels: input label batch.\n", - " Returns:\n", - " preprocessed images & labels\n", - " \"\"\"\n", - " return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels" - ] + "id": "1b2a1839a0cca729" }, { "cell_type": "code", "execution_count": null, - "id": "0408f624-ab68-4989-95f8-f9d327882840", - "metadata": { - "collapsed": false - }, "outputs": [], "source": [ - "def get_representative_dataset(n_iter=10, batch_size=50):\n", - " \"\"\"\n", - " Download the ImageNet validation set locally and create the representative dataset generator.\n", - " Returns:\n", - " representative dataset generator for calibration\n", - " \"\"\"\n", - " print('loading dataset, this may take a few minutes ...')\n", + "def imagenet_preprocess_input(images, labels):\n", + " return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels\n", + "\n", + "def get_dataset(batch_size, shuffle):\n", " dataset = tf.keras.utils.image_dataset_from_directory(\n", " directory='./imagenet/val',\n", " batch_size=batch_size,\n", " image_size=[224, 224],\n", - " shuffle=True,\n", + " shuffle=shuffle,\n", " crop_to_aspect_ratio=True,\n", " interpolation='bilinear')\n", - " dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)))\n", - "\n", - " def representative_dataset():\n", - " for _ in range(n_iter):\n", - " yield [dataset.take(1).get_single_element()[0].numpy()]\n", - "\n", - " return representative_dataset\n", - "\n", - "representative_dataset_gen = get_representative_dataset()" - ] + " dataset = dataset.map(lambda x, y: 
(imagenet_preprocess_input(x, y)), num_parallel_calls=tf.data.AUTOTUNE)\n", + " dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)\n", + " return dataset" + ], + "metadata": { + "collapsed": false + }, + "id": "8bc12415c234e197" }, { "cell_type": "markdown", - "id": "4a1e9ba6-2954-4506-ad5c-0da273701ba5", + "source": [ + "## Representative Dataset\n", + "For quantization with MCT, we need to define a representative dataset required by the PTQ algorithm. This dataset is a generator that yields a list containing a single batch of images:" + ], "metadata": { "collapsed": false }, + "id": "c5747150a2052fb" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], "source": [ - "## Model Post-Training Mixed Precision quantization using MCT\n", + "batch_size = 32\n", + "n_iter = 10\n", "\n", - "This is the main part in which we quantize our model.\n", + "dataset = get_dataset(batch_size, shuffle=True)\n", "\n", - "First, we load a pre-trained MobileNetV2 model from Keras, in 32-bits floating-point precision format." - ] + "def representative_dataset_gen():\n", + " for _ in range(n_iter):\n", + " yield [dataset.take(1).get_single_element()[0].numpy()]" + ], + "metadata": { + "collapsed": false + }, + "id": "ac2f38f151896ed9" }, { - "cell_type": "code", - "execution_count": null, - "id": "80cac59f-ec5e-41ca-b673-96220924a47c", + "cell_type": "markdown", + "source": [ + "## Target Platform Capabilities (TPC)\n", + "In addition, MCT optimizes models for dedicated hardware platforms using Target Platform Capabilities (TPC).\n", + "**Note:** To apply mixed-precision quantization to specific layers, the TPC must define different bit-width options for those layers. For more details, please refer to our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html).
In this example, we use the TensorFlow TPC for the IMX500 platform (version `v1`), which supports 2, 4, and 8-bit options for convolution and linear layers." + ], "metadata": { - "id": "80cac59f-ec5e-41ca-b673-96220924a47c" + "collapsed": false }, + "id": "aaa928cd96e04989" + }, + { + "cell_type": "code", + "execution_count": null, "outputs": [], "source": [ - "from keras.applications.mobilenet_v2 import MobileNetV2\n", - "float_model = MobileNetV2()" - ] + "# Get a TargetPlatformCapabilities object that models the hardware platform for the quantized model inference. Here, we use the IMX500 TPC (version 'v1') for TensorFlow models.\n", + "target_platform_cap = mct.get_target_platform_capabilities(\"tensorflow\", 'imx500', target_platform_version='v1')" + ], + "metadata": { + "collapsed": false + }, + "id": "8b4708ad0203629" }, { "cell_type": "markdown", @@ -230,17 +270,12 @@ "id": "8a8b486a-ca39-45d9-8699-f7116b0414c9" }, "source": [ - "Next, we need to define a **mixed precision quantization configuration** with possible mixed precision search options.\n", - "MCT will search a mixed precision solution (namely, bit-width assignment for each layer)\n", - "and quantize the model according to this configuration.\n", - "**Note** that you can skip this part if you prefer to use the default quantization settings.\n", + "## Mixed Precision Configurations\n", + "We will create a `MixedPrecisionQuantizationConfig` that defines the search options for mixed-precision:\n", + "1. **Number of images** (`num_of_images`) - Determines how many images from the representative dataset are used to find an optimal bit-width configuration. Using more images can improve the selected configuration but increases search time.\n", + "2. **Gradient weighting** (`use_hessian_based_scores`) - Improves bit-width configuration accuracy at the cost of longer search time.
This method will not be used in this example.\n", "\n", - "In addition, we need to define a `TargetPlatformCapability` object, representing the HW specifications on which we wish to eventually deploy our quantized model.\n", - "The candidates bit-width for quantization are defined in the target platform model. \n", - "\n", - "Finally, we need to set the **hardware constraints** which we want our quantized model to fit into.\n", - "These are defined using a `ResourceUtilization` object.\n", - "In this example, we set a **weights memory** constraint, by computing the size of the desired model's parameters under a compression of the model to 75% of its fixed-point 8-bit precision." + "MCT will determine a bit-width for each layer and quantize the model based on this configuration. The candidate bit-widths for quantization should be defined in the target platform model." ] }, { @@ -252,28 +287,42 @@ }, "outputs": [], "source": [ - "# Enable Mixed-Precision config. For the sake of running faster, the hessian-based scores are disabled in this tutorial\n", - "mp_config = mct.core.MixedPrecisionQuantizationConfig(\n", + "configuration = mct.core.CoreConfig(\n", + " mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(\n", " num_of_images=32,\n", - " use_hessian_based_scores=False)\n", - "core_config = mct.core.CoreConfig(mixed_precision_config=mp_config)\n", - "# Specify the target platform capability (TPC)\n", - "tpc = mct.get_target_platform_capabilities(\"tensorflow\", 'imx500', target_platform_version='v1')\n", - "\n", - "# Get Resource Utilization information to constraint your model's memory size. 
Retrieve a ResourceUtilization object with helpful information of each resource metric, to constraint the quantized model to the desired memory size.\n", - "resource_utilization_data = mct.core.keras_resource_utilization_data(float_model,\n", - " representative_dataset_gen,\n", - " core_config=core_config,\n", - " target_platform_capabilities=tpc)\n", + " use_hessian_based_scores=False))" + ] + }, + { + "cell_type": "markdown", + "source": [ + "To enable mixed-precision quantization, we define the desired compression ratio. In this example, we will configure the model to compress the weights to **75% of the size of the 8-bit model's weights**. To achieve this, we will retrieve the model's resource utilization information, `resource_utilization_data`, specifically focusing on the weights' memory. Then, we will create a `ResourceUtilization` object to enforce the size constraint on the weights' memory, which applies only to the quantized layers and attributes (e.g., Conv2D kernels, but not biases)." + ], + "metadata": { + "collapsed": false + }, + "id": "af1a0ca127d59767" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# Get Resource Utilization information to constrain your model's memory size.\n", + "resource_utilization_data = mct.core.keras_resource_utilization_data(\n", + " float_model,\n", + " representative_dataset_gen,\n", + " core_config=configuration,\n", + " target_platform_capabilities=target_platform_cap)\n", "\n", - "# Set a constraint for each of the Resource Utilization metrics.\n", - "# Create a ResourceUtilization object to limit our returned model's size.
Note that this values affects only layers and attributes\n", - "# that should be quantized (for example, the kernel of Conv2D in Keras will be affected by this value,\n", - "# while the bias will not)\n", - "# examples:\n", "weights_compression_ratio = 0.75 # About 0.75 of the model's weights memory size when quantized with 8 bits.\n", + "# Create a ResourceUtilization object \n", "resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * weights_compression_ratio)" - ] + ], + "metadata": { + "collapsed": false + }, + "id": "38e16383bec13fbd" }, { "cell_type": "markdown", @@ -282,8 +331,8 @@ "collapsed": false }, "source": [ - "### Run model Post-Training Quantization\n", - "Finally, we quantize our model using MCT's post-training quantization API." + "### Run Post-Training Quantization with Mixed Precision\n", + "Now, we are ready to use MCT to quantize the model." ] }, { @@ -299,18 +348,8 @@ " float_model,\n", " representative_dataset_gen,\n", " target_resource_utilization=resource_utilization,\n", - " core_config=core_config,\n", - " target_platform_capabilities=tpc)" - ] - }, - { - "cell_type": "markdown", - "id": "7382ada6-d001-4564-907d-767fa4e9ec56", - "metadata": { - "id": "7382ada6-d001-4564-907d-767fa4e9ec56" - }, - "source": [ - "That's it! Our model is now quantized." + " core_config=configuration,\n", + " target_platform_capabilities=target_platform_cap)" ] }, { @@ -320,9 +359,8 @@ "id": "5a7a5150-3b92-49b5-abb2-06e6c5c91d6b" }, "source": [ - "## Models evaluation\n", - "\n", - "In order to evaluate our models, we first need to load the validation dataset. As before, let's assume we downloaded the ImageNet validation dataset to a folder with the path below:" + "## Model evaluation\n", + "In order to evaluate our models, we first need to load the validation dataset." 
] }, { @@ -335,23 +373,7 @@ }, "outputs": [], "source": [ - "def get_validation_dataset():\n", - " \"\"\"\n", - " Generate validation dataset\n", - " Returns:\n", - " the validation dataset\n", - " \"\"\"\n", - " dataset = tf.keras.utils.image_dataset_from_directory(\n", - " directory='./imagenet/val',\n", - " batch_size=50,\n", - " image_size=[224, 224],\n", - " shuffle=False,\n", - " crop_to_aspect_ratio=True,\n", - " interpolation='bilinear')\n", - " dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)))\n", - " return dataset\n", - "\n", - "evaluation_dataset = get_validation_dataset()" + "val_dataset = get_dataset(batch_size=50, shuffle=False)" ] }, { @@ -361,9 +383,7 @@ "id": "9889d217-90a6-4615-8569-38dc9cdd5999" }, "source": [ - "Let's start with the floating-point model evaluation.\n", - "\n", - "We need to compile the model before evaluation and set the loss and the evaluation metric:" + "Let's start with the floating-point model evaluation. We need to compile the model before evaluation and set the loss and the evaluation metric." 
] }, { @@ -376,7 +396,8 @@ "outputs": [], "source": [ "float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=\"accuracy\")\n", - "results = float_model.evaluate(evaluation_dataset)" + "float_accuracy = float_model.evaluate(val_dataset)\n", + "print(f\"Float model's Top 1 accuracy on the ImageNet validation set: {(float_accuracy[1] * 100):.2f}%\")" ] }, { @@ -399,17 +420,10 @@ "outputs": [], "source": [ "quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=\"accuracy\")\n", - "results = quantized_model.evaluate(evaluation_dataset)" + "quantized_accuracy = quantized_model.evaluate(val_dataset)\n", + "print(f\"Quantized model's Top 1 accuracy on the ImageNet validation set: {(quantized_accuracy[1] * 100):.2f}%\")" ] }, - { - "cell_type": "markdown", - "id": "e316c34cadd054e7", - "metadata": { - "collapsed": false - }, - "source": [] - }, { "cell_type": "markdown", "id": "ebfbb4de-5b6e-4732-83d3-a21e96cdd866", @@ -417,17 +431,7 @@ "id": "ebfbb4de-5b6e-4732-83d3-a21e96cdd866" }, "source": [ - "You can see that we got a very small degradation with a compression rate of x4 !"
- ] - }, - { - "cell_type": "markdown", - "id": "6YjIdiRRjgkL", - "metadata": { - "id": "6YjIdiRRjgkL" - }, - "source": [ - "Now, we can export the model to Keras and TFLite:" + "Now, we can export the quantized model to Keras and TFLite:" ] }, { @@ -439,9 +443,11 @@ }, "outputs": [], "source": [ - "mct.exporter.keras_export_model(model=quantized_model, save_model_path='qmodel.tflite',\n", - " serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,\n", - " quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)\n", + "mct.exporter.keras_export_model(\n", + " model=quantized_model,\n", + " save_model_path='qmodel.tflite',\n", + " serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,\n", + " quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)\n", "\n", "mct.exporter.keras_export_model(model=quantized_model, save_model_path='qmodel.keras')" ] @@ -453,18 +459,9 @@ "id": "14877777" }, "source": [ - "## Conclusion" - ] - }, - { - "cell_type": "markdown", - "id": "bb7e1572", - "metadata": { - "id": "bb7e1572" - }, - "source": [ - "In this tutorial, we demonstrated how to quantize a pre-trained model using MCT with mixed-precision with a few lines of code. We saw that we can achieve more than x4 compression ratio with minimal performance degradation.\n", - "\n" + "## Conclusion\n", + "In this tutorial, we demonstrated how to quantize a classification model using the mixed precision feature with MCT. \n", + "MCT can deliver competitive results across a wide range of tasks and network architectures. For more details, [check out the paper:](https://arxiv.org/abs/2109.09113)." 
] }, { diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_network_editor.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_network_editor.ipynb index 457a61cf3..28e880cab 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_network_editor.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_network_editor.ipynb @@ -7,7 +7,8 @@ }, "kernelspec": { "name": "python3", - "display_name": "Python 3" + "language": "python", + "display_name": "Python 3 (ipykernel)" }, "language_info": { "name": "python" @@ -17,15 +18,22 @@ { "cell_type": "markdown", "source": [ - "# Network Editor Usage\n", + "# How to Use the Network Editor to Easily Modify Quantization Configurations in the Model Compression Toolkit (MCT)\n", "\n", "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_network_editor.ipynb)\n", "\n", - "## Introduction\n", + "## Overview\n", + "In this tutorial, we will demonstrate how to utilize the Model Compression Toolkit (MCT) to quantize a simple Keras model and modify the quantization configuration for specific layers using MCT’s network editor. The example model comprises a `Conv2D` layer followed by a `Dense` layer.\n", "\n", - "In this tutorial, we will demonstrate how to leverage the Model Compression Toolkit (MCT) to quantize a simple Keras model and modify the quantization configuration for specific layers using the MCT's network editor. Our example model consists of a Conv2D layer followed by a Dense layer. Initially, we will quantize this model with a default configuration and inspect the bit allocation for each layer's weights. 
Then, we will introduce an edit rule to specifically quantize the Conv2D layer with a different bit width, showcasing the flexibility of MCT in customizing quantization schemes per layer.\n", + "## Summary\n", + "In this tutorial, we will cover:\n", "\n", - "First, we install MCT and import requiered modules:" + "1. Quantizing the model using the default configuration and inspecting the bit allocation for each layer.\n", + "2. Applying a custom edit rule to adjust the bit-width for the `Conv2D` layer.\n", + "3. Showcasing MCT’s flexibility for layer-specific quantization.\n", + "\n", + "## Setup\n", + "Install and import the relevant packages:" ], "metadata": { "id": "C_BBKEpTRqp_" @@ -39,31 +47,41 @@ }, "outputs": [], "source": [ - "TF_VER = '2.14.0'\n", - "\n", - "!pip install -q tensorflow=={TF_VER}\n", - "! pip install -q mct-nightly" + "TF_VER = '2.14'\n", + "!pip install -q tensorflow[and-cuda]~={TF_VER}" ] }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import importlib.util\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "code", "source": [ "import model_compression_toolkit as mct\n", "import numpy as np\n", - "import tensorflow as tf\n", "from tensorflow.keras.layers import Input, Conv2D, Dense\n", "from tensorflow.keras.models import Model" ], "metadata": { "id": "vCsjoKb7168U" }, - "execution_count": 2, + "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ - "Now, we create a simple Keras model with a Conv2D layer and a Dense layer:" + "Next, we will create a simple Keras model consisting of a `Conv2D` layer followed by a `Dense` layer."
], "metadata": { "id": "bRPoKI-WSQn2" @@ -82,13 +100,14 @@ "metadata": { "id": "uOu8c7n_6Vd4" }, - "execution_count": 3, + "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ - "In this tutorial, for demonstration purposes and to expedite the process, we create a simple representative dataset generator using random data. This generator produces a batch of random input data matching the model's input shape." + "### Representative Dataset\n", + "In this tutorial, for demonstration purposes and to expedite the process, we create a simple representative dataset generator using random data. This generator yields a batch of random input data that matches the model’s input shape." ], "metadata": { "id": "rDAMPxKhSYfx" @@ -99,18 +118,19 @@ "source": [ "batch_size = 1\n", "def representative_data_gen():\n", - " yield [np.random.randn(batch_size, *input_shape)]\n" + " yield [np.random.randn(batch_size, *input_shape)]" ], "metadata": { "id": "LvnQmku02qIM" }, - "execution_count": 4, + "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ - "Let's define a function that takes a Keras model, a representative data generator, and a core configuration for quantization. The function utilizes Model Compression Toolkit's post-training quantization API:" + "## Model Quantization with MCT\n", + "Let’s define a function that takes a Keras model, a representative data generator, and a core configuration for quantization. The function uses MCT’s post-training quantization (PTQ) API to quantize the model."
], "metadata": { "id": "VecsI-kDe9RM" @@ -119,25 +139,52 @@ { "cell_type": "code", "source": [ - "\n", "def quantize_keras_mct(model, representative_data_gen, core_config):\n", " quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(\n", " in_model=model,\n", " representative_data_gen=representative_data_gen,\n", " core_config=core_config\n", " )\n", - " return quantized_model\n" + " return quantized_model" ], "metadata": { "id": "uIyyoMv93Bt7" }, - "execution_count": 5, + "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ - "In this section, we start by setting a default core configuration for quantization using Model Compression Toolkit's CoreConfig. After quantizing the model with this configuration, we examine the number of bits used in the quantization of specific layers. We retrieve and print the number of bits used for the the layers' attribute called 'kernel' in both the Conv2D layer and the Dense layer. By default 8-bit are used for quantization across different types of layers in a model." + "We define a function to inspect the bit-width used for quantizing specific layers. The function retrieves and prints the bit-width for the `kernel` attribute in both the `Conv2D` and `Dense` layers." 
+ ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "def print_model_weights_by_layer(model):\n", + " conv2d_layer = model.layers[2]\n", + " conv2d_nbits = conv2d_layer.weights_quantizers['kernel'].get_config()['num_bits']\n", + " \n", + " dense_layer = model.layers[4]\n", + " dense_nbits = dense_layer.weights_quantizers['kernel'].get_config()['num_bits']\n", + " \n", + " print(f\"Conv2D nbits: {conv2d_nbits}, Dense nbits: {dense_nbits}\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "### Quantization\n", + "In this section, we start by setting a default core configuration for quantization using MCT’s `CoreConfig`. With this configuration, the model is quantized using the default 8-bit precision for all layer types. Next, we print the bit-width settings to verify the quantization of both the Conv2D and Dense layers." ], "metadata": { "id": "Xqmg7vWNgsqc" @@ -148,15 +195,8 @@ "source": [ "# Use default core config for observing baseline quantized model\n", "core_config = mct.core.CoreConfig()\n", - "\n", "quantized_model = quantize_keras_mct(model, representative_data_gen, core_config)\n", - "conv2d_layer = quantized_model.layers[2]\n", - "conv2d_nbits = conv2d_layer.weights_quantizers['kernel'].get_config()['num_bits']\n", - "\n", - "dense_layer = quantized_model.layers[4]\n", - "dense_nbits = dense_layer.weights_quantizers['kernel'].get_config()['num_bits']\n", - "\n", - "print(f\"Conv2D nbits: {conv2d_nbits}, Dense nbits: {dense_nbits}\")" + "print_model_weights_by_layer(quantized_model)" ], "metadata": { "id": "Z5VDv6Bz4cqN" @@ -169,11 +209,11 @@ "source": [ "## Edit Configration Using Edit Rules List\n", "\n", - " Now let's see how to customize the quantization process for specific layers using MCT's network editor. 
An `EditRule` is created with a `NodeTypeFilter` targeting the Conv2D layer type.\n", + " Now, let's customize the quantization process for specific layers using MCT’s network editor. We create an `EditRule` with a `NodeTypeFilter` targeting the `Conv2D` layer type.\n", "\n", - " The action associated with this rule changes the quantization configuration of the 'kernel' attribute to 4 bits instead of the default 8 bits. This rule is then included in a list (`edit_rules_list`) which is passed to the `DebugConfig`.\n", - " \n", - " The `DebugConfig`, with our custom rule, is then used to create a `CoreConfig`. This configuration will be applied when quantizing the model, resulting in the Conv2D layers being quantized using 4 bits while other layers follow the default setting." + "The associated action changes the kernel attribute’s bit-width to 4 bits instead of the default 8 bits. This rule is then added to an `edit_rules_list`, which is passed to `DebugConfig`.\n", + "\n", + "The custom `DebugConfig` is used to create a `CoreConfig`, enabling `Conv2D` layers to be quantized at 4 bits while other layers retain the default configuration." 
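To see why moving the `Conv2D` kernel from 8 to 4 bits matters, here is a simplified NumPy sketch of symmetric, per-tensor uniform ("fake") quantization. It rests on our own assumptions (a max-abs threshold, a signed integer grid, random weights standing in for a real kernel) and is not MCT's actual weights quantizer, which performs its own threshold selection and may use per-channel thresholds.

```python
import numpy as np


def fake_quantize_symmetric(x: np.ndarray, n_bits: int) -> np.ndarray:
    """Round x onto a signed n-bit uniform grid and map it back to floats.

    Simplified illustration: one per-tensor threshold equal to max(|x|),
    no threshold search and no per-channel thresholds.
    """
    threshold = float(np.max(np.abs(x)))
    q_min, q_max = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1  # integer code range
    scale = threshold / 2 ** (n_bits - 1)                        # float step between codes
    codes = np.clip(np.round(x / scale), q_min, q_max)
    return codes * scale


rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 16, 32)).astype(np.float32)  # hypothetical Conv2D kernel

results = {}
for n_bits in (8, 4):
    q = fake_quantize_symmetric(kernel, n_bits)
    n_levels = len(np.unique(q))
    mse = float(np.mean((kernel - q) ** 2))
    results[n_bits] = (n_levels, mse)
    print(f"{n_bits}-bit: {n_levels} distinct values, MSE = {mse:.2e}")
```

At 4 bits the kernel can take at most 16 distinct values instead of 256, so the rounding error grows accordingly; this is the accuracy/size trade-off that the per-layer edit rule lets you control.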
], "metadata": { "id": "FyBwtQuMhQMt" @@ -190,24 +230,22 @@ "]\n", "\n", "debug_config = mct.core.DebugConfig(network_editor=edit_rules_list)\n", - "core_config = mct.core.CoreConfig(debug_config=debug_config)" + "core_config_edit_weight_bits = mct.core.CoreConfig(debug_config=debug_config)" ], "metadata": { "id": "7YynVSSh3Mk-" }, - "execution_count": 7, + "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ - "In this final part of the tutorial, we apply the customized quantization process to our Keras model.\n", - "\n", - "By calling `quantize_keras_mct` with the `core_config` containing our edit rule, we specifically quantize the Conv2D layer using 4 bits, as per our custom configuration.\n", + "Now we will apply this customized quantization configuration to the Keras model.\n", "\n", - "The `quantized_model` now reflects these changes. We then extract and display the number of bits used for quantization in both the Conv2D and Dense layers.\n", + "By calling `quantize_keras_mct` with `core_config_edit_weight_bits`, which contains our edit rule, we quantize the `Conv2D` layer using 4 bits as specified. The resulting `quantized_model` reflects these changes, which we verify by inspecting the bit-width used in both the `Conv2D` and `Dense` layers.\n", "\n", - "The output demonstrates the effect of our edit rule: the Conv2D layer is quantized with 4 bits while the Dense layer retains the default 8-bit quantization." + "The output confirms the effect of the edit rule: the `Conv2D` layer is quantized with 4 bits, while the `Dense` layer retains the default 8-bit setting."
], "metadata": { "id": "ftkeDjZPiahd" @@ -216,14 +254,8 @@ { "cell_type": "code", "source": [ - "quantized_model = quantize_keras_mct(model, representative_data_gen, core_config)\n", - "conv2d_layer = quantized_model.layers[2]\n", - "conv2d_nbits = conv2d_layer.weights_quantizers['kernel'].get_config()['num_bits']\n", - "\n", - "dense_layer = quantized_model.layers[4]\n", - "dense_nbits = dense_layer.weights_quantizers['kernel'].get_config()['num_bits']\n", - "\n", - "print(f\"Conv2D nbits: {conv2d_nbits}, Dense nbits: {dense_nbits}\")" + "quantized_model = quantize_keras_mct(model, representative_data_gen, core_config_edit_weight_bits)\n", + "print_model_weights_by_layer(quantized_model)" ], "metadata": { "id": "7p6qFWoEQBS5" @@ -235,16 +267,11 @@ "cell_type": "markdown", "source": [ "## Edit Z-Threshold for Activation Quantization\n", + "In model quantization, the Z-Threshold helps manage outliers in activation data, which can negatively impact the efficiency and accuracy of the quantization process. It sets a boundary beyond which extreme values are excluded when determining quantization parameters, which can improve robustness and model accuracy.\n", "\n", - "In the context of model quantization, the Z-Threshold helps in handling outliers in the activation data. Outliers in the data can hurt the quantization process, leading to less efficient and potentially less accurate models.\n", + "Adjusting the Z-Threshold is useful for fine-tuning model accuracy and handling outliers. A higher Z-Threshold keeps more of the data, including more outliers, while a lower value filters more of them out.\n", "\n", - "The Z-Threshold is used to set a boundary, beyond which extreme values in the activation data are considered outliers and are not used to determine the quantization parameters.
This approach effectively filters out extreme values, ensuring a more robust and representative quantization.\n", - "\n", - "Adjusting the Z-Threshold can be particularly useful during the debugging and optimization of model quantization. By tweaking this parameter, you can fine-tune the balance between model accuracy and robustness against outliers in your specific use case.\n", - "\n", - "A higher Z-Threshold means more data is considered during quantization, including some outliers, which might be necessary for certain models or datasets.\n", - "\n", - "The following code demonstrates how you can customize the Z-Threshold for a specific layer type (Conv2D) in a Keras model using MCT's network editor functionality. This feature allows you to set different Z-Threshold values for different layers. By default, all layers use threshold of infinity (thus, no outlier-removal occurs)." + "The following code demonstrates how to customize the Z-Threshold for specific layer types, such as `Conv2D`, using MCT’s network editor. By default, all layers have an infinite threshold, meaning no outlier removal occurs." ], "metadata": { "id": "2TqXTB48jKHx" @@ -262,8 +289,8 @@ "]\n", "\n", "debug_config = mct.core.DebugConfig(network_editor=edit_rules_list)\n", - "core_config = mct.core.CoreConfig(debug_config=debug_config)\n", - "quantized_model = quantize_keras_mct(model, representative_data_gen, core_config)" + "core_config_edit_z_threshold = mct.core.CoreConfig(debug_config=debug_config)\n", + "quantized_model = quantize_keras_mct(model, representative_data_gen, core_config_edit_z_threshold)" ], "metadata": { "id": "VBRfQqZVjN3J" @@ -274,6 +301,10 @@ { "cell_type": "markdown", "source": [ + "## Conclusion\n", + "In this tutorial, we explored how to leverage the Model Compression Toolkit (MCT) for quantizing Keras models and customizing the quantization configuration for specific layers using the network editor. 
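The outlier-removal idea behind the Z-Threshold can be sketched in a few lines of NumPy. This is a simplified illustration under our own assumptions: values whose z-score, |x - mean| / std, exceeds `z_threshold` are dropped before taking the max-abs value that would serve as the quantization threshold. MCT's internal statistics collection and threshold selection may differ in detail; the function name is hypothetical.

```python
import numpy as np


def max_abs_after_outlier_removal(activations: np.ndarray, z_threshold: float = np.inf) -> float:
    """Return max(|x|) over the values whose z-score is within z_threshold.

    With z_threshold=np.inf (mirroring the default of no outlier removal),
    every value is kept.
    """
    x = activations.ravel()
    mean, std = x.mean(), x.std()
    if std == 0 or np.isinf(z_threshold):
        return float(np.max(np.abs(x)))
    z_scores = np.abs(x - mean) / std
    kept = x[z_scores <= z_threshold]
    return float(np.max(np.abs(kept))) if kept.size else 0.0


rng = np.random.default_rng(0)
acts = rng.normal(size=10_000)  # synthetic, roughly standard-normal activations
acts[:5] = 50.0                 # inject a few extreme outliers

print(max_abs_after_outlier_removal(acts))                   # 50.0 (set by the outliers)
print(max_abs_after_outlier_removal(acts, z_threshold=5.0))  # a much tighter range
```

Without outlier removal the five injected values stretch the range to 50, wasting most quantization levels on values that almost never occur; a finite threshold shrinks the range back to the bulk of the distribution.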
We started by applying the default 8-bit quantization and inspecting the results. Then, we demonstrated how to use the network editor to modify the bit-width for individual layers and fine-tune activation quantization using Z-Threshold adjustments.\n", + "\n", + "\n", "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n", "\n", "Licensed under the Apache License, Version 2.0 (the \"License\");\n", diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_post-training_quantization.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_post-training_quantization.ipynb new file mode 100644 index 000000000..f576d7ee2 --- /dev/null +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_post-training_quantization.ipynb @@ -0,0 +1,456 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Post-Training Quantization in Keras using the Model Compression Toolkit (MCT)\n", + "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_post-training_quantization.ipynb)\n", + "\n", + "## Overview\n", + "This quick-start guide explains how to use the **Model Compression Toolkit (MCT)** to quantize a Keras model. We will load a pre-trained model and quantize it using MCT with **Post-Training Quantization (PTQ)**. Finally, we will evaluate the quantized model and export it to Keras and TFLite files.\n", + "\n", + "## Summary\n", + "In this tutorial, we will cover:\n", + "\n", + "1. Loading and preprocessing ImageNet’s validation dataset.\n", + "2. Constructing an unlabeled representative dataset.\n", + "3. Post-Training Quantization using MCT.\n", + "4. Accuracy evaluation of the floating-point and the quantized models.\n", + "5.
Exporting the model to Keras and TFLite files.\n", + "\n", + "## Setup\n", + "Install the relevant packages:" + ], + "metadata": { + "collapsed": false + }, + "id": "37caa075419872cc" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "TF_VER = '2.14'\n", + "!pip install -q tensorflow[and-cuda]~={TF_VER}" + ], + "metadata": { + "collapsed": false + }, + "id": "2227c2812088b426" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import importlib.util\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ], + "metadata": { + "collapsed": false + }, + "id": "1849396447aa75e8" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import keras\n", + "import tensorflow as tf" + ], + "metadata": { + "collapsed": false + }, + "id": "96817134aaa61465" + }, + { + "cell_type": "markdown", + "source": [ + "Load a pre-trained MobileNetV2 model from Keras, in 32-bit floating-point precision." + ], + "metadata": { + "collapsed": false + }, + "id": "f0d72559f34c030a" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "from keras.applications.mobilenet_v2 import MobileNetV2\n", + "\n", + "float_model = MobileNetV2()" + ], + "metadata": { + "collapsed": false + }, + "id": "14f37bd9e1421650" + }, + { + "cell_type": "markdown", + "source": [ + "## Dataset preparation\n", + "### Download the ImageNet validation set\n", + "Download the ImageNet dataset with only the validation split.\n", + "**Note:** For demonstration purposes, we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.\n", + "\n", + "This step may take several minutes..."
+ ], + "metadata": { + "collapsed": false + }, + "id": "b8fac30930c364bb" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import os\n", + " \n", + "if not os.path.isdir('imagenet'):\n", + " !mkdir imagenet\n", + " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz\n", + " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar\n", + " \n", + " !cd imagenet && tar -xzf ILSVRC2012_devkit_t12.tar.gz && \\\n", + " mkdir ILSVRC2012_img_val && tar -xf ILSVRC2012_img_val.tar -C ILSVRC2012_img_val" + ], + "metadata": { + "collapsed": false + }, + "id": "b4796f00822e1abf" + }, + { + "cell_type": "markdown", + "source": [ + "The following code organizes the extracted data into separate folders for each label, making it compatible with Keras dataset loaders." + ], + "metadata": { + "collapsed": false + }, + "id": "f1c42d68573f3534" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "import shutil\n", + "\n", + "root = Path('./imagenet')\n", + "imgs_dir = root / 'ILSVRC2012_img_val'\n", + "target_dir = root /'val'\n", + "\n", + "def extract_labels():\n", + " !pip install -q scipy\n", + " import scipy\n", + " mat = scipy.io.loadmat(root / 'ILSVRC2012_devkit_t12/data/meta.mat', squeeze_me=True)\n", + " cls_to_nid = {s[0]: s[1] for i, s in enumerate(mat['synsets']) if s[4] == 0} \n", + " with open(root / 'ILSVRC2012_devkit_t12/data/ILSVRC2012_validation_ground_truth.txt', 'r') as f:\n", + " return [cls_to_nid[int(cls)] for cls in f.readlines()]\n", + "\n", + "if not target_dir.exists():\n", + " labels = extract_labels()\n", + " for lbl in set(labels):\n", + " os.makedirs(target_dir / lbl)\n", + " \n", + " for img_file, lbl in zip(sorted(os.listdir(imgs_dir)), labels):\n", + " shutil.move(imgs_dir / img_file, target_dir / lbl)\n" + ], + "metadata": { + "collapsed": false + }, + "id": "b50b55c50a41999d" + }, 
+ { + "cell_type": "markdown", + "source": [ + "These functions generate a `tf.data.Dataset` from image files in a directory." + ], + "metadata": { + "collapsed": false + }, + "id": "65372a40dc9ce89c" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "def imagenet_preprocess_input(images, labels):\n", + " return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels\n", + "\n", + "def get_dataset(batch_size, shuffle):\n", + " dataset = tf.keras.utils.image_dataset_from_directory(\n", + " directory='./imagenet/val',\n", + " batch_size=batch_size,\n", + " image_size=[224, 224],\n", + " shuffle=shuffle,\n", + " crop_to_aspect_ratio=True,\n", + " interpolation='bilinear')\n", + " dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)), num_parallel_calls=tf.data.AUTOTUNE)\n", + " dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)\n", + " return dataset" + ], + "metadata": { + "collapsed": false + }, + "id": "70e23b77b41fc6e1" + }, + { + "cell_type": "markdown", + "source": [ + "## Representative Dataset\n", + "For quantization with MCT, we need to define a representative dataset required by the PTQ algorithm. This dataset is a generator that returns a list of images:" + ], + "metadata": { + "collapsed": false + }, + "id": "ab88ee0beaff2186" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "batch_size = 32\n", + "n_iter = 10\n", + "\n", + "dataset = get_dataset(batch_size, shuffle=True)\n", + "\n", + "def representative_dataset_gen():\n", + " for _ in range(n_iter):\n", + " yield [dataset.take(1).get_single_element()[0].numpy()]" + ], + "metadata": { + "collapsed": false + }, + "id": "c164088f1882bad8" + }, + { + "cell_type": "markdown", + "source": [ + "## Target Platform Capabilities\n", + "MCT optimizes the model for dedicated hardware. 
This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default TensorFlow TPC:" + ], + "metadata": { + "collapsed": false + }, + "id": "d7cf37cd66fca511" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import model_compression_toolkit as mct\n", + "\n", + "# Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. Here, for example, we use the default platform, attached to a Keras layer representation.\n", + "target_platform_cap = mct.get_target_platform_capabilities('tensorflow', 'default')" + ], + "metadata": { + "collapsed": false + }, + "id": "259e2cf078cd3dfe" + }, + { + "cell_type": "markdown", + "source": [ + "## Post-Training Quantization using MCT\n", + "Now for the exciting part! Let’s run PTQ on the model." + ], + "metadata": { + "collapsed": false + }, + "id": "c19234f699c75374" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(\n", + " in_model=float_model,\n", + " representative_data_gen=representative_dataset_gen,\n", + " target_platform_capabilities=target_platform_cap\n", + ")" + ], + "metadata": { + "collapsed": false + }, + "id": "a791d320d064f950" + }, + { + "cell_type": "markdown", + "source": [ + "Our model is now quantized. MCT has created a simulated quantized model within the original Keras framework by inserting [quantization representation modules](https://github.com/sony/mct_quantizers). These modules, such as `KerasQuantizationWrapper` and `KerasActivationQuantizationHolder`, wrap Keras layers to simulate the quantization of weights and activations, respectively.
While the size of the saved model remains unchanged, all the quantization parameters are stored within these modules and are ready for deployment on the target hardware. In this example, we used the default MCT settings, which compressed the model from 32 bits to 8 bits, resulting in a compression ratio of 4x." + ], + "metadata": { + "collapsed": false + }, + "id": "877eef17e44c57c3" + }, + { + "cell_type": "markdown", + "source": [ + "## Model Evaluation\n", + "In order to evaluate our models, we first need to load the validation dataset." + ], + "metadata": { + "collapsed": false + }, + "id": "bac59bdc7eb51d15" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "val_dataset = get_dataset(batch_size=50, shuffle=False)" + ], + "metadata": { + "collapsed": false + }, + "id": "e8af62e22913de3e" + }, + { + "cell_type": "markdown", + "source": [ + "Let's start with the floating-point model evaluation. We need to compile the model before evaluation and set the loss and the evaluation metric." 
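The 4x figure quoted above is plain arithmetic on weight storage: each weight goes from 32 bits to 8 bits. The helpers below are a hypothetical sketch (not an MCT API) that make the calculation explicit for any parameter count. As the notebook notes, the saved simulated-quantization model keeps its original size; the ratio describes the bit-width of the quantized weights.

```python
def weight_memory_bytes(num_params: int, bits_per_weight: int) -> float:
    """Bytes needed to store num_params weights at the given bit-width."""
    return num_params * bits_per_weight / 8


def compression_ratio(float_bits: int = 32, quant_bits: int = 8) -> float:
    """Ratio between float and quantized weight storage."""
    return float_bits / quant_bits


num_params = 1_000_000  # hypothetical parameter count
float_mem = weight_memory_bytes(num_params, 32)  # 4,000,000 bytes
quant_mem = weight_memory_bytes(num_params, 8)   # 1,000,000 bytes
print(float_mem / quant_mem, compression_ratio())  # 4.0 4.0
```

The same helper shows why lower bit-widths (for example, the 4-bit kernels from the network-editor tutorial) push the ratio to 8x for those layers.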
+ ], + "metadata": { + "collapsed": false + }, + "id": "9455ac334c8c23da" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=\"accuracy\")\n", + "float_accuracy = float_model.evaluate(val_dataset)\n", + "print(f\"Float model's Top 1 accuracy on the ImageNet validation set: {(float_accuracy[1] * 100):.2f}%\")" + ], + "metadata": { + "collapsed": false + }, + "id": "15a98cc475926458" + }, + { + "cell_type": "markdown", + "source": [ + "Finally, let's evaluate the quantized model:" + ], + "metadata": { + "collapsed": false + }, + "id": "3085f431ccc6fdee" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=\"accuracy\")\n", + "quantized_accuracy = quantized_model.evaluate(val_dataset)\n", + "print(f\"Quantized model's Top 1 accuracy on the ImageNet validation set: {(quantized_accuracy[1] * 100):.2f}%\")" + ], + "metadata": { + "collapsed": false + }, + "id": "c427527160845924" + }, + { + "cell_type": "markdown", + "source": [ + "You can see that the accuracy degradation is very small, while the model is compressed by 4x!\n", + "Now, we can export the quantized model to Keras and TFLite:" + ], + "metadata": { + "collapsed": false + }, + "id": "959526e5ae914e6b" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "mct.exporter.keras_export_model(\n", + " model=quantized_model,\n", + " save_model_path='qmodel.tflite',\n", + " serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,\n", + " quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)\n", + "\n", + "mct.exporter.keras_export_model(model=quantized_model, save_model_path='qmodel.keras')" + ], + "metadata": { + "collapsed": false + }, + "id": "6d431b13e8ac5e4d" + }, + { + "cell_type": "markdown", 
+ "source": [ + "## 
Conclusion\n", + "\n", + "In this tutorial, we demonstrated how to quantize a classification model in a hardware-friendly manner using MCT. We observed that a 4x compression ratio was achieved with minimal performance degradation.\n", + "\n", + "The key advantage of hardware-friendly quantization is that the model can run more efficiently in terms of runtime, power consumption, and memory usage on designated hardware.\n", + "\n", + "MCT can deliver competitive results across a wide range of tasks and network architectures. For more details, [check out the paper](https://arxiv.org/abs/2109.09113).\n", + "\n", + "## Copyrights\n", + "\n", + "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n", + "\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + "\n", + " http://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License.\n" + ], + "metadata": { + "collapsed": false + }, + "id": "78c7a00d0acb623d" + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb 
b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb index de99d8b5c..71e2d84dc 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb @@ -1,444 +1,448 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] }, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Structured Pruning of a Fully-Connected Keras Model\n", - "\n", - "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb)\n", - "\n", - "Welcome to this tutorial, where we will guide you through training, pruning, and retraining a fully connected Keras model. We'll begin by constructing and training a simple neural network using the Keras framework. Following this, we will introduce and apply model pruning using MCT to reduce the size of our network. Finally, we'll retrain our pruned model to recover its degraded performance due to the pruning process.\n", - "\n", - "\n", - "## Installing TensorFlow and Model Compression Toolkit\n", - "\n", - "We start by setting up our environment by installing TensorFlow and Model Compression Toolkit and importing them." 
- ], - "metadata": { - "id": "UJDzewEYfSN5" - } - }, - { - "cell_type": "code", - "source": [ - "TF_VER = '2.14.0'\n", - "\n", - "!pip install -q tensorflow=={TF_VER}\n", - "!pip install mct-nightly\n" - ], - "metadata": { - "id": "xTvVA__4NItc" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Q2bAksKtM0ca" - }, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "import tensorflow_datasets as tfds\n", - "import model_compression_toolkit as mct\n", - "import numpy as np\n", - "from tensorflow.keras.datasets import mnist\n", - "from tensorflow.keras.models import Sequential\n", - "from tensorflow.keras.layers import Dense, Flatten" - ] - }, - { - "cell_type": "markdown", - "source": [ - "## Loading and Preprocessing MNIST\n", - "\n", - "Let's create the train and test parts of MNIST dataset including preprocessing:" - ], - "metadata": { - "id": "tW1xcK_Kf4F_" - } - }, - { - "cell_type": "code", - "source": [ - "# Load the MNIST dataset\n", - "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n", - "\n", - "# Normalize the images to [0, 1] range\n", - "train_images = train_images.astype('float32') / 255.0\n", - "test_images = test_images.astype('float32') / 255.0\n" - ], - "metadata": { - "id": "fwtJHnflfv_f" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Creating a Fully-Connected Model\n", - "\n", - "In this tutorial section, we create a simple toy example of a fully connected model to demonstrate the pruning process using MCT. It consists of three dense layers with 128, 64, and 10 neurons.\n", - "\n", - "Notably, MCT's structured pruning will target the first two dense layers for pruning, as these layers offer the opportunity to reduce output channels. 
This reduction can be effectively propagated by adjusting the input channels of subsequent layers.\n", - "\n", - "Once our model is created, we compile it to prepare the model for training and evaluation.\n" - ], - "metadata": { - "id": "m3vu7-uvgtfC" - } - }, - { - "cell_type": "code", - "source": [ - "def create_model():\n", - " model = tf.keras.models.Sequential([\n", - " tf.keras.layers.Flatten(input_shape=(28, 28)),\n", - " tf.keras.layers.Dense(128, activation='relu'),\n", - " tf.keras.layers.Dense(64, activation='relu'),\n", - " tf.keras.layers.Dense(10)\n", - " ])\n", - " model.compile(\n", - " optimizer=tf.keras.optimizers.Adam(0.001),\n", - " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", - " metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],\n", - " )\n", - " return model" - ], - "metadata": { - "id": "If3oj5jSjXen" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Training Dense Model on MNIST\n", - "\n", - "Now, we can train our model using the dataset we load and evaluate it." 
- ], - "metadata": { - "id": "Q_tK6Xknbtha" - } - }, - { - "cell_type": "code", - "source": [ - "# Train and evaluate the model\n", - "model = create_model()\n", - "model.fit(train_images, train_labels, epochs=6, validation_data=(test_images, test_labels))\n", - "\n", - "model.evaluate(test_images, test_labels)" - ], - "metadata": { - "id": "jQ3_9Z1WllVV" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Dense Model Properties\n", - "\n", - "The model.summary() function in Keras provides a snapshot of the model's architecture, including layers, their types, output shapes, and the number of parameters.\n" - ], - "metadata": { - "id": "ZQHxLrsvcLKH" - } - }, - { - "cell_type": "code", - "source": [ - "model.summary()" - ], - "metadata": { - "id": "oxdespw2eeBW" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "Let's break down what we see in our model summary:\n", - "\n", - "- First Dense Layer: A fully connected layer with 128 output channels and 784 input channels.\n", - "\n", - "- Second Dense Layer: A fully connected layer with 64 output channels and 128 input channels.\n", - "\n", - "- Third Dense Layer: The final dense layer with 10 neurons (as per the number of MNIST classes) and 64 input channels.\n", - "\n", - "The total parameters amount to 109,386, which roughly requiers 427.29 KB." - ], - "metadata": { - "id": "GymibwxQehOL" - } - }, - { - "cell_type": "markdown", - "source": [ - "## MCT Structured Pruning\n", - "\n", - "### Create TPC\n", - "\n", - "Firstly, we'll set up the Target Platform Capabilities (TPC) to specify each layer's SIMD (Single Instruction, Multiple Data) size.\n", - "\n", - "In MCT, SIMD plays a crucial role in channel grouping, affecting the pruning decision process based on channel importance for each SIMD group of channels.\n", - "\n", - "We'll use the simplest structured pruning scenario for this demonstration with SIMD=1." 
- ], - "metadata": { - "id": "RKatTp55emtF" - } - }, - { - "cell_type": "code", - "source": [ - "from model_compression_toolkit.target_platform_capabilities.target_platform import Signedness\n", - "tp = mct.target_platform\n", - "\n", - "simd_size = 1\n", - "\n", - "def get_tpc():\n", - " # Define the default weight attribute configuration\n", - " default_weight_attr_config = tp.AttributeQuantizationConfig(\n", - " weights_quantization_method=tp.QuantizationMethod.UNIFORM,\n", - " weights_n_bits=None,\n", - " weights_per_channel_threshold=None,\n", - " enable_weights_quantization=None,\n", - " lut_values_bitwidth=None\n", - " )\n", - "\n", - " # Define the OpQuantizationConfig\n", - " default_config = tp.OpQuantizationConfig(\n", - " default_weight_attr_config=default_weight_attr_config,\n", - " attr_weights_configs_mapping={},\n", - " activation_quantization_method=tp.QuantizationMethod.UNIFORM,\n", - " activation_n_bits=8,\n", - " supported_input_activation_n_bits=8,\n", - " enable_activation_quantization=None,\n", - " quantization_preserving=None,\n", - " fixed_scale=None,\n", - " fixed_zero_point=None,\n", - " simd_size=simd_size,\n", - " signedness=Signedness.AUTO\n", - " )\n", - "\n", - " # Create the quantization configuration options and model\n", - " default_configuration_options = tp.QuantizationConfigOptions([default_config])\n", - " tp_model = tp.TargetPlatformModel(default_configuration_options)\n", - "\n", - " # Return the target platform capabilities\n", - " tpc = tp.TargetPlatformCapabilities(tp_model)\n", - " return tpc\n" - ], - "metadata": { - "id": "wqZ71s70jXhH" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### Create a Representative Dataset\n", - "\n", - "We are creating a representative dataset to guide our model pruning process for computing importance score for each channel:" - ], - "metadata": { - "id": "SnKxedEgqdSm" - } - }, - { - "cell_type": "code", - "source": [ - "import 
random\n", - "\n", - "def representative_data_gen():\n", - " indices = random.sample(range(len(train_images)), 32)\n", - " yield [np.stack([train_images[i] for i in indices])]" - ], - "metadata": { - "id": "SCiXV1s9jswp" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### Create Resource Utilization constraint\n", - "\n", - "We're defining a resource_utilization limit to constrain the memory usage of our pruned model.\n", - "\n", - "By setting a target that limits the model's weight memory to half of its original size (around 427KB), we aim to achieve a compression ratio of 50%:" - ], - "metadata": { - "id": "nylQtALnr9gN" - } - }, - { - "cell_type": "code", - "source": [ - "# Create a ResourceUtilization object to limit the pruned model weights memory to a certain resource constraint\n", - "dense_model_memory = 427*(2**10) # Original model weights requiers ~427KB\n", - "compression_ratio = 0.5\n", - "\n", - "resource_utilization = mct.core.ResourceUtilization(weights_memory=dense_model_memory*compression_ratio)" - ], - "metadata": { - "id": "doJgwbSxsCbr" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### Prune Model\n", - "\n", - "We're ready to execute the actual pruning using MCT's keras_pruning_experimental function. The model is pruned according to our defined target Resource Utilization and using the representative dataset generated earlier.\n", - "\n", - "Each channel's importance is measured using LFH (Label-Free-Hessian)\n", - "which approximates the Hessian of the loss function w.r.t model's weights.\n", - "\n", - "In this example, we've used just one score approximation for efficiency. Although this is less time-consuming, it's worth noting that using multiple approximations would yield more precise importance scores in real-world applications. 
However, this precision comes with a trade-off in terms of longer processing times.\n", - "\n", - "The result is a pruned model and associated pruning information, which includes details about the pruning masks and scores for each layer." - ], - "metadata": { - "id": "xSP6815rsCnc" - } - }, - { - "cell_type": "code", - "source": [ - "num_score_approximations = 1\n", - "\n", - "target_platform_cap = get_tpc()\n", - "pruned_model, pruning_info = mct.pruning.keras_pruning_experimental(\n", - " model=model,\n", - " target_resource_utilization=resource_utilization,\n", - " representative_data_gen=representative_data_gen,\n", - " target_platform_capabilities=target_platform_cap,\n", - " pruning_config=mct.pruning.PruningConfig(num_score_approximations=num_score_approximations)\n", - " )" - ], - "metadata": { - "id": "x4taG-5TxBrp" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "### Pruned Model Properties\n", - "\n", - "As before, we can use Keras model's API to observe the new architecture and details of the pruned model:" - ], - "metadata": { - "id": "iPd6ezZN2DNp" - } - }, - { - "cell_type": "code", - "source": [ - "pruned_model.summary()" - ], - "metadata": { - "id": "xZu4gPwz2Ptp" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Retraining Pruned Model\n", - "\n", - "After pruning models, it's common to observe a temporary drop in the model's accuracy. This decline directly results from reducing the model's complexity through pruning." 
- ], - "metadata": { - "id": "pAheQ9SGxB13" - } - }, - { - "cell_type": "code", - "source": [ - "pruned_model.compile(\n", - " optimizer=tf.keras.optimizers.Adam(0.001),\n", - " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", - " metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],\n", - ")\n", - "pruned_model.evaluate(test_images, test_labels)" - ], - "metadata": { - "id": "Vpihq5fpdeSA" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "However, to recover the performance, we retrain the pruned model, allowing it to adapt to its new, compressed architecture. The model can regain, and sometimes even surpass, its original accuracy through retraining." - ], - "metadata": { - "id": "IHORL34t17bA" - } - }, - { - "cell_type": "code", - "source": [ - "pruned_model.fit(train_images, train_labels, epochs=6, validation_data=(test_images, test_labels))\n", - "pruned_model.evaluate(test_images, test_labels)" - ], - "metadata": { - "id": "q00zV9Jmjszo" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bb7e1572" - }, - "source": [ - "Copyright 2023 Sony Semiconductor Israel, Inc. 
All rights reserved.\n", - "\n", - "Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "you may not use this file except in compliance with the License.\n", - "You may obtain a copy of the License at\n", - "\n", - " http://www.apache.org/licenses/LICENSE-2.0\n", - "\n", - "Unless required by applicable law or agreed to in writing, software\n", - "distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "See the License for the specific language governing permissions and\n", - "limitations under the License.\n" - ] - } - ] + "kernelspec": { + "name": "python3", + "language": "python", + "display_name": "Python 3 (ipykernel)" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Structured Pruning of a Fully-Connected Keras Model using the Model Compression Toolkit (MCT)\n", + "\n", + "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb)\n", + "\n", + "## Overview\n", + "This tutorial provides a step-by-step guide to training, pruning, and finetuning a Keras fully connected neural network model using the Model Compression Toolkit (MCT). We will start by building and training the model from scratch on the MNIST dataset, followed by applying structured pruning to reduce the model size.\n", + "\n", + "## Summary\n", + "In this tutorial, we will cover:\n", + "\n", + "1. **Training a Keras model on MNIST:** We'll begin by constructing a basic fully connected neural network and training it on the MNIST dataset. \n", + "2. **Applying structured pruning:** We'll introduce a pruning technique to reduce model size while maintaining performance. \n", + "3. **Finetuning the pruned model:** After pruning, we'll finetune the model to recover any lost accuracy. 
\n", + "4. **Evaluating the pruned model:** We'll evaluate the pruned model’s performance and compare it to the original model.\n", + "\n", + "## Setup\n", + "Install the relevant packages:" + ], + "metadata": { + "id": "UJDzewEYfSN5" + } + }, + { + "cell_type": "code", + "source": [ + "TF_VER = '2.14.0'\n", + "!pip install -q tensorflow[and-cuda]~={TF_VER}" + ], + "metadata": { + "id": "xTvVA__4NItc" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import importlib\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Q2bAksKtM0ca" + }, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "import model_compression_toolkit as mct\n", + "import numpy as np\n", + "from tensorflow.keras.datasets import mnist" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Loading and Preprocessing MNIST\n", + "Let's define the dataset loaders to retrieve the train and test parts of the MNIST dataset, including preprocessing:" + ], + "metadata": { + "id": "tW1xcK_Kf4F_" + } + }, + { + "cell_type": "code", + "source": [ + "# Load the MNIST dataset\n", + "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n", + "\n", + "# Normalize the images to [0, 1] range\n", + "train_images = train_images.astype('float32') / 255.0\n", + "test_images = test_images.astype('float32') / 255.0\n" + ], + "metadata": { + "id": "fwtJHnflfv_f" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Creating a Fully-Connected Model\n", + "In this section, we create a simple example of a fully connected model to demonstrate the pruning process. It consists of three dense layers with 128, 64, and 10 neurons. 
After defining the model architecture, we compile it to prepare for training and evaluation." + ], + "metadata": { + "id": "m3vu7-uvgtfC" + } + }, + { + "cell_type": "code", + "source": [ + "def create_model():\n", + " model = tf.keras.models.Sequential([\n", + " tf.keras.layers.Flatten(input_shape=(28, 28)),\n", + " tf.keras.layers.Dense(128, activation='relu'),\n", + " tf.keras.layers.Dense(64, activation='relu'),\n", + " tf.keras.layers.Dense(10)\n", + " ])\n", + " model.compile(\n", + " optimizer=tf.keras.optimizers.Adam(0.001),\n", + " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],\n", + " )\n", + " return model" + ], + "metadata": { + "id": "If3oj5jSjXen" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Training Dense Model on MNIST\n", + "Next, we will train the dense model using the preprocessed MNIST dataset." + ], + "metadata": { + "id": "Q_tK6Xknbtha" + } + }, + { + "cell_type": "code", + "source": [ + "# Train and evaluate the model\n", + "model = create_model()\n", + "model.fit(train_images, train_labels, epochs=6, validation_data=(test_images, test_labels))\n", + "model.evaluate(test_images, test_labels)" + ], + "metadata": { + "id": "jQ3_9Z1WllVV" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Dense Model Properties\n", + "The `model.summary()` function in Keras provides a comprehensive overview of the model's architecture, including each layer's type, output shapes, and the number of trainable parameters." 
+ ], + "metadata": { + "id": "ZQHxLrsvcLKH" + } + }, + { + "cell_type": "code", + "source": [ + "model.summary()" + ], + "metadata": { + "id": "oxdespw2eeBW" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Let's break down the details from our model summary:\n", + "\n", + "- **First Dense Layer:** A fully connected layer with 128 output channels and 784 input channels.\n", + "- **Second Dense Layer:** A fully connected layer with 64 output channels and 128 input channels.\n", + "- **Third Dense Layer:** The final layer with 10 neurons (matching the number of MNIST classes) and 64 input channels.\n", + "\n", + "The model has a total of 109,386 parameters, requiring approximately 427.29 KB of memory." + ], + "metadata": { + "id": "GymibwxQehOL" + } + }, + { + "cell_type": "markdown", + "source": [ + "## MCT Structured Pruning\n", + "\n", + "### Target Platform Capabilities (TPC)\n", + "MCT optimizes models for dedicated hardware using Target Platform Capabilities (TPC). For more details, please refer to our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html). First, we'll configure the TPC to define each layer's SIMD (Single Instruction, Multiple Data) size.\n", + "\n", + "In MCT, SIMD plays a key role in channel grouping, influencing the pruning process by considering channel importance within each SIMD group.\n", + "\n", + "For this demonstration, we'll use the simplest structured pruning scenario with SIMD set to 1."
+ ], + "metadata": { + "id": "RKatTp55emtF" + } + }, + { + "cell_type": "code", + "source": [ + "from model_compression_toolkit.target_platform_capabilities.target_platform import Signedness\n", + "tp = mct.target_platform\n", + "\n", + "simd_size = 1\n", + "\n", + "def get_tpc():\n", + " # Define the default weight attribute configuration\n", + " default_weight_attr_config = tp.AttributeQuantizationConfig(\n", + " weights_quantization_method=tp.QuantizationMethod.UNIFORM,\n", + " )\n", + "\n", + " # Define the OpQuantizationConfig\n", + " default_config = tp.OpQuantizationConfig(\n", + " default_weight_attr_config=default_weight_attr_config,\n", + " attr_weights_configs_mapping={},\n", + " activation_quantization_method=tp.QuantizationMethod.UNIFORM,\n", + " activation_n_bits=8,\n", + " supported_input_activation_n_bits=8,\n", + " enable_activation_quantization=None,\n", + " quantization_preserving=None,\n", + " fixed_scale=None,\n", + " fixed_zero_point=None,\n", + " simd_size=simd_size,\n", + " signedness=Signedness.AUTO\n", + " )\n", + "\n", + " # Create the quantization configuration options and model\n", + " default_configuration_options = tp.QuantizationConfigOptions([default_config])\n", + " tp_model = tp.TargetPlatformModel(default_configuration_options)\n", + "\n", + " # Return the target platform capabilities\n", + " tpc = tp.TargetPlatformCapabilities(tp_model)\n", + " return tpc\n" + ], + "metadata": { + "id": "wqZ71s70jXhH" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Representative Dataset\n", + "We are creating a representative dataset to guide the model pruning process. It is used to compute an importance score for each channel. This dataset is implemented as a generator that returns a list of images." 
+ ], + "metadata": { + "id": "SnKxedEgqdSm" + } + }, + { + "cell_type": "code", + "source": [ + "import random\n", + "\n", + "def representative_data_gen():\n", + " indices = random.sample(range(len(train_images)), 32)\n", + " yield [np.stack([train_images[i] for i in indices])]" + ], + "metadata": { + "id": "SCiXV1s9jswp" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Resource Utilization\n", + "We define a `resource_utilization` limit to constrain the weights memory of the pruned model. Since the weights are stored as float32 (4 bytes per parameter), the memory footprint is the total number of parameters multiplied by 4: 109,386 × 4 bytes ≈ 427 KB. Limiting the weights memory to around 214 KB therefore targets a 50% compression ratio." + ], + "metadata": { + "id": "nylQtALnr9gN" + } + }, + { + "cell_type": "code", + "source": [ + "# Create a ResourceUtilization object to limit the pruned model weights memory to a certain resource constraint\n", + "dense_model_memory = 427*(2**10) # Original model weights require ~427KB\n", + "compression_ratio = 0.5\n", + "\n", + "resource_utilization = mct.core.ResourceUtilization(weights_memory=dense_model_memory*compression_ratio)" + ], + "metadata": { + "id": "doJgwbSxsCbr" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Model Pruning\n", + "We are now ready to perform the actual pruning using MCT’s `keras_pruning_experimental` function.
The model will be pruned based on the defined resource utilization constraints and the previously generated representative dataset.\n", + "\n", + "Each channel’s importance is measured using the [LFH (Label-Free-Hessian) method](https://arxiv.org/abs/2309.11531), which approximates the Hessian of the loss function with respect to the model’s weights.\n", + "\n", + "For efficiency, we use a single score approximation. Although less precise, it significantly reduces processing time compared to multiple approximations, which offer better accuracy but at the cost of longer runtimes.\n", + "\n", + "MCT’s structured pruning will target the first two dense layers, where output channel reduction can be propagated to subsequent layers by adjusting their input channels accordingly.\n", + "\n", + "The output is a pruned model along with pruning information, including layer-specific pruning masks and scores." + ], + "metadata": { + "id": "xSP6815rsCnc" + } + }, + { + "cell_type": "code", + "source": [ + "num_score_approximations = 1\n", + "\n", + "target_platform_cap = get_tpc()\n", + "pruned_model, pruning_info = mct.pruning.keras_pruning_experimental(\n", + " model=model,\n", + " target_resource_utilization=resource_utilization,\n", + " representative_data_gen=representative_data_gen,\n", + " target_platform_capabilities=target_platform_cap,\n", + " pruning_config=mct.pruning.PruningConfig(num_score_approximations=num_score_approximations)\n", + " )" + ], + "metadata": { + "id": "x4taG-5TxBrp" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Pruned Model Properties\n", + "As before, we can use the Keras model API to inspect the new architecture and details of the pruned model." 
+ ], + "metadata": { + "id": "iPd6ezZN2DNp" + } + }, + { + "cell_type": "code", + "source": [ + "pruned_model.summary()" + ], + "metadata": { + "id": "xZu4gPwz2Ptp" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Finetuning the Pruned Model\n", + "After pruning, it’s common to see a temporary drop in model accuracy due to the reduction in model complexity. Let’s demonstrate this by evaluating the pruned model and observing its initial performance before finetuning." + ], + "metadata": { + "id": "pAheQ9SGxB13" + } + }, + { + "cell_type": "code", + "source": [ + "pruned_model.compile(\n", + " optimizer=tf.keras.optimizers.Adam(0.001),\n", + " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],\n", + ")\n", + "pruned_model.evaluate(test_images, test_labels)" + ], + "metadata": { + "id": "Vpihq5fpdeSA" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "To restore the model's performance, we finetune the pruned model, allowing it to adapt to its new, compressed architecture. Through this finetuning process, the model can often recover its original accuracy, and in some cases, even surpass it." + ], + "metadata": { + "id": "IHORL34t17bA" + } + }, + { + "cell_type": "code", + "source": [ + "pruned_model.fit(train_images, train_labels, epochs=6, validation_data=(test_images, test_labels))\n", + "pruned_model.evaluate(test_images, test_labels)" + ], + "metadata": { + "id": "q00zV9Jmjszo" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Conclusion\n", + "In this tutorial, we explored the process of structured model pruning using MCT to optimize a dense neural network. We demonstrated how to define resource constraints, apply pruning based on channel importance, and evaluate the impact on model architecture and performance. 
Finally, we showed how finetuning can recover the pruned model’s accuracy. This approach highlights the effectiveness of structured pruning for reducing model size while maintaining performance, making it a powerful tool for model optimization." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bb7e1572" + }, + "source": [ + "Copyright 2023 Sony Semiconductor Israel, Inc. All rights reserved.\n", + "\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + "\n", + " http://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License.\n" + ] + } + ] } diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_qat.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_qat.ipynb index 8720b8dc5..9ab4ade70 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_qat.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_qat.ipynb @@ -7,44 +7,52 @@ "tags": [] }, "source": [ - "# Quantization Aware Training using the Model Compression Toolkit - example in Keras\n" - ] - }, - { - "cell_type": "markdown", - "id": "af1a972f-01a5-4b56-8ce7-ecfdb6daf942", - "metadata": {}, - "source": [ + "# Quantization Aware Training using the Model Compression Toolkit - example in Keras\n", + "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_qat.ipynb)\n", "## Overview\n", - "This 
tutorial will show how to use the Quantization Aware Training API of the Model Compression Toolkit. We will train a model on the MNIST dataset and quantize it with the Model Compression Toolkit QAT API.\n", - "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_qat.ipynb)" - ] - }, - { - "cell_type": "markdown", - "id": "80481dd9-1e3c-4677-9d94-33f144ec540c", - "metadata": {}, - "source": [ + "This tutorial will demonstrate how to use the Quantization Aware Training (QAT) API of the Model Compression Toolkit (MCT). We will train a neural network on the MNIST dataset and apply quantization using the MCT QAT API to optimize the model for efficient hardware deployment while minimizing accuracy loss.\n", + "\n", + "## Summary\n", + "In this tutorial, we will cover:\n", + "\n", + "1. **Training a Keras model on MNIST:** We'll begin by constructing a simple neural network and training it on the MNIST dataset. \n", + "2. **Configuring Target Platform Capabilities (TPC):** Define the quantization settings for weights and activations.\n", + "3. **Preparing the Model for QAT:** Convert the floating-point model into a QAT-ready model using MCT. \n", + "4. **Training the Model with QAT:** Perform quantization-aware training to preserve model accuracy.\n", + "5. **Evaluating and Exporting the Quantized Model:** Finalize and export the optimized quantized model for deployment.\n", + "\n", "## Setup\n", - "Install relevant packages" + "Install the relevant packages:" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "b380c492-3c53-4ec1-987e-de693a1ec1d9", "metadata": {}, "outputs": [], "source": [ "TF_VER = '2.14.0'\n", - "\n", - "!pip install -q tensorflow=={TF_VER}\n", - "! 
pip install -q mct-nightly" + "!pip install -q tensorflow[and-cuda]~={TF_VER}" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, + "outputs": [], + "source": [ + "import importlib\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ], + "metadata": { + "collapsed": false + }, + "id": "ee8ecbde24b55fbf" + }, + { + "cell_type": "code", + "execution_count": null, "id": "d49c27b1-65f9-4fd3-be3e-733f4c60124a", "metadata": {}, "outputs": [], @@ -52,25 +60,129 @@ "import tensorflow as tf\n", "from keras import Model, layers, datasets\n", "import model_compression_toolkit as mct\n", - "import numpy as np\n" + "import numpy as np" ] }, { "cell_type": "markdown", - "id": "fcc817e1-5c21-4283-8ec8-8c2aff5feeea", + "source": [ + "## Loading and Preprocessing MNIST\n", + "Let's define the dataset loaders to retrieve the train and test parts of the MNIST dataset, including preprocessing:" + ], "metadata": { - "id": "fcc817e1-5c21-4283-8ec8-8c2aff5feeea" + "collapsed": false }, + "id": "d2c524e5d4985e47" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], "source": [ - "## Create TargetPlatformCapabilities\n", - "For this tutorial, we will use a TargetPlatformCapabilities (TPC) with quantization of 2 bits for weights and 3 bits for activations.\n", + "num_classes = 10\n", + "input_shape = (28, 28, 1)\n", "\n", - "You can skip this part and use [get_target_platform_capabilities](https://sony.github.io/model_optimization/docs/api/api_docs/methods/get_target_platform_capabilities.html) to get an initilized TPC." 
- ] + "# Load the MNIST dataset\n", + "(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()\n", + "\n", + "# Normalize the images to [0, 1] range\n", + "train_images = train_images.astype('float32') / 255.0\n", + "test_images = test_images.astype('float32') / 255.0\n", + "\n", + "# Add Channels axis to data\n", + "train_images = np.expand_dims(train_images, -1)\n", + "test_images = np.expand_dims(test_images, -1)\n", + "\n", + "# convert class vectors to binary class matrices\n", + "train_labels = tf.keras.utils.to_categorical(train_labels, num_classes)\n", + "test_labels = tf.keras.utils.to_categorical(test_labels, num_classes)" + ], + "metadata": { + "collapsed": false + }, + "id": "729d91394f1ded4a" + }, + { + "cell_type": "markdown", + "source": [ + "## Creating a Keras Model\n", + "In this section, we create a simple Keras model to demonstrate the QAT process. The model consists of two convolutional layers, two dense layers, and dropout layers for regularization." 
+ ], + "metadata": { + "collapsed": false + }, + "id": "fb31c629993369f7" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "def create_model():\n", + " _input = layers.Input(shape=input_shape)\n", + " x = layers.Conv2D(16, 3, strides=2, padding='same', activation='relu')(_input)\n", + " x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(x)\n", + " x = layers.Flatten()(x)\n", + " x = layers.Dropout(0.5)(x)\n", + " x = layers.Dense(128, activation='relu')(x)\n", + " x = layers.Dropout(0.5)(x)\n", + " x = layers.Dense(num_classes, activation='softmax')(x)\n", + " model = Model(inputs=_input, outputs=x)\n", + " model.summary()\n", + " model.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"])\n", + " return model" + ], + "metadata": { + "collapsed": false + }, + "id": "5c6f97097cbf81f9" + }, + { + "cell_type": "markdown", + "source": [ + "## Training the Model on MNIST\n", + "Next, we will train the model using the preprocessed MNIST dataset." + ], + "metadata": { + "collapsed": false + }, + "id": "5a31cff183889d31" + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "epochs = 6\n", + "batch_size = 128\n", + "\n", + "# Train and evaluate the model\n", + "model = create_model()\n", + "model.fit(train_images, train_labels, epochs=epochs, batch_size=batch_size, validation_data=(test_images, test_labels))\n", + "model.evaluate(test_images, test_labels)" + ], + "metadata": { + "collapsed": false + }, + "id": "37e6f320b23217c1" + }, + { + "cell_type": "markdown", + "source": [ + "## Preparing the Model for Hardware-Friendly Quantization Aware Training with MCT\n", + "### Target Platform Capabilities\n", + "MCT optimizes the model for dedicated hardware. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)).
In this tutorial, we use a TPC configuration that applies 2-bit quantization for weights and 3-bit quantization for activations.\n", + "\n", + "If desired, you can skip this step and directly use the pre-configured [`get_target_platform_capabilities`](https://sony.github.io/model_optimization/docs/api/api_docs/methods/get_target_platform_capabilities.html) function to obtain an initialized TPC." + ], + "metadata": { + "collapsed": false + }, + "id": "8f7e6ec541426aa5" }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "8bb6f84b-9775-4989-9f74-688958f3a1d3", "metadata": { "id": "8bb6f84b-9775-4989-9f74-688958f3a1d3" @@ -166,128 +278,80 @@ }, { "cell_type": "markdown", - "id": "bf7c811e-cba8-44f3-888f-e7452a68087d", - "metadata": {}, - "source": [ - "## Init Keras model" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "690ff5ae-4474-4876-835f-ab2a2bbcb139", - "metadata": {}, - "outputs": [], - "source": [ - "num_classes = 10\n", - "input_shape = (28, 28, 1)\n", - "\n", - "_input = layers.Input(shape=input_shape)\n", - "x = layers.Conv2D(16, 3, strides=2, padding='same', activation='relu')(_input)\n", - "x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(x)\n", - "x = layers.Flatten()(x)\n", - "x = layers.Dropout(0.5)(x)\n", - "x = layers.Dense(128, activation='relu')(x)\n", - "x = layers.Dropout(0.5)(x)\n", - "x = layers.Dense(num_classes, activation='softmax')(x)\n", - "model = Model(inputs=_input, outputs=x)\n", - "\n", - "model.summary()" - ] - }, - { - "cell_type": "markdown", - "id": "7094f140-f86a-4d76-9042-83a0c99a796e", - "metadata": {}, "source": [ - "## Init MNIST dataset" - ] + "## Representative Dataset\n", + "For quantization with MCT, we need to define a representative dataset required by the PTQ algorithm. 
This dataset is a generator that returns a list of images:" + ], + "metadata": { + "collapsed": false + }, + "id": "a73bebf51aaf672c" }, { "cell_type": "code", "execution_count": null, - "id": "464f2afd-0e80-4a80-86dd-1a26c7d3ea6a", - "metadata": {}, "outputs": [], "source": [ - "# Load the data and split it between train and test sets\n", - "(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()\n", - "\n", - "# Normalize images\n", - "x_train = x_train.astype(\"float32\") / 255\n", - "x_test = x_test.astype(\"float32\") / 255\n", - "\n", - "# Add Channels axis to data\n", - "x_train = np.expand_dims(x_train, -1)\n", - "x_test = np.expand_dims(x_test, -1)\n", + "n_iter = 10\n", "\n", - "# convert class vectors to binary class matrices\n", - "y_train = tf.keras.utils.to_categorical(y_train, num_classes)\n", - "y_test = tf.keras.utils.to_categorical(y_test, num_classes)\n" - ] + "def representative_data_gen():\n", + " def _generator():\n", + " for _ind in range(n_iter):\n", + " yield [train_images[_ind][np.newaxis, ...]]\n", + " return _generator" + ], + "metadata": { + "collapsed": false + }, + "id": "a82f6b9bae8269a7" }, { "cell_type": "markdown", - "id": "b00ab0db-ec7d-4d55-9c52-3440289e4ae1", - "metadata": {}, "source": [ - "## Train a Keras classifier model on MNIST" - ] + "### Creating a QAT-Ready Model with MCT\n", + "The MCT converts a floating-point model into a quantized model using post-training quantization. The returned model includes trainable quantizers and is ready for fine-tuning, making it a \"QAT-ready\" model." 
+ ], + "metadata": { + "collapsed": false + }, + "id": "f4a247d3bde88990" }, { "cell_type": "code", "execution_count": null, - "id": "2a75d82e-e2a0-4204-a4b5-31263bc4b117", + "id": "2c171d2d-6f0d-474d-aab6-22b0b0c9e71e", "metadata": {}, "outputs": [], "source": [ - "# train float model\n", - "batch_size = 128\n", - "epochs = 15\n", - "\n", - "model.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"])\n", - "model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)\n", - "\n", - "# evaluate float model\n", - "score = model.evaluate(x_test, y_test, verbose=0)\n", - "print(f\"Float model test accuracy: {score[1]:02.4f}\")\n" + "qat_model, _, custom_objects = mct.qat.keras_quantization_aware_training_init_experimental(\n", + " model,\n", + " representative_data_gen(),\n", + " target_platform_capabilities=get_tpc())\n", + "qat_model.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"], run_eagerly=True)" ] }, { "cell_type": "markdown", - "id": "6fd0d525-4bc1-4958-ac67-d114bd25a001", - "metadata": {}, "source": [ - "## Prepare model for Hardware-Friendly Quantization Aware Training with MCT\n", - "The MCT takes the float model and quantizes it in a post-training quantization fashion. The returned model contains trainable quantizers and is ready to be retrained (namely, a \"QAT ready\" model)." - ] + "Let's evaluate the model's performance after the initial post-training quantization."
+ ], + "metadata": { + "collapsed": false + }, + "id": "d38ba814d794cd57" }, { "cell_type": "code", "execution_count": null, - "id": "2c171d2d-6f0d-474d-aab6-22b0b0c9e71e", - "metadata": {}, "outputs": [], "source": [ - "n_iter = 10\n", - "\n", - "\n", - "def gen_representative_dataset():\n", - " def _generator():\n", - " for _ind in range(n_iter):\n", - " yield [x_train[_ind][np.newaxis, ...]]\n", - " return _generator\n", - "\n", - "\n", - "qat_model, _, custom_objects = mct.qat.keras_quantization_aware_training_init_experimental(model,\n", - " gen_representative_dataset(),\n", - " core_config=mct.core.CoreConfig(),\n", - " target_platform_capabilities=get_tpc())\n", - "\n", - "qat_model.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"], run_eagerly=True)\n", - "score = qat_model.evaluate(x_test, y_test, verbose=0)\n", + "score = qat_model.evaluate(test_images, test_labels, verbose=0)\n", "print(f\"PTQ model test accuracy: {score[1]:02.4f}\")" - ] + ], + "metadata": { + "collapsed": false + }, + "id": "976a5942ac5fa9c6" }, { "cell_type": "markdown", @@ -304,12 +368,23 @@ "metadata": {}, "outputs": [], "source": [ - "qat_model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)\n", + "qat_model.fit(train_images, train_labels, epochs=epochs, batch_size=batch_size, validation_split=0.2)\n", "\n", - "score = qat_model.evaluate(x_test, y_test, verbose=0)\n", - "print(f\"QAT model test accuracy: {score[1]:02.4f}\")\n" + "score = qat_model.evaluate(test_images, test_labels, verbose=0)\n", + "print(f\"QAT model test accuracy: {score[1]:02.4f}\")" ] }, + { + "cell_type": "markdown", + "source": [ + "## Finalizing the QAT Model\n", + "Remove the 'QuantizeWrapper' layers to retain only the layers with quantized weights (FakeQuant values)."
+ ], + "metadata": { + "collapsed": false + }, + "id": "d34761c03296a31a" + }, { "cell_type": "code", "execution_count": null, @@ -317,14 +392,23 @@ "metadata": {}, "outputs": [], "source": [ - "## Finalize QAT model: Remove QuantizeWrapper layers and leave only layers with quantized weights (FakeQuant values)\n", "quantized_model = mct.qat.keras_quantization_aware_training_finalize_experimental(qat_model)\n", "\n", "quantized_model.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"])\n", - "score = quantized_model.evaluate(x_test, y_test, verbose=0)\n", + "score = quantized_model.evaluate(test_images, test_labels, verbose=0)\n", "print(f\"Quantized model test accuracy: {score[1]:02.4f}\")" ] }, + { + "cell_type": "markdown", + "source": [ + "Now, we can export the quantized model to Keras:" + ], + "metadata": { + "collapsed": false + }, + "id": "52158ce9c8ded4bb" + }, { "cell_type": "code", "execution_count": null, @@ -332,17 +416,28 @@ "metadata": {}, "outputs": [], "source": [ - "# Export quantized model to Keras\n", - "mct.exporter.keras_export_model(model=quantized_model, \n", - " save_model_path='qmodel.keras')" + "mct.exporter.keras_export_model(model=quantized_model, save_model_path='qmodel.keras')" ] }, + { + "cell_type": "markdown", + "source": [ + "## Conclusion\n", + "In this tutorial, we explored how to perform Quantization Aware Training (QAT) using the Model Compression Toolkit (MCT) with a Keras model. We began by constructing a simple neural network and preparing it for quantization by configuring the Target Platform Capabilities (TPC). Then, we converted the model into a QAT-ready format and demonstrated how to train and fine-tune it using hardware-friendly quantization settings. 
This approach can significantly reduce the model size and improve inference speed while maintaining high accuracy, making it ideal for edge AI applications.\n", + "\n", + "Feel free to experiment with different configurations to see how they impact your models." + ], + "metadata": { + "collapsed": false + }, + "id": "8c4089ae72fb2c6d" + }, { "cell_type": "markdown", "id": "db77d678-1fa7-4dc0-a6f3-bac10ba2d8ed", "metadata": {}, "source": [ - "Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved.\n", + "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n", "\n", "Licensed under the Apache License, Version 2.0 (the \"License\");\n", "you may not use this file except in compliance with the License.\n", diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_xquant.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_xquant.ipynb index 04bbb881b..229b3bce6 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_xquant.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_xquant.ipynb @@ -1,211 +1,207 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] }, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Quantization Troubleshooting with XQuant\n", - "\n", - "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_xquant.ipynb)\n", - "\n", - "This notebook demonstrates the process of generating an Xquant report. The report provides valuable insights regarding the quality and success of the quantization process of a Keras model. 
This includes histograms and similarity metrics between the original float model and the quantized model in key points of the model. The report can be visualized using TensorBoard.\n", - "\n", - "## Steps:\n", - "1. Load a pre-trained MobileNetV2 model and perform post-training quantization.\n", - "5. Define an Xquant configuration.\n", - "6. Generate an Xquant report to compare the float and quantized models.\n", - "7. Visualize the report using TensorBoard." - ], - "metadata": { - "id": "ag0MtvPUkc8i" - } - }, - { - "cell_type": "markdown", - "source": [ - "## Install" - ], - "metadata": { - "id": "EonIXpPQlR_6" - } - }, - { - "cell_type": "code", - "source": [ - "TF_VER = '2.14.0'\n", - "\n", - "!pip install -q tensorflow=={TF_VER}\n", - "!pip install -q model-compression-toolkit" - ], - "metadata": { - "id": "kCLHJUhTlPDi" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Import necessary libraries" - ], - "metadata": { - "id": "UUrYYDITle3z" - } - }, - { - "cell_type": "code", - "source": [ - "import model_compression_toolkit as mct\n", - "import numpy as np\n", - "from functools import partial\n", - "from model_compression_toolkit.xquant import XQuantConfig\n", - "import tensorflow as tf" - ], - "metadata": { - "id": "NKKHNppSllmU" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Define data generator\n", - "For demonstration only, we will use a random dataset generator for the representative dataset and for the validation dataset:" - ], - "metadata": { - "id": "-4kQtkZLlnJj" - } - }, - { - "cell_type": "code", - "source": [ - "# Function to generate random data. 
If use_labels is True, it yields data with labels;\n", - "# otherwise, it yields only data.\n", - "def random_data_gen(shape=(224, 224, 3), use_labels=False, batch_size=2, num_iter=1):\n", - " if use_labels:\n", - " for _ in range(num_iter):\n", - " yield [[np.random.randn(batch_size, *shape)], np.random.randn(batch_size)]\n", - " else:\n", - " for _ in range(num_iter):\n", - " yield [np.random.randn(batch_size, *shape)]" - ], - "metadata": { - "id": "-xM1K6tVlna8" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Quantize MobileNetV2\n", - "\n", - "We will start by quantizing MobilNetV2 using `keras_post_training_quantization`:" - ], - "metadata": { - "id": "naWFGx_vl6tX" - } - }, - { - "cell_type": "code", - "source": [ - "# Load the pre-trained MobileNetV2 model and perform post-training quantization using\n", - "# the representative dataset generated by random_data_gen.\n", - "from keras.applications.mobilenet_v2 import MobileNetV2\n", - "float_model = MobileNetV2()\n", - "repr_dataset = random_data_gen\n", - "quantized_model, _ = mct.ptq.keras_post_training_quantization(in_model=float_model,\n", - " representative_data_gen=repr_dataset)" - ], - "metadata": { - "id": "RlAuiXAzl7Ef" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Generate report\n", - "\n", - "First, we will create an XQuantConfig object with the directory to use for logs and with custom similarity metrics to compute between key points of the model. 
Here, we use the `./logs` directory for saving the generated logs, and add MAE similarity metric to compute (in addition to the default similarity metrics that are implemented: MSE, CS and SQNR):" - ], - "metadata": { - "id": "6alpyrD8mEm2" - } - }, - { - "cell_type": "code", - "source": [ - "# Define the validation dataset and Xquant configuration, including custom similarity metrics.\n", - "validation_dataset = partial(random_data_gen, use_labels=True)\n", - "xquant_config = XQuantConfig(report_dir='./logs', custom_similarity_metrics={'mae': lambda x, y: float(tf.keras.losses.MAE(x.flatten(), y.flatten()).numpy())})\n", - "\n", - "# Generate the Xquant report comparing the float model and the quantized model using the\n", - "# representative and validation datasets.\n", - "from model_compression_toolkit.xquant import xquant_report_keras_experimental\n", - "result = xquant_report_keras_experimental(\n", - " float_model,\n", - " quantized_model,\n", - " repr_dataset,\n", - " validation_dataset,\n", - " xquant_config\n", - " )" - ], - "metadata": { - "id": "e8m0CNs6mE93" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Visualize in TensorBoard\n", - "\n", - "In the TensorBoard, one can find useful information like statistics of the float layers' outputs and the graph of the quantized model with similarities that were measured comparing to the float model. Currently, the similarity is measured at linear layers like Conv2D, Dense, etc. (may be changed in the future). 
When observing such node in the graph, the similarities can be found in the node's properties as 'xquant_repr' and 'xquant_val' (the similarity that was computed using the representative dataset and the validation dataset, respectively).\n", - "Make sure to choose 'xquant' from the 'Run' dropdown menu on the left side of TensorBoard.\n", - "\n", - "![tb_keras_xquant.png]()" - ], - "metadata": { - "id": "6QnODkANmU2J" - } - }, - { - "cell_type": "markdown", - "source": [ - "Now we can run TensorBoard:" - ], - "metadata": { - "id": "OAhYMxgzWF1A" - } - }, - { - "cell_type": "code", - "source": [ - "%load_ext tensorboard\n", - "%tensorboard --logdir logs" - ], - "metadata": { - "id": "X6yk2kI6kSEf" - }, - "execution_count": null, - "outputs": [] - } - ] + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Quantization Troubleshooting with the Model Compression Toolkit (MCT) Using the XQuant Feature\n", + "\n", + "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_xquant.ipynb)\n", + "\n", + "## Overview \n", + "This notebook demonstrates how to generate an XQuant report. The report provides insight into the quality of the quantization process for a Keras model. It includes histograms and similarity metrics comparing the original float model and the quantized model at key points in the model. The report can be visualized using TensorBoard.\n", + "\n", + "## Summary\n", + "1. Load a pre-trained MobileNetV2 model and perform post-training quantization.\n", + "2. Define an XQuant configuration.\n", + "3. Generate an XQuant report to compare the float and quantized models.\n", + "4.
Visualize the report using TensorBoard.\n", + "\n", + "## Setup" + ], + "metadata": { + "id": "ag0MtvPUkc8i" + } + }, + { + "cell_type": "code", + "source": [ + "TF_VER = '2.14.0'\n", + "\n", + "!pip install -q tensorflow[and-cuda]~={TF_VER}" + ], + "metadata": { + "id": "kCLHJUhTlPDi" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "import importlib\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "source": [ + "import model_compression_toolkit as mct\n", + "import numpy as np\n", + "from functools import partial\n", + "from model_compression_toolkit.xquant import XQuantConfig\n", + "import tensorflow as tf" + ], + "metadata": { + "id": "NKKHNppSllmU" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Define a Random Data Generator\n", + "For demonstration purposes, we will use a random dataset generator to create both the representative dataset and the validation dataset. This will allow us to simulate data for quantization and validation without using an actual dataset." + ], + "metadata": { + "id": "-4kQtkZLlnJj" + } + }, + { + "cell_type": "code", + "source": [ + "# Function to generate random data. 
If use_labels is True, it yields data with labels;\n", + "# otherwise, it yields only data.\n", + "def random_data_gen(shape=(224, 224, 3), use_labels=False, batch_size=2, num_iter=1):\n", + " if use_labels:\n", + " for _ in range(num_iter):\n", + " yield [[np.random.randn(batch_size, *shape)], np.random.randn(batch_size)]\n", + " else:\n", + " for _ in range(num_iter):\n", + " yield [np.random.randn(batch_size, *shape)]" + ], + "metadata": { + "id": "-xM1K6tVlna8" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## MobileNetV2 Quantization using MCT\n", + "\n", + "We will begin by quantizing MobileNetV2 using the `keras_post_training_quantization` function from MCT:" + ], + "metadata": { + "id": "naWFGx_vl6tX" + } + }, + { + "cell_type": "code", + "source": [ + "# Load the pre-trained MobileNetV2 model and perform post-training quantization using\n", + "# the representative dataset generated by random_data_gen.\n", + "from keras.applications.mobilenet_v2 import MobileNetV2\n", + "float_model = MobileNetV2()\n", + "quantized_model, _ = mct.ptq.keras_post_training_quantization(in_model=float_model,\n", + " representative_data_gen=random_data_gen)" + ], + "metadata": { + "id": "RlAuiXAzl7Ef" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Generating an XQuant Report\n", + "\n", + "We will start by creating an `XQuantConfig` object, specifying the directory for logs and adding custom similarity metrics to be computed at key points of the model. In this example, we use the `./logs` directory for saving the generated logs and include the MAE (Mean Absolute Error) similarity metric, in addition to the default metrics: MSE (Mean Squared Error), CS (Cosine Similarity), and SQNR (Signal-to-Quantization-Noise Ratio)."
+ ], + "metadata": { + "id": "6alpyrD8mEm2" + } + }, + { + "cell_type": "code", + "source": [ + "# Define the validation dataset and XQuant configuration, including custom similarity metrics.\n", + "validation_dataset = partial(random_data_gen, use_labels=True)\n", + "xquant_config = XQuantConfig(report_dir='./logs', custom_similarity_metrics={'mae': lambda x, y: float(tf.keras.losses.MAE(x.flatten(), y.flatten()).numpy())})\n", + "\n", + "# Generate the XQuant report comparing the float model and the quantized model using the\n", + "# representative and validation datasets.\n", + "from model_compression_toolkit.xquant import xquant_report_keras_experimental\n", + "result = xquant_report_keras_experimental(\n", + " float_model,\n", + " quantized_model,\n", + " random_data_gen,\n", + " validation_dataset,\n", + " xquant_config\n", + " )" + ], + "metadata": { + "id": "e8m0CNs6mE93" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Visualization using TensorBoard\n", + "\n", + "In TensorBoard, you can find useful information such as statistics of the float layers' outputs and the graph of the quantized model, annotated with the similarities measured against the float model. Currently, similarity is measured at linear layers such as Conv2D and Dense (this may change in the future).
When you inspect such a node in the graph, the similarities appear in the node's properties as 'xquant_repr' and 'xquant_val' (computed using the representative dataset and the validation dataset, respectively).\n", + "Make sure to choose 'xquant' from the 'Run' dropdown menu on the left side of TensorBoard.\n", + "\n", + "![tb_keras_xquant.png]()" + ], + "metadata": { + "id": "6QnODkANmU2J" + } + }, + { + "cell_type": "markdown", + "source": [ + "Now we can run TensorBoard:" + ], + "metadata": { + "id": "OAhYMxgzWF1A" + } + }, + { + "cell_type": "code", + "source": [ + "%load_ext tensorboard\n", + "%tensorboard --logdir logs" + ], + "metadata": { + "id": "X6yk2kI6kSEf" + }, + "execution_count": null, + "outputs": [] + } + ] } diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_yolov8n.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_yolov8n.ipynb deleted file mode 100644 index c5232e7f2..000000000 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_yolov8n.ipynb +++ /dev/null @@ -1,400 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "4c261298-309f-41e8-9338-a5e205f09b05", - "metadata": {}, - "source": [ - "# Post Training Quantization a YoloV8-nano Object Detection Model\n", - "\n", - "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_yolov8n.ipynb)\n", - "\n", - "## Overview\n", - "\n", - "\n", - "In this tutorial, we'll demonstrate the post-training quantization using MCT for a pre-trained object detection model in Keras. Specifically, we'll integrate post-processing, including the non-maximum suppression (NMS) layer, into the model.
This integration aligns with the imx500 target platform capabilities.\n", - "\n", - "In this example we will use an existing pre-trained YoloV8-nano model taken from [Ultralytics](https://github.com/ultralytics/ultralytics). We will convert the model to a Tensorflow model that includes box decoding and NMS layer. Further, we will quantize the model using MCT post training quantization and evaluate the performance of the floating point model and the quantized model on COCO dataset.\n", - "\n", - "\n", - "## Summary\n", - "\n", - "In this tutorial we will cover:\n", - "\n", - "1. Post-Training Quantization using MCT of Keras object detection model including the post-processing.\n", - "2. Data preparation - loading and preprocessing validation and representative datasets from COCO.\n", - "3. Accuracy evaluation of the floating-point and the quantized models." - ] - }, - { - "cell_type": "markdown", - "source": [ - "## Setup\n", - "Install the relevant packages." - ], - "metadata": { - "collapsed": false - }, - "id": "d74f9c855ec54081" - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "TF_VER = '2.14.0'\n", - "\n", - "!pip install -q tensorflow=={TF_VER}\n", - "!pip install -q pycocotools\n", - "!pip install 'huggingface-hub<=0.21.4'\n" - ], - "metadata": { - "collapsed": false - }, - "id": "7c7fa04c9903736f" - }, - { - "cell_type": "markdown", - "source": [ - " Clone a copy of the [MCT](https://github.com/sony/model_optimization) (Model Compression Toolkit) into your current directory. 
This step ensures that you have access to [mct_model_garden](https://github.com/sony/model_optimization/tree/main/tutorials/mct_model_garden) folder which contains all the necessary utility functions for this tutorial.\n", - " **It's important to note that we use the most up-to-date MCT code available.**" - ], - "metadata": { - "collapsed": false - }, - "id": "57717bc8f59a0d85" - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "!git clone https://github.com/sony/model_optimization.git local_mct\n", - "!pip install -r ./local_mct/requirements.txt\n", - "import sys\n", - "sys.path.insert(0,\"./local_mct\")" - ], - "metadata": { - "collapsed": false - }, - "id": "9728247bc20d0600" - }, - { - "cell_type": "markdown", - "source": [ - "Finally, load COCO evaluation set" - ], - "metadata": { - "collapsed": false - }, - "id": "7a1038b9fd98bba2" - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "!wget -nc http://images.cocodataset.org/annotations/annotations_trainval2017.zip\n", - "!unzip -q -o annotations_trainval2017.zip -d ./coco\n", - "!echo Done loading annotations\n", - "!wget -nc http://images.cocodataset.org/zips/val2017.zip\n", - "!unzip -q -o val2017.zip -d ./coco\n", - "!echo Done loading val2017 images" - ], - "metadata": { - "collapsed": false - }, - "id": "8bea492d71b4060f" - }, - { - "cell_type": "markdown", - "id": "084c2b8b-3175-4d46-a18a-7c4d8b6fcb38", - "metadata": {}, - "source": [ - "## Floating Point Model\n", - "\n", - "### Load the pre-trained weights of Yolo8-nano\n", - "We begin by loading a pre-trained [YOLOv8n](https://huggingface.co/SSI-DNN/test_keras_yolov8n_640x640) model. This implementation is based on [Ultralytics](https://github.com/ultralytics/ultralytics) and includes a slightly modified version of yolov8 detection-head (mainly the box decoding part) that was adapted for model quantization. 
For further insights into the model's implementation details, please refer to [mct_model_garden](https://github.com/sony/model_optimization/tree/main/tutorials/mct_model_garden/models_keras/yolov8). " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e8395b28-4732-4d18-b081-5d3bdf508691", - "metadata": {}, - "outputs": [], - "source": [ - "from huggingface_hub import from_pretrained_keras\n", - "\n", - "model = from_pretrained_keras('SSI-DNN/keras_yolov8n_640x640')" - ] - }, - { - "cell_type": "markdown", - "source": [ - "### Generate Yolo8-nano Keras model\n", - "In the following steps, we integrate a post-processing component to this base model, which includes tensorflow [combined_non_max_suppression](https://www.tensorflow.org/api_docs/python/tf/image/combined_non_max_suppression) layer." - ], - "metadata": { - "collapsed": false - }, - "id": "7f148e78b769f1dc" - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "from keras.models import Model\n", - "\n", - "# Parameter of Yolov8n\n", - "INPUT_RESOLUTION = 640\n", - "\n", - "# Add Tensorflow NMS layer\n", - "boxes, scores = model.output\n", - "outputs = tf.image.combined_non_max_suppression(\n", - " boxes,\n", - " scores,\n", - " max_output_size_per_class=300,\n", - " max_total_size=300,\n", - " iou_threshold=0.7,\n", - " score_threshold=0.001,\n", - " pad_per_class=False,\n", - " clip_boxes=False\n", - " )\n", - "\n", - "model = Model(model.input, outputs, name='yolov8n')\n", - "\n", - "print('Model is ready for evaluation')" - ], - "metadata": { - "collapsed": false - }, - "id": "698ce1d40f2cdf1f" - }, - { - "cell_type": "markdown", - "id": "3cde2f8e-0642-4374-a1f4-df2775fe7767", - "metadata": {}, - "source": [ - "#### Evaluate the floating point model\n", - "Next, we evaluate the floating point model by using `cocoeval` library alongside additional dataset utilities. 
We can verify the mAP accuracy aligns with that of the original model. \n", - "Note that we set the \"batch_size\" to 5 and the preprocessing according to [Ultralytics](https://github.com/ultralytics/ultralytics).\n", - "Please ensure that the dataset path has been set correctly before running this code cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "from tutorials.mct_model_garden.evaluation_metrics.coco_evaluation import coco_dataset_generator, CocoEval\n", - "from tutorials.mct_model_garden.models_keras.yolov8.yolov8_preprocess import yolov8_preprocess\n", - "\n", - "EVAL_DATASET_FOLDER = './coco/val2017'\n", - "EVAL_DATASET_ANNOTATION_FILE = './coco/annotations/instances_val2017.json'\n", - "BATCH_SIZE = 5\n", - "\n", - "# Load COCO evaluation set\n", - "val_dataset = coco_dataset_generator(dataset_folder=EVAL_DATASET_FOLDER,\n", - " annotation_file=EVAL_DATASET_ANNOTATION_FILE,\n", - " preprocess=yolov8_preprocess,\n", - " batch_size=BATCH_SIZE)\n", - "\n", - "# Define resizing information to map between the model's output and the original image dimensions\n", - "output_resize = {'shape': (INPUT_RESOLUTION, INPUT_RESOLUTION), 'aspect_ratio_preservation': True}\n", - "\n", - "# Initialize the evaluation metric object\n", - "coco_metric = CocoEval(EVAL_DATASET_ANNOTATION_FILE, output_resize) \n", - "\n", - "# Iterate and the evaluation set\n", - "for batch_idx, (images, targets) in enumerate(val_dataset):\n", - " \n", - " # Run inference on the batch\n", - " outputs = model(images)\n", - "\n", - " # Add the model outputs to metric object (a dictionary of outputs after postprocess: boxes, scores & classes)\n", - " coco_metric.add_batch_detections(outputs, targets)\n", - " if (batch_idx + 1) % 100 == 0:\n", - " print(f'processed {(batch_idx + 1) * BATCH_SIZE} images')\n", - "\n", - "# Print float model mAP results\n", - "print(\"Float model mAP: {:.4f}\".format(coco_metric.result()[0]))" - ], - "metadata": { 
- "collapsed": false - }, - "id": "56393342-cecf-4f64-b9ca-2f515c765942" - }, - { - "cell_type": "markdown", - "id": "015e760b-6555-45b4-aaf9-500e974c1d86", - "metadata": {}, - "source": [ - "## Quantize Model\n", - "\n", - "### Post training quantization using Model Compression Toolkit \n", - "\n", - "Now, we're all set to use MCT's post-training quantization. To begin, we'll define a representative dataset and proceed with the model quantization. Please note that, for demonstration purposes, we'll use the evaluation dataset as our representative dataset. We'll calibrate the model using 100 representative images, divided into 20 iterations of 'batch_size' images each. \n", - "\n", - "Additionally, to further compress the model's memory footprint, we will employ the mixed-precision quantization technique. This method allows each layer to be quantized with different precision options: 2, 4, and 8 bits, aligning with the imx500 target platform capabilities.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "01e90967-594b-480f-b2e6-45e2c9ce9cee", - "metadata": {}, - "outputs": [], - "source": [ - "import model_compression_toolkit as mct\n", - "from typing import Iterator, Tuple, List\n", - "\n", - "REPRESENTATIVE_DATASET_FOLDER = './coco/val2017/'\n", - "REPRESENTATIVE_DATASET_ANNOTATION_FILE = './coco/annotations/instances_val2017.json'\n", - "n_iters = 20\n", - "\n", - "# Load representative dataset\n", - "representative_dataset = coco_dataset_generator(dataset_folder=REPRESENTATIVE_DATASET_FOLDER,\n", - " annotation_file=REPRESENTATIVE_DATASET_ANNOTATION_FILE,\n", - " preprocess=yolov8_preprocess,\n", - " batch_size=BATCH_SIZE)\n", - "\n", - "# Define representative dataset generator\n", - "def get_representative_dataset(n_iter: int, dataset_loader: Iterator[Tuple]):\n", - " \"\"\"\n", - " This function creates a representative dataset generator.\n", - " Args:\n", - " n_iter: number of iterations for MCT to calibrate on\n", - " Returns:\n", 
- " A representative dataset generator\n", - " \"\"\" \n", - " def representative_dataset() -> Iterator[List]:\n", - " \"\"\"\n", - " Creates a representative dataset generator from a PyTorch data loader, The generator yields numpy\n", - " arrays of batches of shape: [Batch, H, W ,C].\n", - " Returns:\n", - " A representative dataset generator\n", - " \"\"\"\n", - " ds_iter = iter(dataset_loader)\n", - " for _ in range(n_iter):\n", - " yield [next(ds_iter)[0]]\n", - "\n", - " return representative_dataset\n", - "\n", - "# Get representative dataset generator\n", - "representative_dataset_gen = get_representative_dataset(n_iters, representative_dataset)\n", - "\n", - "# Set IMX500-v1 TPC\n", - "tpc = mct.get_target_platform_capabilities(\"tensorflow\", 'imx500', target_platform_version='v1')\n", - "\n", - "# Specify the necessary configuration for mixed precision quantization. To keep the tutorial brief, we'll use a small set of images and omit the hessian metric for mixed precision calculations. It's important to be aware that this choice may impact the resulting accuracy. 
\n", - "mp_config = mct.core.MixedPrecisionQuantizationConfig(num_of_images=5, use_hessian_based_scores=False)\n", - "config = mct.core.CoreConfig(mixed_precision_config=mp_config,\n", - " quantization_config=mct.core.QuantizationConfig(shift_negative_activation_correction=True))\n", - "\n", - "# Define target Resource Utilization for mixed precision weights quantization (75% of 'standard' 8bits quantization)\n", - "resource_utilization_data = mct.core.keras_resource_utilization_data(model,\n", - " representative_dataset_gen,\n", - " config,\n", - " target_platform_capabilities=tpc)\n", - "resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * 0.75)\n", - "\n", - "# Perform post training quantization\n", - "quant_model, _ = mct.ptq.keras_post_training_quantization(model,\n", - " representative_dataset_gen,\n", - " target_resource_utilization=resource_utilization,\n", - " core_config=config,\n", - " target_platform_capabilities=tpc)\n", - "print('Quantized model is ready')" - ] - }, - { - "cell_type": "markdown", - "id": "4fb6bffc-23d1-4852-8ec5-9007361c8eeb", - "metadata": {}, - "source": [ - "### Evaluate quantized model\n", - "Lastly, we can evaluate the performance of the quantized model. There is a slight decrease in performance that can be further mitigated by either expanding the representative dataset or employing MCT's advanced quantization methods, such as GPTQ (Gradient-Based/Enhanced Post Training Quantization)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8dc7b87c-a9f4-4568-885a-fe009c8f4e8f", - "metadata": {}, - "outputs": [], - "source": [ - "# Re-load COCO evaluation set\n", - "val_dataset = coco_dataset_generator(dataset_folder=EVAL_DATASET_FOLDER,\n", - " annotation_file=EVAL_DATASET_ANNOTATION_FILE,\n", - " preprocess=yolov8_preprocess,\n", - " batch_size=BATCH_SIZE)\n", - "\n", - "# Initialize the evaluation metric object\n", - "coco_metric = CocoEval(EVAL_DATASET_ANNOTATION_FILE, output_resize) \n", - "\n", - "# Iterate and the evaluation set\n", - "for batch_idx, (images, targets) in enumerate(val_dataset):\n", - " # Run inference on the batch\n", - " outputs = quant_model(images)\n", - "\n", - " # Add the model outputs to metric object (a dictionary of outputs after postprocess: boxes, scores & classes)\n", - " coco_metric.add_batch_detections(outputs, targets)\n", - " if (batch_idx + 1) % 100 == 0:\n", - " print(f'processed {(batch_idx + 1) * BATCH_SIZE} images')\n", - "\n", - "# Print quantized model mAP results\n", - "print(\"Quantized model mAP: {:.4f}\".format(coco_metric.result()[0]))" - ] - }, - { - "cell_type": "markdown", - "source": [ - "\\\n", - "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n", - "\n", - "Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "you may not use this file except in compliance with the License.\n", - "You may obtain a copy of the License at\n", - "\n", - " http://www.apache.org/licenses/LICENSE-2.0\n", - "\n", - "Unless required by applicable law or agreed to in writing, software\n", - "distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "See the License for the specific language governing permissions and\n", - "limitations under the License." 
- ], - "metadata": { - "collapsed": false - }, - "id": "99702811c4349d42" - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.7" - }, - "colab": { - "provenance": [] - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mixed_precision_ptq.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mixed_precision_ptq.ipynb index 31569618b..b4ca8eca4 100644 --- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mixed_precision_ptq.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mixed_precision_ptq.ipynb @@ -236,7 +236,7 @@ { "cell_type": "markdown", "source": [ - "To enable mixed-precision quantization, we define the desired compression ratio. In this example, we will configure the model to compress the weights to 75% of the size of the 8-bit model's weights. To achieve this, we will retrieve the model's resource utilization information, `resource_utilization_data`, specifically focusing on the weights' memory. Then, we will create a `ResourceUtilization` object to enforce the size constraint on the weight's memory, which applies only to the quantized layers and attributes (e.g., Conv2D kernels, but not biases)." + "To enable mixed-precision quantization, we define the desired compression ratio. In this example, we will configure the model to compress the weights to **75% of the size of the 8-bit model's weights**. To achieve this, we will retrieve the model's resource utilization information, `resource_utilization_data`, specifically focusing on the weights' memory. 
Then, we will create a `ResourceUtilization` object to enforce the size constraint on the weights' memory, which applies only to the quantized layers and attributes (e.g., Conv2D kernels, but not biases)." ], "metadata": { "collapsed": false @@ -255,8 +255,9 @@ " configuration,\n", " target_platform_capabilities=target_platform_cap)\n", "\n", + "weights_compression_ratio = 0.75 # Target 75% of the model's weights memory size when quantized with 8 bits.\n", "# Create a ResourceUtilization object \n", - "resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * 0.75)" + "resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * weights_compression_ratio)" ], "metadata": { "collapsed": false @@ -266,6 +267,7 @@ { "cell_type": "markdown", "source": [ + "## Run Post-Training Quantization with Mixed Precision\n", "Now, we are ready to use MCT to quantize the model." ], "metadata": { @@ -428,11 +428,8 @@ "source": [ "## Conclusion\n", "\n", - "In this tutorial, we demonstrated how to quantize a classification model for MNIST in a hardware-friendly manner using MCT. We observed that a 4x compression ratio was achieved with minimal performance degradation.\n", - "\n", - "The key advantage of hardware-friendly quantization is that the model can run more efficiently in terms of runtime, power consumption, and memory usage on designated hardware.\n", - "\n", - "While this was a simple model and task, MCT can deliver competitive results across a wide range of tasks and network architectures. For more details, [check out the paper:](https://arxiv.org/abs/2109.09113).\n", + "In this tutorial, we demonstrated how to quantize a classification model using MCT's mixed-precision feature. \n", + "MCT can deliver competitive results across a wide range of tasks and network architectures.
For more details, [check out the paper](https://arxiv.org/abs/2109.09113).\n", "\n", "## Copyrights:\n", "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n", diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_post_training_quantization.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_post_training_quantization.ipynb index 65b9a61a6..9d2757b55 100644 --- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_post_training_quantization.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_post_training_quantization.ipynb @@ -380,7 +380,7 @@ "\n", "The key advantage of hardware-friendly quantization is that the model can run more efficiently in terms of runtime, power consumption, and memory usage on designated hardware.\n", "\n", - "While this was a simple model and task, MCT can deliver competitive results across a wide range of tasks and network architectures. For more details, [check out the paper:](https://arxiv.org/abs/2109.09113).\n", + "MCT can deliver competitive results across a wide range of tasks and network architectures.
For more details, [check out the paper](https://arxiv.org/abs/2109.09113).\n", "\n", "## Copyrights\n", "\n", diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb index 3ed4c4e80..82525cd05 100644 --- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb @@ -8,14 +8,14 @@ "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb)\n", "\n", "## Overview\n", - "This tutorial provides a step-by-step guide to training, pruning, and retraining a fully connected neural network model using PyTorch. We will start by building and training the model from scratch on the MNIST dataset, followed by applying structured pruning to reduce the model size.\n", + "This tutorial provides a step-by-step guide to training, pruning, and finetuning a PyTorch fully connected neural network model using the Model Compression Toolkit (MCT). We will start by building and training the model from scratch on the MNIST dataset, followed by applying structured pruning to reduce the model size.\n", "\n", "## Summary\n", "In this tutorial, we will cover:\n", "\n", "1. **Training a PyTorch model on MNIST:** We'll begin by constructing a basic fully connected neural network and training it on the MNIST dataset. \n", "2. **Applying structured pruning:** We'll introduce a pruning technique to reduce model size while maintaining performance. \n", - "3. **Retraining the pruned model:** After pruning, we'll retrain the model to recover any lost accuracy. \n", + "3. **Finetuning the pruned model:** After pruning, we'll finetune the model to recover any lost accuracy. \n", "4.
**Evaluating the pruned model:** We'll evaluate the pruned model’s performance and compare it to the original model.\n", "\n", "## Setup\n", @@ -304,8 +304,16 @@ { "cell_type": "markdown", "source": [ - "## Pruning the Model\n", - "Next,we'll proceed with pruning our trained model to decrease its size, targeting a 50% reduction in the memory footprint of the model's weights. Given that the model's weights utilize the float32 data type, where each parameter occupies 4 bytes, we calculate the memory requirement by multiplying the total number of parameters by 4." + "## Model Pruning\n", + "We are now ready to perform the actual pruning using MCT’s `pytorch_pruning_experimental` function. The model will be pruned based on the defined resource utilization constraints and the previously generated representative dataset.\n", + "\n", + "Each channel’s importance is measured using the [LFH (Label-Free Hessian) method](https://arxiv.org/abs/2309.11531), which approximates the Hessian of the loss function with respect to the model’s weights.\n", + "\n", + "For efficiency, we use a single-score approximation. Although less precise, it significantly reduces processing time compared to multiple approximations, which offer better accuracy at the cost of longer runtimes.\n", + "\n", + "MCT’s structured pruning will target the first two dense layers: when output channels of one of these layers are removed, the matching input channels of the following layer are removed as well, so the layer shapes stay consistent.\n", + "\n", + "The output is a pruned model along with pruning information, including layer-specific pruning masks and scores."
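The channel propagation described in the pruning cell can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MCT's implementation: the per-channel importance scores are passed in directly (in MCT they would come from the LFH approximation), and the helper name `prune_dense_pair` and the list-of-lists weight layout (rows are output channels, as in PyTorch's `nn.Linear.weight`) are hypothetical.

```python
from typing import List, Tuple

# Weight matrices follow PyTorch's nn.Linear layout: [out_features][in_features].
Matrix = List[List[float]]


def prune_dense_pair(
    w1: Matrix, b1: List[float], w2: Matrix, scores: List[float], keep: int
) -> Tuple[Matrix, List[float], Matrix, List[int]]:
    """Remove low-importance output channels of one dense layer and the
    matching input channels of the next dense layer (hypothetical helper).

    `scores` holds one importance value per output channel of the first layer.
    In MCT these would come from the LFH approximation; here they are inputs.
    """
    if len(scores) != len(w1) or len(b1) != len(w1):
        raise ValueError("need one score and one bias per output channel of w1")
    if any(len(row) != len(w1) for row in w2):
        raise ValueError("w2 must have one input channel per output channel of w1")
    if not 0 < keep <= len(w1):
        raise ValueError("keep must be between 1 and the number of channels")

    # Rank channels by score (highest first), keep the top `keep`,
    # and restore their original order so both layers stay aligned.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])

    # First layer: drop whole rows (output channels) and their biases.
    new_w1 = [list(w1[i]) for i in kept]
    new_b1 = [b1[i] for i in kept]
    # Next layer: drop the matching columns (input channels) from every row.
    new_w2 = [[row[i] for i in kept] for row in w2]
    return new_w1, new_b1, new_w2, kept


if __name__ == "__main__":
    w1 = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [0.1, 0.1]]  # 4 outputs, 2 inputs
    b1 = [0.0, 0.1, 0.2, 0.3]
    w2 = [[1.0, 2.0, 3.0, 4.0]]  # 1 output, 4 inputs
    scores = [0.9, 0.2, 0.7, 0.05]  # hypothetical per-channel importance
    new_w1, new_b1, new_w2, kept = prune_dense_pair(w1, b1, w2, scores, keep=2)
    print(kept, new_w2)  # [0, 2] [[1.0, 3.0]]
```

Removing a row from the first layer and the same column from the next layer is what makes the pruning "structured": the resulting layers are simply smaller dense layers, with no sparse masks needed at inference time.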
], "metadata": { "collapsed": false @@ -355,8 +363,8 @@ "outputs": [], "source": [ "pruned_model_nparams = display_model_params(pruned_model)\n", - "acc_before_retrain = test(pruned_model, device, test_loader)\n", - "print(f'Pruned model accuracy before retraining {acc_before_retrain}%')" + "acc_before_finetuning = test(pruned_model, device, test_loader)\n", + "print(f'Pruned model accuracy before finetuning {acc_before_finetuning}%')" ], "metadata": { "collapsed": false @@ -366,8 +374,8 @@ { "cell_type": "markdown", "source": [ - "## Retraining the Pruned Model\n", - "After pruning, we often need to retrain the model to recover any lost performance." + "## Finetuning the Pruned Model\n", + "After pruning, we often need to finetune the model to recover any lost performance." ], "metadata": { "collapsed": false @@ -415,7 +423,7 @@ "cell_type": "markdown", "source": [ "## Conclusions\n", - "In this tutorial, we demonstrated the process of training, pruning, and retraining a neural network model using the Model Compression Toolkit (MCT). We began by setting up our environment and loading the dataset, followed by building and training a fully connected neural network. We then introduced the concept of model pruning, specifically targeting the first two dense layers to efficiently reduce the model's memory footprint by 50%. After applying structured pruning, we evaluated the pruned model's performance and concluded the tutorial by fine-tuning the pruned model to recover any lost accuracy due to the pruning process. This tutorial provided a hands-on approach to model optimization through pruning, showcasing the balance between model size, performance, and efficiency.\n", + "In this tutorial, we demonstrated the process of training, pruning, and finetuning a neural network model using MCT. We began by setting up our environment and loading the dataset, followed by building and training a fully connected neural network. 
We then introduced the concept of model pruning, specifically targeting the first two dense layers to efficiently reduce the model's memory footprint by 50%. After applying structured pruning, we evaluated the pruned model's performance and concluded the tutorial by finetuning the pruned model to recover the accuracy lost during pruning. This tutorial provided a hands-on approach to model optimization through pruning, showcasing the balance between model size, performance, and efficiency.\n", "\n", "## Copyrights\n", "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n",
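The 50% memory-footprint target in the pruning tutorial is relative to the weights' memory, which for float32 weights is the parameter count multiplied by 4 bytes. A minimal sketch of that arithmetic follows; the layer sizes are hypothetical (not taken from the notebook), and `dense_params` and `weights_memory_bytes` are illustrative helpers, not MCT APIs.

```python
BYTES_PER_FLOAT32 = 4  # a float32 parameter occupies 4 bytes


def dense_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of parameters in a fully connected layer (weights plus optional bias)."""
    return in_features * out_features + (out_features if bias else 0)


def weights_memory_bytes(num_params: int, bytes_per_param: int = BYTES_PER_FLOAT32) -> int:
    """Weights memory footprint: parameter count times bytes per parameter."""
    return num_params * bytes_per_param


# Hypothetical layer sizes for a small MNIST MLP; not taken from the notebook.
LAYERS = [(784, 128), (128, 64), (64, 10)]

total_params = sum(dense_params(i, o) for i, o in LAYERS)
dense_memory = weights_memory_bytes(total_params)
target_memory = dense_memory * 0.5  # budget for a 50% reduction

print(total_params, dense_memory, target_memory)  # 109386 437544 218772.0
```

A value like `target_memory` plays the same role as the weights-memory budget that the mixed-precision notebook passes to `mct.core.ResourceUtilization`; there it is derived from the 8-bit weights memory and a 0.75 ratio instead of the float32 memory and 0.5.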