[Update] Update docs and tests.
zhangqi3 committed Nov 4, 2021
1 parent 3567c06 commit 31faabe
Showing 15 changed files with 322 additions and 91 deletions.
2 changes: 1 addition & 1 deletion docs/source/algorithm/index.rst
@@ -96,7 +96,7 @@ PTQ Algorithms

\[ E[ L(x,y,\mathbf{w}+\Delta \mathbf{w}) - L(x,y,\mathbf{w}) ] \approx \Delta \mathbf{w}^T g^{(\mathbf{w})} + \frac{1}{2} \Delta \mathbf{w}^T H^{(\mathbf{w})} \Delta \mathbf{w} \approx \Delta \mathbf{w}_1^2 + \Delta \mathbf{w}_2^2 + \Delta \mathbf{w}_1 \Delta \mathbf{w}_2 \]

Hence, it's beneficial to learn a rounding mask for each layer: the cross term :math:`\Delta \mathbf{w}_1 \Delta \mathbf{w}_2` couples the rounding decisions of individual weights, so rounding each weight to its nearest value is not necessarily optimal. A well-designed objective function is given by the authors:

.. raw:: latex html

10 changes: 10 additions & 0 deletions docs/source/example/deploy.rst
@@ -0,0 +1,10 @@
Advanced instructions for different hardware platforms
======================================================

.. toctree::
   :titlesonly:

   TensorRT <platforms/tensorrt.rst>
   SNPE <platforms/snpe.rst>


70 changes: 8 additions & 62 deletions docs/source/example/index.rst
@@ -1,65 +1,11 @@
Get Started
============
This tutorial gives details about the whole workflow of doing quantization with MQBench, including:

.. toctree::
   :maxdepth: 1
   :titlesonly:

   setup
   quantization
   deploy
53 changes: 53 additions & 0 deletions docs/source/example/platforms/snpe.rst
@@ -0,0 +1,53 @@
SNPE
=============
Example of QAT and deployment on SNPE.

**Requirements**:

- Install the SNPE SDK from `Qualcomm <https://developer.qualcomm.com/sites/default/files/docs/snpe/setup.html>`_ (Ubuntu 18.04 is suggested)

**QAT**:

- Follow the QAT procedure to get a model checkpoint; a learning rate of 5e-5 with a cosine scheduler and the Adam optimizer for tens of epochs is suggested.

**Deployment**:

- Convert the PyTorch checkpoint to `snpe_deploy.onnx` and dump the clip ranges to `snpe_clip_ranges.json`::

    from mqbench.convert_deploy import convert_deploy
    from mqbench.prepare_by_platform import BackendType

    # shape of the network input, in NCHW layout
    input_dict = {'x': [1, 3, 224, 224]}
    # `model` is the QAT-trained network (unwrap wrappers such as DataParallel first)
    convert_deploy(model, BackendType.SNPE, input_dict)

- Convert the `.onnx` file to the `.dlc` format supported by SNPE::

    snpe-onnx-to-dlc --input_network ./snpe_deploy.onnx --output_path ./snpe_deploy.dlc --quantization_overrides ./snpe_clip_ranges.json

- Note that the `.json` file contains activation clip ranges for quantization; it is required here even though the model has not been quantized yet.

- Quantize the model with parameters overridden::

    snpe-dlc-quantize --input_dlc ./snpe_deploy.dlc --input_list ./data.txt --override_params --bias_bitwidth 32

- The `data.txt` file records paths to raw image files used for calibration (their content matters little here, since we override the parameters). Each file is loaded via `numpy.fromfile(dtype=np.float32)` and must have shape `(224, 224, 3)`; a sketch of producing such files follows below. This file is required for the test.

- Now we get the final quantized model `snpe_deploy_quantized.dlc`.
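
- As an illustration, a raw calibration or test file can be produced as below (the file names and the minimal preprocessing are assumptions; match your own training pipeline)::

    import numpy as np
    from PIL import Image

    # dump one image as float32 raw data of shape (224, 224, 3),
    # the layout that numpy.fromfile(dtype=np.float32) will load
    img = Image.open('sample.jpg').convert('RGB').resize((224, 224))
    np.asarray(img, dtype=np.float32).tofile('sample.raw')
    # data.txt / test_data.txt then list one such path per line, e.g. ./sample.raw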

**Results**:

The test is done with the SNPE SDK tools, using the quantized model and a text file recording the paths to test data of shape (224, 224, 3)::

    snpe-net-run --container ./snpe_deploy_quantized.dlc --input_list ./test_data.txt

The results on several tested models:

+-------------------+--------------------------------+------------------------------------------------------------------------------------------------------------------+
| Model | accuracy\@fp32 | accuracy\@int8 |
| | +-------------------------------------------------------+----------------------------------------------------------+
| | | MQBench QAT | SNPE |
+===================+================================+=======================================================+==========================================================+
| **ResNet18** | 70.65% | 70.75% | 70.74% |
+-------------------+--------------------------------+-------------------------------------------------------+----------------------------------------------------------+
| **ResNet50** | 77.94% | 77.75% | 77.92% |
+-------------------+--------------------------------+-------------------------------------------------------+----------------------------------------------------------+
| **MobileNet_v2** | 72.67% | 72.31% | 72.65% |
+-------------------+--------------------------------+-------------------------------------------------------+----------------------------------------------------------+
56 changes: 56 additions & 0 deletions docs/source/example/platforms/tensorrt.rst
@@ -0,0 +1,56 @@
TensorRT
================
Example of QAT and deployment on TensorRT.

**Requirements**:

- Install TensorRT 7.2.1.6 from `NVIDIA <https://developer.nvidia.com/tensorrt/>`_

**QAT**:

- Training hyper-parameters:

- batch size = 128
- epochs = 1
- learning rate = 1e-4 (for ResNet series) / 1e-5 (for MobileNet series)
- weight decay = 1e-4 (for ResNet series) / 0 (for MobileNet series)
- optimizer: SGD (for ResNet series) / Adam (for MobileNet series)
- others like momentum are kept as default.

- [model_name] = ResNet18 / ResNet50 / MobileNet_v2 / ... ::

    git clone https://github.com/TheGreatCold/MQBench.git
    cd MQBench/application/imagenet_example
    python main.py -a [model_name] --epochs 1 --lr 1e-4 --b 128 --pretrained


**Deployment**:

We provide an example of deploying the quantized model to TensorRT.

- First, export the quantized model to ONNX [tensorrt_deploy_model.onnx] and dump the clip ranges [tensorrt_clip_ranges.json] for activations::

    python main.py -a [model_name] --resume [model_save_path]

- Second, build the TensorRT INT8 engine and evaluate; please make sure [dataset_path] contains the subfolder [val]::

    python onnx2trt.py --onnx [tensorrt_deploy_model.onnx] --trt [model_name.trt] --clip [tensorrt_clip_ranges.json] --data [dataset_path] --evaluate

- If you don't pass in external clip ranges [tensorrt_clip_ranges.json], TensorRT will calibrate with its default IInt8EntropyCalibrator2 algorithm on 100 images, so please make sure [dataset_path] contains the subfolder [cali]; a minimal calibrator sketch follows this list::

    python onnx2trt.py --onnx [tensorrt_deploy_model.onnx] --trt [model_name.trt] --data [dataset_path] --evaluate
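
For reference, here is a minimal sketch of an IInt8EntropyCalibrator2-style calibrator, similar in spirit to what onnx2trt.py does internally (the class name, file names, and batch-feeding details are our assumptions, not the script's actual code)::

    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda
    import tensorrt as trt

    class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
        """Feeds calibration batches to TensorRT during INT8 calibration."""

        def __init__(self, batches, cache_file='calibration.cache'):
            super().__init__()
            self.batches = iter(batches)   # iterable of float32 NCHW arrays
            self.cache_file = cache_file
            self.current = next(self.batches)
            self.batch_size = self.current.shape[0]
            self.device_input = cuda.mem_alloc(self.current.nbytes)

        def get_batch_size(self):
            return self.batch_size

        def get_batch(self, names):
            if self.current is None:
                return None                # no more data: calibration ends
            cuda.memcpy_htod(self.device_input, np.ascontiguousarray(self.current))
            self.current = next(self.batches, None)
            return [int(self.device_input)]

        def read_calibration_cache(self):
            return None                    # always calibrate from scratch

        def write_calibration_cache(self, cache):
            with open(self.cache_file, 'wb') as f:
                f.write(cache)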

**Results**:


+-------------------+--------------------------------+------------------------------------------------------------------------------------------------------------------+
| Model             | accuracy\@fp32                 | accuracy\@int8                                                                                                   |
|                   |                                +----------------------------------------+---------------------------------+---------------------------------------+
|                   |                                | TensorRT Calibration                   | MQBench QAT                     | TensorRT SetRange                     |
+===================+================================+========================================+=================================+=======================================+
| **ResNet18**      | Acc\@1 69.758 Acc\@5 89.078    | Acc\@1 69.612 Acc\@5 88.980            | Acc\@1 69.912 Acc\@5 89.150     | Acc\@1 69.904 Acc\@5 89.182           |
+-------------------+--------------------------------+----------------------------------------+---------------------------------+---------------------------------------+
| **ResNet50**      | Acc\@1 76.130 Acc\@5 92.862    | Acc\@1 76.074 Acc\@5 92.892            | Acc\@1 76.114 Acc\@5 92.946     | Acc\@1 76.320 Acc\@5 93.006           |
+-------------------+--------------------------------+----------------------------------------+---------------------------------+---------------------------------------+
| **MobileNet_v2**  | Acc\@1 71.878 Acc\@5 90.286    | Acc\@1 70.700 Acc\@5 89.708            | Acc\@1 71.158 Acc\@5 89.990     | Acc\@1 71.102 Acc\@5 89.932           |
+-------------------+--------------------------------+----------------------------------------+---------------------------------+---------------------------------------+
54 changes: 54 additions & 0 deletions docs/source/example/quantization.rst
@@ -0,0 +1,54 @@
How to do quantization with MQBench
======================================

QAT (Quantization-Aware Training)
---------------------------------------------
Compared to ordinary fine-tuning, training requires only a few additional operations.

::

    import torchvision.models as models
    from mqbench.convert_deploy import convert_deploy
    from mqbench.prepare_by_platform import prepare_qat_fx_by_platform, BackendType
    from mqbench.utils.state import enable_calibration, enable_quantization

    # First, initialize the FP32 model with pretrained parameters.
    model = models.__dict__["resnet18"](pretrained=True)
    model.train()

    # Then trace the original model with torch.fx and insert fake-quantize nodes
    # according to the target hardware backend (e.g. TensorRT).
    model = prepare_qat_fx_by_platform(model, BackendType.Tensorrt)

    # Before training, we recommend enabling the observers for calibration on
    # several batches, and only then enabling quantization.
    model.eval()
    enable_calibration(model)
    calibration_batches = 1  # number of forward-only batches used for calibration

    # training loop
    for i, batch in enumerate(data):
        if i == calibration_batches:
            # calibration is done: switch to quantization-aware training
            model.zero_grad()
            model.train()
            enable_quantization(model)

        # do forward procedures
        ...

        if i < calibration_batches:
            # calibration batches only collect statistics, no optimization
            continue

        # do backward and optimization
        ...

    # Deploy the model: remove fake-quantize nodes and dump quantization
    # parameters such as clip ranges.
    convert_deploy(model.eval(), BackendType.Tensorrt, input_shape_dict={'data': [10, 3, 224, 224]})


PTQ (Post-Training Quantization)
---------------------------------------------
To be finished.



21 changes: 21 additions & 0 deletions docs/source/example/setup.rst
@@ -0,0 +1,21 @@
Preparations
===================================
Generally, we follow the `PyTorch official example <https://github.com/pytorch/examples/tree/master/imagenet/>`_ to build the Model Quantization Benchmark example for the ImageNet classification task.


- Install PyTorch following `pytorch.org <http://pytorch.org/>`_
- Install dependencies::

    pip install -r requirements.txt

- Specific requirements for hardware platforms will be introduced later.

- Download the ImageNet dataset from `the official website <http://www.image-net.org/>`_

- Then move the validation images into labeled subfolders using `the following shell script <https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh/>`_

- Other datasets can be processed in a similar way.

- Full-precision pretrained models are preferred, but it is sometimes possible to do QAT from scratch.


1 change: 1 addition & 0 deletions docs/source/hardware/index.rst
@@ -6,3 +6,4 @@ Quantization Hardware

nnie
tensorrt
snpe
29 changes: 29 additions & 0 deletions docs/source/hardware/snpe.rst
@@ -0,0 +1,29 @@
SNPE
=========

`Snapdragon Neural Processing Engine (SNPE) <https://developer.qualcomm.com/sites/default/files/docs/snpe//index.html/>`_ is a Qualcomm Snapdragon software accelerated runtime for the execution of deep neural networks.

.. _SNPE Quantization Scheme:

Quantization Scheme
--------------------
8/16 bit per-layer asymmetric linear quantization.

.. math::

   \begin{equation}
   q = \mathtt{clamp}\left(\left\lfloor R \cdot \dfrac{x - cmin}{cmax - cmin} \right\rceil, lb, ub\right)
   \end{equation}

where :math:`R` is the integer range after quantization, :math:`cmax` and :math:`cmin` are the calculated range of the floating-point values, and :math:`lb` and :math:`ub` are the bounds of the integer range.
Taking 8-bit as an example, :math:`R = 255` and :math:`[lb, ub] = [0, 255]`.
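
For illustration, here is a minimal NumPy sketch of this scheme (the helper name and the zero-point handling follow the description on this page; they are not part of the SNPE SDK)::

    import numpy as np

    def asymmetric_quantize(x, n_bits=8):
        R = 2 ** n_bits - 1                      # integer range, e.g. 255 for 8 bit
        lb, ub = 0, R                            # bounds of the integer range
        # calculated float range, stretched to include zero so that
        # zero is exactly representable (see the note below)
        cmin = min(float(x.min()), 0.0)
        cmax = max(float(x.max()), 0.0)
        q = np.clip(np.round(R * (x - cmin) / (cmax - cmin)), lb, ub)
        scale = (cmax - cmin) / R                # dequantization step size
        offset = np.round(-cmin / scale)         # integer zero point
        return q.astype(np.uint8), scale, offset

    x = np.random.randn(224, 224, 3).astype(np.float32)
    q, scale, offset = asymmetric_quantize(x)
    x_hat = (q.astype(np.float32) - offset) * scale  # dequantized approximation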


In fact, when building for SNPE with the official tools, the model is first converted into a full-precision *.dlc* file, which can then optionally be converted into a quantized version.

.. attention::
   Users can provide a `.json` file to override the parameters.

   The values of *scale* and *offset* are not required, but can be overridden.

   SNPE will adjust the values of *cmin* and *cmax* to ensure that zero is exactly representable.
