Commit 31faabe
zhangqi3 committed on Nov 4, 2021 (1 parent: 3567c06)
Showing 15 changed files with 322 additions and 91 deletions.

@@ -0,0 +1,10 @@
Advanced instructions for different hardware platforms
======================================================

.. toctree::
   :titlesonly:

   TensorRT <platforms/tensorrt.rst>
   SNPE <platforms/snpe.rst>

@@ -1,65 +1,11 @@
 Get Started
-==========================
-We follow the `PyTorch official example <https://github.com/pytorch/examples/tree/master/imagenet/>`_ to build the example of Model Quantization Benchmark for the ImageNet classification task.
-
-Requirements
--------------
-
-- Install PyTorch following `pytorch.org <http://pytorch.org/>`_
-- Install dependencies::
-
-    pip install -r requirements.txt
-
-- Download the ImageNet dataset from `the official website <http://www.image-net.org/>`_
-
-- Then move validation images to labeled subfolders, using `the following shell script <https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh/>`_
-
-- Install TensorRT 7.2.1.6 from `NVIDIA <https://developer.nvidia.com/tensorrt/>`_
-
-Usage
----------
-
-- **Quantization-Aware Training:**
-
-  - Training hyper-parameters:
-
-    - batch size = 128
-    - epochs = 1
-    - lr = 1e-4
-    - others, like weight decay and momentum, are kept at their defaults.
-
-  - ResNet18 / ResNet50 / MobileNet_v2::
-
-      python main.py -a [model_name] --epochs 1 --lr 1e-4 --b 128 --seed 99 --pretrained
-
-- **Deployment**
-
-  We provide an example of deploying the quantized model to TensorRT.
-
-  1. First, export the quantized model to ONNX [tensorrt_deploy_model.onnx] and dump the clip ranges [tensorrt_clip_ranges.json] for activations::
-
-       python main.py -a [model_name] --resume [model_save_path]
-
-  2. Second, build the TensorRT INT8 engine and evaluate; please make sure [dataset_path] contains the subfolder [val]::
-
-       python onnx2trt.py --onnx [tensorrt_deploy_model.onnx] --trt [model_name.trt] --clip [tensorrt_clip_ranges.json] --data [dataset_path] --evaluate
-
-  3. If you don't pass in external clip ranges [tensorrt_clip_ranges.json], TensorRT will calibrate with the default algorithm IInt8EntropyCalibrator2 on 100 images, so please make sure [dataset_path] contains the subfolder [cali]::
-
-       python onnx2trt.py --onnx [tensorrt_deploy_model.onnx] --trt [model_name.trt] --data [dataset_path] --evaluate
-
-Results
------------
-
-+------------------+-----------------------------+-----------------------------------------------------------------------------------------+
-| Model            | accuracy\@fp32              | accuracy\@int8                                                                          |
-|                  |                             +-----------------------------+-----------------------------+-----------------------------+
-|                  |                             | TensorRT Calibration        | MQBench QAT                 | TensorRT SetRange           |
-+==================+=============================+=============================+=============================+=============================+
-| **ResNet18**     | Acc\@1 69.758 Acc\@5 89.078 | Acc\@1 69.612 Acc\@5 88.980 | Acc\@1 69.912 Acc\@5 89.150 | Acc\@1 69.904 Acc\@5 89.182 |
-+------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
-| **ResNet50**     | Acc\@1 76.130 Acc\@5 92.862 | Acc\@1 76.074 Acc\@5 92.892 | Acc\@1 76.114 Acc\@5 92.946 | Acc\@1 76.320 Acc\@5 93.006 |
-+------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
-| **MobileNet_v2** | Acc\@1 71.878 Acc\@5 90.286 | Acc\@1 70.700 Acc\@5 89.708 | Acc\@1 70.826 Acc\@5 89.874 | Acc\@1 70.724 Acc\@5 89.870 |
-+------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
+============
+This tutorial gives details on the whole work-through of doing quantization with MQBench, including:
+
+.. toctree::
+   :maxdepth: 1
+   :titlesonly:
+
+   setup
+   quantization
+   deploy

@@ -0,0 +1,53 @@
SNPE
=============
Example of QAT and deployment on SNPE.

**Requirements**:

- Install the SNPE SDK from `Qualcomm <https://developer.qualcomm.com/sites/default/files/docs/snpe/setup.html>`_ (Ubuntu 18.04 is suggested)

**QAT**:

- Follow the QAT procedure to get a model checkpoint; we suggest a learning rate of 5e-5 with a cosine scheduler and the Adam optimizer for tens of epochs. A sketch of this schedule follows.
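
  For illustration only, a minimal training-loop sketch with the suggested schedule; ``model``, ``criterion``, ``train_loader`` and ``num_epochs`` are placeholders, not names from this repository::

      import torch

      # Suggested QAT schedule from the text: Adam + cosine annealing, lr = 5e-5,
      # run for "tens of epochs" (num_epochs is a placeholder).
      optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
      scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

      for epoch in range(num_epochs):
          for images, target in train_loader:
              loss = criterion(model(images), target)
              optimizer.zero_grad()
              loss.backward()
              optimizer.step()
          scheduler.step()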

**Deployment**:

- Convert the PyTorch checkpoint to `snpe_deploy.onnx` and dump the clip ranges to `snpe_clip_ranges.json`::

    from mqbench.convert_deploy import convert_deploy
    from mqbench.prepare_by_platform import BackendType

    # solver.model.module is the trained QAT model restored from the checkpoint
    input_dict = {'x': [1, 3, 224, 224]}
    convert_deploy(solver.model.module, BackendType.SNPE, input_dict)

- Convert the `.onnx` file to the `.dlc` format (supported by SNPE)::

    snpe-onnx-to-dlc --input_network ./snpe_deploy.onnx --output_path ./snpe_deploy.dlc --quantization_overrides ./snpe_clip_ranges.json

- Note that the `.json` file contains the activation clip ranges for quantization; it is required here even though the model has not been quantized yet.

- Quantize the model with the parameters overridden::

    snpe-dlc-quantize --input_dlc ./snpe_deploy.dlc --input_list ./data.txt --override_params --bias_bitwidth 32

- The `data.txt` file records paths to image data for calibration (not critical here, since we override the parameters); each file is loaded via `numpy.fromfile(dtype=np.float32)` and must have shape `(224, 224, 3)`. This file is also required for testing. A sketch of how to produce it follows this list.

- Now we get the final model, `snpe_deploy_quantized.dlc`.
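
As promised above, a minimal sketch of dumping calibration images as raw float32 `(224, 224, 3)` files and recording them in `data.txt`; the file names and the (omitted) normalization are assumptions, not part of this repository::

    import numpy as np
    from PIL import Image

    image_paths = ["./images/img0.jpg", "./images/img1.jpg"]  # hypothetical inputs
    with open("data.txt", "w") as listing:
        for i, path in enumerate(image_paths):
            img = Image.open(path).convert("RGB").resize((224, 224))
            arr = np.asarray(img, dtype=np.float32)   # HWC layout, shape (224, 224, 3)
            raw_path = f"./cali_{i}.raw"
            arr.tofile(raw_path)                      # readable via np.fromfile(dtype=np.float32)
            listing.write(raw_path + "\n")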

**Results**:

The test is done with the SNPE SDK tools, using the quantized model and a text file recording paths to test data of shape (224, 224, 3)::

    snpe-net-run --container ./snpe_deploy_quantized.dlc --input_list ./test_data.txt

The results on several tested models:

+------------------+----------------+---------------------------------+
| Model            | accuracy\@fp32 | accuracy\@int8                  |
|                  |                +----------------+----------------+
|                  |                | MQBench QAT    | SNPE           |
+==================+================+================+================+
| **ResNet18**     | 70.65%         | 70.75%         | 70.74%         |
+------------------+----------------+----------------+----------------+
| **ResNet50**     | 77.94%         | 77.75%         | 77.92%         |
+------------------+----------------+----------------+----------------+
| **MobileNet_v2** | 72.67%         | 72.31%         | 72.65%         |
+------------------+----------------+----------------+----------------+

@@ -0,0 +1,56 @@
TensorRT
================
Example of QAT and deployment on TensorRT.

**Requirements**:

- Install TensorRT 7.2.1.6 from `NVIDIA <https://developer.nvidia.com/tensorrt/>`_

**QAT**:

- Training hyper-parameters:

  - batch size = 128
  - epochs = 1
  - learning rate = 1e-4 (for the ResNet series) / 1e-5 (for the MobileNet series)
  - weight decay = 1e-4 (for the ResNet series) / 0 (for the MobileNet series)
  - optimizer: SGD (for the ResNet series) / Adam (for the MobileNet series)
  - others, like momentum, are kept at their defaults.

- [model_name] = ResNet18 / ResNet50 / MobileNet_v2 / ... ::

    git clone https://github.com/TheGreatCold/MQBench.git
    cd MQBench/application/imagenet_example
    python main.py -a [model_name] --epochs 1 --lr 1e-4 --b 128 --pretrained

**Deployment**:

We provide an example of deploying the quantized model to TensorRT.

- First, export the quantized model to ONNX [tensorrt_deploy_model.onnx] and dump the clip ranges [tensorrt_clip_ranges.json] for activations::

    python main.py -a [model_name] --resume [model_save_path]

- Second, build the TensorRT INT8 engine and evaluate; please make sure [dataset_path] contains the subfolder [val]::

    python onnx2trt.py --onnx [tensorrt_deploy_model.onnx] --trt [model_name.trt] --clip [tensorrt_clip_ranges.json] --data [dataset_path] --evaluate

- If you don't pass in external clip ranges [tensorrt_clip_ranges.json], TensorRT will calibrate with its default algorithm, IInt8EntropyCalibrator2, on 100 images, so please make sure [dataset_path] contains the subfolder [cali] (a sketch of such a calibrator follows this list)::

    python onnx2trt.py --onnx [tensorrt_deploy_model.onnx] --trt [model_name.trt] --data [dataset_path] --evaluate
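
For illustration, a minimal, self-contained entropy-calibrator sketch in the TensorRT Python API. This shows the general pattern only and is not the actual implementation in onnx2trt.py; it assumes images arrive as preprocessed float32 NCHW batches and requires pycuda::

    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda
    import tensorrt as trt

    class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
        def __init__(self, batches, cache_file="calibration.cache"):
            super().__init__()
            self.batches = iter(batches)      # iterable of float32 arrays, e.g. [N, 3, 224, 224]
            self.cache_file = cache_file
            first = next(self.batches)
            self.batch_size = first.shape[0]
            self.device_mem = cuda.mem_alloc(first.nbytes)
            self.pending = first

        def get_batch_size(self):
            return self.batch_size

        def get_batch(self, names):
            batch = self.pending if self.pending is not None else next(self.batches, None)
            self.pending = None
            if batch is None:
                return None                   # no more data: calibration ends
            cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(batch))
            return [int(self.device_mem)]

        def read_calibration_cache(self):
            try:
                with open(self.cache_file, "rb") as f:
                    return f.read()
            except FileNotFoundError:
                return None

        def write_calibration_cache(self, cache):
            with open(self.cache_file, "wb") as f:
                f.write(cache)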

**Results**:

+------------------+-----------------------------+-----------------------------------------------------------------------------------------+
| Model            | accuracy\@fp32              | accuracy\@int8                                                                          |
|                  |                             +-----------------------------+-----------------------------+-----------------------------+
|                  |                             | TensorRT Calibration        | MQBench QAT                 | TensorRT SetRange           |
+==================+=============================+=============================+=============================+=============================+
| **ResNet18**     | Acc\@1 69.758 Acc\@5 89.078 | Acc\@1 69.612 Acc\@5 88.980 | Acc\@1 69.912 Acc\@5 89.150 | Acc\@1 69.904 Acc\@5 89.182 |
+------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
| **ResNet50**     | Acc\@1 76.130 Acc\@5 92.862 | Acc\@1 76.074 Acc\@5 92.892 | Acc\@1 76.114 Acc\@5 92.946 | Acc\@1 76.320 Acc\@5 93.006 |
+------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
| **MobileNet_v2** | Acc\@1 71.878 Acc\@5 90.286 | Acc\@1 70.700 Acc\@5 89.708 | Acc\@1 71.158 Acc\@5 89.990 | Acc\@1 71.102 Acc\@5 89.932 |
+------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+

@@ -0,0 +1,54 @@
How to do quantization with MQBench
======================================

QAT (Quantization-Aware Training)
---------------------------------------------
The training only requires a few additional operations compared to ordinary fine-tuning.

::

    import torchvision.models as models
    from mqbench.convert_deploy import convert_deploy
    from mqbench.prepare_by_platform import prepare_qat_fx_by_platform, BackendType
    from mqbench.utils.state import enable_calibration, enable_quantization

    # First, initialize the FP32 model with pretrained parameters.
    model = models.__dict__["resnet18"](pretrained=True)
    model.train()

    # Then trace the original model with torch.fx and insert fake-quantize nodes
    # according to the target hardware backend (e.g. TensorRT).
    model = prepare_qat_fx_by_platform(model, BackendType.Tensorrt)

    # Before training, we recommend letting the observers calibrate on a few
    # batches, and only then enabling quantization.
    model.eval()
    enable_calibration(model)
    calibrating = True
    calibration_batches = 1    # number of batches used for calibration

    # training loop
    for i, batch in enumerate(data):
        # do the forward pass
        ...

        if calibrating:
            if i >= calibration_batches - 1:
                # calibration finished: switch to quantization-aware training
                calibrating = False
                model.zero_grad()
                model.train()
                enable_quantization(model)
            continue    # skip backward/optimization while calibrating

        # do backward and optimization
        ...

    # Deploy the model: remove fake-quantize nodes and dump quantization params such as clip ranges.
    convert_deploy(model.eval(), BackendType.Tensorrt, input_shape_dict={'data': [10, 3, 224, 224]})


PTQ (Post-Training Quantization)
---------------------------------------------
To be finished.
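
In the meantime, a purely illustrative sketch of how a PTQ-style flow could be assembled from the same primitives used in the QAT example above; this is an assumption, not MQBench's documented PTQ API, and ``calib_loader`` and the batch count are placeholders::

    import torch
    import torchvision.models as models
    from mqbench.convert_deploy import convert_deploy
    from mqbench.prepare_by_platform import prepare_qat_fx_by_platform, BackendType
    from mqbench.utils.state import enable_calibration, enable_quantization

    model = models.__dict__["resnet18"](pretrained=True)
    model = prepare_qat_fx_by_platform(model, BackendType.Tensorrt)

    # Collect activation ranges with the observers, without any training.
    model.eval()
    enable_calibration(model)
    with torch.no_grad():
        for i, (images, _) in enumerate(calib_loader):   # calib_loader is a placeholder
            model(images)
            if i >= 100:                                 # e.g. a few thousand images
                break

    # Fake-quantize with the collected ranges, evaluate, then deploy as above.
    enable_quantization(model)
    convert_deploy(model.eval(), BackendType.Tensorrt, input_shape_dict={'data': [10, 3, 224, 224]})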

@@ -0,0 +1,21 @@
Preparations
===================================
Generally, we follow the `PyTorch official example <https://github.com/pytorch/examples/tree/master/imagenet/>`_ to build the example of Model Quantization Benchmark for the ImageNet classification task.

- Install PyTorch following `pytorch.org <http://pytorch.org/>`_
- Install dependencies::

    pip install -r requirements.txt

- Specific requirements for hardware platforms will be introduced later.

- Download the ImageNet dataset from `the official website <http://www.image-net.org/>`_

  - Then move validation images to labeled subfolders, using `the following shell script <https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh/>`_

  - Other datasets can be processed in a similar way.

- Full-precision pretrained models are preferred, but it is also possible to do QAT from scratch.

@@ -6,3 +6,4 @@ Quantization Hardware

    nnie
    tensorrt
    snpe

@@ -0,0 +1,29 @@
SNPE
=========

`Snapdragon Neural Processing Engine (SNPE) <https://developer.qualcomm.com/sites/default/files/docs/snpe//index.html/>`_ is a Qualcomm Snapdragon software-accelerated runtime for the execution of deep neural networks.

.. _SNPE Quantization Scheme:

Quantization Scheme
--------------------
8/16-bit per-layer asymmetric linear quantization:

.. math::

   q = \mathtt{clamp}\left(\left\lfloor R \cdot \dfrac{x - cmin}{cmax - cmin} \right\rceil,\; lb,\; ub\right)

where :math:`R` is the integer range after quantization, :math:`cmax` and :math:`cmin` are the calculated range of the floating-point values, and :math:`lb` and :math:`ub` are the bounds of the integer range. Taking 8-bit as an example, :math:`R = 255` and :math:`[lb, ub] = [0, 255]`.
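
As a concrete illustration of the formula (a minimal NumPy sketch, not SNPE code)::

    import numpy as np

    def snpe_quantize(x, cmin, cmax, lb=0, ub=255):
        R = ub - lb                          # integer range, 255 for 8 bit
        q = np.round(R * (x - cmin) / (cmax - cmin))
        return np.clip(q, lb, ub).astype(np.uint8)

    def snpe_dequantize(q, cmin, cmax, lb=0, ub=255):
        R = ub - lb
        return cmin + (q.astype(np.float32) - lb) * (cmax - cmin) / R

    x = np.array([-0.5, 0.0, 1.0], dtype=np.float32)
    q = snpe_quantize(x, cmin=-0.5, cmax=1.0)    # -> [0, 85, 255]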

In fact, when building a model for SNPE with the official tools, the model is first converted into a full-precision *.dlc* file, and then optionally into a quantized version.

.. attention::
   Users can provide a `.json` file to override the parameters.

   The values of *scale* and *offset* are not required, but can be overridden.

   SNPE will adjust the values of *cmin* and *cmax* to ensure that zero is exactly representable, as sketched below.
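
One common way to perform that adjustment (an assumption about SNPE's exact rule; this is the standard trick of nudging the range so that 0.0 lands on the integer grid)::

    def adjust_range(cmin, cmax, lb=0, ub=255):
        # The range must contain zero before nudging.
        cmin, cmax = min(cmin, 0.0), max(cmax, 0.0)
        scale = (cmax - cmin) / (ub - lb)
        zero_point = round(lb - cmin / scale)    # integer that will represent 0.0
        # Shift both ends so that zero_point maps back to exactly 0.0.
        cmin = (lb - zero_point) * scale
        cmax = (ub - zero_point) * scale
        return cmin, cmax

    print(adjust_range(-0.37, 1.0))   # endpoints move slightly; 0.0 stays exact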