diff --git a/example/auto_compression/nlp/README.md b/example/auto_compression/nlp/README.md
index c98f1987e..da35eebcc 100644
--- a/example/auto_compression/nlp/README.md
+++ b/example/auto_compression/nlp/README.md
@@ -56,16 +56,16 @@
 #### 3.1 Environment Setup
 - python >= 3.6
-- PaddlePaddle >= 2.4 (download and install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
-- PaddleSlim >= 2.4
-- PaddleNLP >= 2.3
+- PaddlePaddle == 2.5 (download and install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
+- PaddleSlim == 2.5
+- PaddleNLP == 2.6
 
 Install paddlepaddle:
 ```shell
 # CPU
-pip install paddlepaddle==2.4.1
+pip install paddlepaddle==2.5.0
-# GPU, using Ubuntu with CUDA 11.2 as an example
-python -m pip install paddlepaddle-gpu==2.4.1.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
+# GPU, using Ubuntu with CUDA 11.6 as an example
+python -m pip install paddlepaddle-gpu==2.5.0.post116 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
 ```
 
 Install paddleslim:
@@ -95,7 +95,6 @@ pip install paddlenlp
 |:------:|:------:|:------:|:------:|:------:|:-----------:|:------:|:------:|
 | PP-MiniLM | [afqmc](https://bj.bcebos.com/v1/paddle-slim-models/act/afqmc.tar) | [tnews](https://bj.bcebos.com/v1/paddle-slim-models/act/tnews.tar) | [iflytek](https://bj.bcebos.com/v1/paddle-slim-models/act/iflytek.tar) | [cmnli](https://bj.bcebos.com/v1/paddle-slim-models/act/cmnli.tar) | [ocnli](https://bj.bcebos.com/v1/paddle-slim-models/act/ocnli.tar) | [cluewsc2020](https://bj.bcebos.com/v1/paddle-slim-models/act/cluewsc.tar) | [csl](https://bj.bcebos.com/v1/paddle-slim-models/act/csl.tar) |
 | ERNIE 3.0-Medium | [afqmc](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/AFQMC.tar) | [tnews](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/TNEWS.tar) | [iflytek](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/IFLYTEK.tar) | [cmnli](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/CMNLI.tar) | [ocnli](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/OCNLI.tar) | [cluewsc2020](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/CLUEWSC2020.tar) | [csl](https://bj.bcebos.com/v1/paddle-slim-models/act/NLP/ernie3.0-medium/fp32_models/CSL.tar) |
-| UIE-base | [reimbursement work orders](https://bj.bcebos.com/v1/paddle-slim-models/act/uie_base.tar) |
 
 Get the model link from the table above and download the inference model files with the following command:
@@ -119,11 +118,6 @@
 export CUDA_VISIBLE_DEVICES=0
 python run.py --config_path='./configs/pp-minilm/auto/afqmc.yaml' --save_dir='./save_afqmc_pruned/'
 ```
-Auto-compressing the UIE-series models requires launching the run_uie.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to compress the model automatically. Configure the training parameters in the config file, passing in the task name, model type, dataset name and compression parameters; once configured, the model can be trained with distillation and quantization.
-```shell
-export CUDA_VISIBLE_DEVICES=0
-python run_uie.py --config_path='./configs/uie/uie_base.yaml' --save_dir='./save_uie_qat/'
-```
 
 To verify only the accuracy of the model, or of the compressed model, change the model directory ```model_dir``` in the config file to the folder saved after compression, ```./save_afqmc_pruned```, and add ```--eval True``` when launching the ```run.py``` script:
 ```shell
@@ -217,8 +211,6 @@ QuantPost:
 
 - TensorRT prediction:
 
-Environment setup: using the TensorRT prediction engine requires a Paddle build compiled with ```WITH_TRT=ON```; download it from the [Python inference libraries](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)
-
 First download the quantized model:
 ```shell
 wget https://bj.bcebos.com/v1/paddle-slim-models/act/save_ppminilm_afqmc_new_calib.tar
@@ -227,10 +219,30 @@ tar -xf save_ppminilm_afqmc_new_calib.tar
 ```shell
 python paddle_inference_eval.py \
-    --model_path=save_ernie3_afqmc_new_cablib \
+    --model_path=save_ppminilm_afqmc_new_calib \
+    --model_filename=inference.pdmodel \
+    --params_filename=inference.pdiparams \
+    --task_name='afqmc' \
+    --use_trt \
+    --precision=int8
+```
+
+- ERNIE 3.0-Medium:
+```shell
+python paddle_inference_eval.py \
+    --model_path=TNEWS \
     --model_filename=infer.pdmodel \
     --params_filename=infer.pdiparams \
-    --task_name='afqmc' \
+    --task_name='tnews' \
+    --use_trt \
+    --precision=fp32
+```
+```shell
+python paddle_inference_eval.py \
+    --model_path=save_tnews_pruned \
+    --model_filename=infer.pdmodel \
+    --params_filename=infer.pdiparams \
+    --task_name='tnews' \
     --use_trt \
     --precision=int8
 ```
@@ -239,9 +251,9 @@
 
 ```shell
 python paddle_inference_eval.py \
-    --model_path=save_ernie3_afqmc_new_cablib \
-    --model_filename=infer.pdmodel \
-    --params_filename=infer.pdiparams \
+    --model_path=save_ppminilm_afqmc_new_calib \
+    --model_filename=inference.pdmodel \
+    --params_filename=inference.pdiparams \
     --task_name='afqmc' \
     --device=cpu \
     --use_mkldnn=True \
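For reference, the CPU/MKLDNN command above boils down to a predictor configured roughly as follows — a minimal sketch against the Paddle Inference Python API, not an excerpt from `paddle_inference_eval.py`; the paths mirror the command, and the thread count is an assumed value:

```python
# Minimal CPU + MKLDNN (oneDNN) predictor setup with the Paddle Inference API.
import paddle.inference as paddle_infer

config = paddle_infer.Config(
    "save_ppminilm_afqmc_new_calib/inference.pdmodel",
    "save_ppminilm_afqmc_new_calib/inference.pdiparams")
config.disable_gpu()                         # --device=cpu
config.enable_mkldnn()                       # --use_mkldnn=True
config.set_cpu_math_library_num_threads(10)  # assumed --cpu_threads value
predictor = paddle_infer.create_predictor(config)
```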
diff --git a/example/auto_compression/nlp/configs/ernie3.0/tnews.yaml b/example/auto_compression/nlp/configs/ernie3.0/tnews.yaml
index 49093ab87..b90da628a 100644
--- a/example/auto_compression/nlp/configs/ernie3.0/tnews.yaml
+++ b/example/auto_compression/nlp/configs/ernie3.0/tnews.yaml
@@ -6,12 +6,17 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
-TrainConfig:
-  epochs: 6
-  eval_iter: 1110
-  learning_rate: 2.0e-5
-  optimizer_builder:
-    optimizer:
-      type: AdamW
-    weight_decay: 0.01
-  origin_metric: 0.5700
+
+# Pruning
+Prune:
+  prune_algo: transformer_pruner
+  pruned_ratio: 0.25
+
+# Post-training quantization
+QuantPost:
+  activation_bits: 8
+  quantize_op_types:
+  - depthwise_conv2d
+  - conv2d
+  weight_bits: 8
\ No newline at end of file
diff --git a/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml b/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml
index 9c9f58826..fdf65673b 100644
--- a/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml
+++ b/example/auto_compression/nlp/configs/pp-minilm/auto/afqmc.yaml
@@ -6,17 +6,11 @@ Global:
   dataset: clue
   batch_size: 16
   max_seq_length: 128
-TransformerPrune:
-  pruned_ratio: 0.25
-HyperParameterOptimization:
-Distillation:
+
+# Post-training quantization
 QuantPost:
-TrainConfig:
-  epochs: 6
-  eval_iter: 1070
-  learning_rate: 2.0e-5
-  optimizer_builder:
-    optimizer:
-      type: AdamW
-    weight_decay: 0.01
-  origin_metric: 0.7403
+  activation_bits: 8
+  quantize_op_types:
+  - conv2d
+  - depthwise_conv2d
+  weight_bits: 8
\ No newline at end of file
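Both configs above drop the training-based strategies (TrainConfig, Distillation, hyper-parameter optimization) in favor of pruning plus pure post-training quantization. run.py loads these YAML sections and hands them to PaddleSlim's ACT entry point, roughly as sketched below — a sketch only: `train_dataloader` stands in for the CLUE dataloader built inside run.py (with a QuantPost-only config it supplies calibration batches), and helper names can differ between PaddleSlim versions:

```python
# Rough sketch of how run.py feeds a YAML config to auto compression (ACT).
from paddleslim.common import load_config
from paddleslim.auto_compression import AutoCompression

all_config = load_config("./configs/pp-minilm/auto/afqmc.yaml")
train_dataloader = ...  # placeholder: a paddle.io.DataLoader over the CLUE train set

ac = AutoCompression(
    model_dir="./afqmc",
    model_filename="inference.pdmodel",
    params_filename="inference.pdiparams",
    save_dir="./save_afqmc_pruned",
    config=all_config,
    train_dataloader=train_dataloader)
ac.compress()
```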
Default is 'fp16'.", ) parser.add_argument( "--use_mkldnn", @@ -156,8 +158,7 @@ def _convert_example(example, } elif "target" in example: # wsc text, query, pronoun, query_idx, pronoun_idx = ( - example["text"], - example["target"]["span1_text"], + example["text"], example["target"]["span1_text"], example["target"]["span2_text"], example["target"]["span1_index"], example["target"]["span2_index"], ) @@ -209,6 +210,12 @@ def create_predictor(cls, args): config = paddle.inference.Config( os.path.join(args.model_path, args.model_filename), os.path.join(args.model_path, args.params_filename)) + config.switch_ir_debug(True) + # 适用于ERNIE 3.0-Medium模型 + # config.exp_disable_tensorrt_ops(["elementwise_add"]) + # config.exp_disable_tensorrt_ops(["fused_embedding_eltwise_layernorm"]) + # config.exp_disable_tensorrt_ops(["tmp_3"]) + if args.device == "gpu": # set GPU configs accordingly config.enable_use_gpu(100, 0) @@ -239,8 +246,8 @@ def create_predictor(cls, args): dynamic_shape_file = os.path.join(args.model_path, "dynamic_shape.txt") if os.path.exists(dynamic_shape_file): - config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file, - True) + config.enable_tuned_tensorrt_dynamic_shape( + dynamic_shape_file, True) print("trt set dynamic shape done!") else: config.collect_shape_range_info(dynamic_shape_file) @@ -365,4 +372,4 @@ def main(): if __name__ == "__main__": paddle.set_device("cpu") - main() + main() \ No newline at end of file diff --git a/example/post_training_quantization/pytorch_yolo_series/README.md b/example/post_training_quantization/pytorch_yolo_series/README.md index 63a7d96c1..dde8c7267 100755 --- a/example/post_training_quantization/pytorch_yolo_series/README.md +++ b/example/post_training_quantization/pytorch_yolo_series/README.md @@ -40,8 +40,8 @@ ## 3. 
diff --git a/example/post_training_quantization/pytorch_yolo_series/README.md b/example/post_training_quantization/pytorch_yolo_series/README.md
index 63a7d96c1..dde8c7267 100755
--- a/example/post_training_quantization/pytorch_yolo_series/README.md
+++ b/example/post_training_quantization/pytorch_yolo_series/README.md
@@ -40,8 +40,8 @@
 ## 3. Post-Training Quantization Workflow
 
 #### 3.1 Environment Setup
-- PaddlePaddle >= 2.3 (download and install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
-- PaddleSlim > 2.3
+- PaddlePaddle == 2.5 (download and install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
+- PaddleSlim == 2.5
 - X2Paddle >= 1.3.9
 - opencv-python
 
@@ -49,14 +49,16 @@
 (1) Install paddlepaddle:
 ```shell
 # CPU
-pip install paddlepaddle
+python -m pip install paddlepaddle==2.5.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
 # GPU
-pip install paddlepaddle-gpu
+python -m pip install paddlepaddle-gpu==2.5.0.post116 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
 ```
 
 (2) Install paddleslim:
+Note: in PaddleSlim's setup.py, change slim_version to '2.5' before installing.
 ```shell
-pip install paddleslim
+git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleSlim.git && cd PaddleSlim
+python setup.py install
 ```
 
 #### 3.2 Prepare the Dataset
@@ -122,7 +124,7 @@ python eval.py --config_path=./configs/yolov5s_ptq.yaml
 
 #### 3.6 Improving Post-Training Quantization Accuracy
 ###### 3.6.1 Quantization Analysis Tool
-This section describes how to use the quantization analysis tool to improve post-training quantization accuracy. Post-training quantization needs only a small amount of data, is simple to use, and produces a quantized model quickly, but it often causes a significant accuracy drop. PaddleSlim provides a quantization analysis tool, built on the ```paddleslim.quant.AnalysisPTQ``` interface, that visualizes the layers unsuitable for quantization; skipping those layers improves the accuracy of the quantized model. ```paddleslim.quant.AnalysisPTQ``` is documented in [AnalysisPTQ.md](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/quant/post_training_quantization.md).
+This section describes how to use the quantization analysis tool to improve post-training quantization accuracy. Post-training quantization needs only a small amount of data, is simple to use, and produces a quantized model quickly, but it often causes a significant accuracy drop. PaddleSlim provides a quantization analysis tool, built on the ```paddleslim.quant.AnalysisPTQ``` interface, that visualizes the layers unsuitable for quantization; skipping those layers improves the accuracy of the quantized model. ```paddleslim.quant.AnalysisPTQ``` is documented in [Post-Training Quantization](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/tutorials/quant/post_training_quantization.md).
 
 Since post-training quantization of YOLOv6 degrades accuracy noticeably, YOLOv6 is used as the example; the analysis tool is used as follows:
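In outline, `paddleslim.quant.AnalysisPTQ` wraps the float model together with an eval function and a dataloader, quantizes one layer at a time to rank the layers by how much accuracy each costs, and can then export a quantized model that skips the most sensitive layers. A rough sketch based on the linked PaddleSlim documentation — method and argument names may differ between versions, and `eval_function`/`data_loader` are placeholders for the COCO pipeline:

```python
# Rough sketch of the quantization-analysis workflow for YOLOv6.
from paddleslim.quant import AnalysisPTQ

analysis = AnalysisPTQ(
    model_dir="yolov6s_infer",
    model_filename="model.pdmodel",
    params_filename="model.pdiparams",
    eval_function=eval_function,  # placeholder: returns mAP on COCO val2017
    data_loader=data_loader,      # placeholder: calibration batches
    save_dir="analysis_results")

# Quantize each layer individually and rank layers by the metric drop caused.
analysis.metric_error_analyse()
# Export a quantized model that skips the most sensitive layers until the
# given target metric is reached.
analysis.get_target_quant_model(target_metric=0.25)
```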
@@ -208,23 +210,24 @@ python fine_tune.py --config_path=./configs/yolov6s_fine_tune.yaml --simulate_ac
 ## 4. Prediction and Deployment
 For deployment, see the [YOLO-series auto-compression example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/pytorch_yolo_series).
 The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.
-| Parameter | Description |
-| model_path | Path to the inference model files; the directory must contain the two files model.pdmodel and params.pdiparams |
+| Parameter | Description |
+|:------:|:------:|
+| model_path | Directory containing the inference model; it must contain the two files model.pdmodel and model.pdiparams |
 | dataset_dir | Root directory of the COCO dataset |
 | image_file | To test a single image, specify its path directly via image_file |
 | val_image_dir | Directory of validation images inside the COCO dataset, default val2017 |
 | val_anno_path | Path to the COCO annotation file, the JSON file with validation-set labels; default annotations/instances_val2017.json |
 | benchmark | Whether to run the performance benchmark; if set to True, the program runs a performance test |
-| device | Run prediction on GPU or CPU; one of CPU/GPU/XPU, default GPU |
-| use_trt | Whether to use TensorRT for prediction |
-| use_mkldnn | Whether to use the MKL-DNN acceleration library; note that when use_mkldnn and use_gpu are both True, enable_mkldnn is ignored and the GPU is used |
-| use_dynamic_shape | Whether to use the dynamic-shape feature |
-| precision | fp32/fp16/int8 |
+| device | Run prediction on GPU or CPU; one of CPU/GPU/XPU, default GPU |
+| use_trt | Whether to use the TensorRT prediction engine |
+| use_mkldnn | Whether to enable the ```MKL-DNN``` acceleration library; note that when ```use_mkldnn``` and ```use_gpu``` are both ```True```, ```enable_mkldnn``` is ignored and the ```GPU``` is used |
+| cpu_threads | Number of CPU threads used for CPU prediction, default 10 |
+| precision | Prediction precision, one of `fp32/fp16/int8` |
 | arch | Name of the model architecture used, e.g. YOLOv5 |
 | img_shape | Input image size of the model |
+| use_dynamic_shape | Whether to use dynamic shape: set to True to enable it, False otherwise |
 | batch_size | Input batch size of the model |
-| use_mkldnn | Whether to use MKLDNN acceleration (mainly for CPU) |
-| cpu_threads | Number of threads to use on CPU |
+
 Starting from the yolov6s.onnx file we have, the ONNX model needs to be converted into a Paddle model; see [Migrating inference models with X2Paddle](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/model_convert/convert_with_x2paddle_cn.html#x2paddle) for details.
 
 - Install X2Paddle
@@ -242,7 +245,7 @@ python setup.py install
 ```shell
 x2paddle --framework=onnx --model=yolov6s.onnx --save_dir=yolov6_model
 ```
-- TensorRT Python deployment
+#### 4.1 TensorRT Python Deployment
 Deploy with [paddle_inference_eval.py](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/auto_compression/pytorch_yolo_series/paddle_inference_eval.py):
 ```shell
 python paddle_inference_eval.py --model_path=yolov6_model/inference_model --dataset_dir=datasets/coco --use_trt=True --precision=fp32 --arch=YOLOv6
 ```
 ```shell
 python paddle_inference_eval.py --model_path=yolov6s_ptq_out --dataset_dir=datasets/coco --use_trt=True --precision=int8 --arch=YOLOv6
 ```
-- C++ deployment
+#### 4.2 MKLDNN Python Deployment
+```shell
+python paddle_inference_eval.py --model_path=yolov6_model/inference_model --dataset_dir=datasets/coco --device=CPU --use_mkldnn=True --precision=fp32 --arch=YOLOv6
+```
+#### 4.3 C++ Deployment
 See [Running the PP-YOLOE-l object detection sample](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/gpu/ppyoloe_crn_l) for details.
 In compile.sh, change DEMO_NAME to yolov6_test, rename ppyoloe_crn_l.cc to yolov6_test.cc, and adjust the relevant library paths for your environment.
 Run bash compile.sh to build the sample.
 ```shell
 ./build/yolov6_test --model_file yolov6s_infer/model.pdmodel --params_file yolov6s_infer/model.pdiparams --run_mode=trt_int8
 ```
+
 ## 5. FAQ
 - To apply automatic compression to the model instead, experiment with the [YOLO-series auto-compression example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/pytorch_yolo_series).