Release v1.11.0
What's Changed
Major updates:
- Structured pruning for Keras models: MCT now supports structured, hardware-aware pruning. This pruning technique compresses models for specific hardware architectures, taking into account the target platform's "Single Instruction, Multiple Data" (SIMD) capabilities.
  - Additional details can be found here.
  - Run a tutorial on Google Colab!
- Learned Step Size Quantization (LSQ) implementation for QAT. To understand how to use LSQ, please refer to our API documentation here.
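LSQ makes the quantizer step size a learnable parameter that is trained jointly with the weights, using a straight-through estimator for the rounding operation. The following is a minimal NumPy sketch of the LSQ forward pass and the step-size gradient from the original LSQ paper; it is illustrative only and not MCT's internal implementation:

```python
import numpy as np

def lsq_quantize(x, step, num_bits=8, signed=True):
    """LSQ forward pass (fake quantization): scale by the learnable
    step size, round, clip to the integer grid, and rescale back."""
    if signed:
        q_min, q_max = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        q_min, q_max = 0, 2 ** num_bits - 1
    q = np.clip(np.round(x / step), q_min, q_max)
    return q * step

def lsq_step_grad(x, step, num_bits=8, signed=True):
    """Gradient of the quantizer output w.r.t. the step size, using the
    straight-through estimator for round(), as in the LSQ paper:
    q_min / q_max in the clipped regions, round(v) - v in between."""
    if signed:
        q_min, q_max = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        q_min, q_max = 0, 2 ** num_bits - 1
    v = x / step
    return np.where(v <= q_min, q_min,
           np.where(v >= q_max, q_max, np.round(v) - v))

x = np.array([-1.0, -0.3, 0.02, 0.4, 2.5])
print(lsq_quantize(x, step=0.1, num_bits=4))
```

During QAT the step size would be updated with this gradient (typically with an additional gradient-scale factor), letting each quantizer learn its own clipping range.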
General changes:
- New tutorials were added! Nanodet-Plus, EfficientDet, and more. These tutorials and more can be found here.
- Support for new NN framework versions was added (Tensorflow v2.14 and Pytorch v2.1).
- Hessian scores used as sensitivity importance scores in mixed-precision, GPTQ, and pruning can now be computed w.r.t. the model's weights (in addition to the previously supported Hessian w.r.t. the model's activations).
- Added support for external regularization factor in GPTQ. Please refer to the API for Keras and Pytorch usage.
- Custom layers in Keras, previously unsupported, are now skipped during quantization.
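Hessian-based sensitivity scores like the ones above are commonly approximated with the Hutchinson trace estimator, which needs only Hessian-vector products rather than the full Hessian. A self-contained NumPy sketch on a toy quadratic loss, purely illustrative and not MCT's implementation:

```python
import numpy as np

def hutchinson_trace(hvp, dim, num_samples=200, rng=None):
    """Estimate trace(H) using only Hessian-vector products:
    E[v^T H v] = trace(H) when v has i.i.d. Rademacher (+/-1) entries."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp(v)
    return total / num_samples

# Toy loss L(w) = 0.5 * w^T A w, whose Hessian is exactly A.
A = np.diag([1.0, 2.0, 3.0, 4.0])
estimate = hutchinson_trace(lambda v: A @ v, dim=4, num_samples=500, rng=0)
print(estimate)  # equals trace(A) = 10 here, since A is diagonal and v_i = +/-1
```

In practice the Hessian-vector product is obtained from the framework's autograd (a gradient of a gradient-vector dot product), taken w.r.t. either the weights or the activations depending on which sensitivity score is needed.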
Breaking changes:
- Names of Hessian-related variables and methods have been revised:
  - GPTQHessianWeightsConfig changes:
    - The class GPTQHessianWeightsConfig is renamed GPTQHessianScoresConfig.
    - The parameter norm_weights is renamed norm_scores.
    - The new API can be found here.
  - MixedPrecisionQuantizationConfigV2 changes:
    - The parameter use_grad_based_weights is renamed use_hessian_based_scores.
    - The parameter norm_weights is renamed norm_scores.
    - The new API can be found here.
- Exporter changes: the new QuantizationFormat 'MCTQ' exports models with mct-quantizers modules. In addition, a TPC should no longer be passed during export; instead, a QuantizationFormat is passed directly. For more details and updated usage examples, please see here.
- The output replacement mechanism has been removed from the Hessian computation. As a result, models whose outputs come from certain layers, such as argmax, are now incompatible with the Hessian scoring metric used in features like GPTQ and mixed-precision; Hessian scoring must be disabled when using these features with such models.
Bug fixes:
- Fixed a permission error during TensorFlow model export on Windows systems. [#865] by @jgerityneurala
- Fixed an issue with pickling torch models. [#841]
- Fixed an issue on systems operating with multiple CUDA devices. [#613]
- Fixed the unsupported NMS layer issue in mixed-precision scenarios. [#844]
- Fixed an issue with the PyTorch reshape substitution. [#799]
- Fixed an issue with finalizing the graph configuration following mixed-precision operations with a mixed TPC. [#820]
- Fixed numeric problems in mixed precision caused by large values in the distance metric: a threshold was added to the MP quantization configuration, and any distance value exceeding it is scaled down.
- Fixed an issue with reused TensorFlow SeparableConv2D decomposition concerning their reuse group.
- Fixed a bug in PyTorch BatchNorm folding into ConvTranspose2d with groups > 1.
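The threshold-based fix for large distance values can be sketched as follows. This is a minimal illustration of the idea only; the function name, threshold value, and logarithmic compression rule are assumptions for the example, not MCT's exact code:

```python
import numpy as np

def scale_large_distances(distances, threshold=10.0):
    """Keep distances below the threshold unchanged; compress values
    above it logarithmically so extreme outliers cannot dominate the
    mixed-precision sensitivity metric (illustrative rule, hypothetical)."""
    d = np.asarray(distances, dtype=float)
    over = d > threshold
    d[over] = threshold * (1.0 + np.log(d[over] / threshold))
    return d

print(scale_large_distances([0.5, 3.0, 50.0], threshold=10.0))
```

Any monotonic compression above the threshold serves the same purpose: the ordering of layer sensitivities is preserved while the numeric range fed into the mixed-precision search stays bounded.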
New Contributors
Welcome to @jgerityneurala and @edenlum, who made their first contributions in PR #865 and PR #873!
Full Changelog: v1.10.0...v1.11.0