Releases: sony/model_optimization
Release 2.2.2
Removed TPCs - Breaking Change
- This patch release removes IMX500 TPCs versions v2 and v3, for both Keras and PyTorch.
- The following features are no longer available out-of-the-box via the IMX500 TPC provided by MCT:
- Quantized model metadata
- Activation 16-bit quantization
- Constant weights quantization
Full Changelog: v2.2.1...v2.2.2
Release 2.2.1
Bug Fixes and Other Changes:
- A necessary modification for YOLOv8 quantization, derived from #1186.
Release 2.1.1
Bug Fixes and Other Changes:
- A necessary modification for YOLOv8 quantization, derived from #1186.
Release 2.2.0
What's Changed
- This release includes breaking changes in the Target Platform Capabilities module (TPC). If you use a custom TPC, be sure to review the Breaking changes section.
General changes
- Quantization enhancements:
  - Improved Hessian information computation runtime: speeds up GPTQ, HMSE, and Mixed Precision with Hessian-based loss.
  - The `get_keras_gptq_config` and `get_pytorch_gptq_config` functions now accept a `hessian_batch_size` argument to control the batch size used in the Hessian computation for GPTQ (see the sketch below).
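  A minimal PyTorch sketch (assuming `model` and `representative_data_gen` are already defined; the values shown are placeholders):
  ```python
  import model_compression_toolkit as mct

  # Compute Hessian information in batches of 16 samples during GPTQ;
  # smaller batches reduce peak memory at the cost of runtime.
  gptq_config = mct.gptq.get_pytorch_gptq_config(n_epochs=50, hessian_batch_size=16)

  quantized_model, quantization_info = mct.gptq.pytorch_gradient_post_training_quantization(
      model, representative_data_gen, gptq_config=gptq_config)
  ```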
- Data Generation Upgrade: improved speed, performance, and coverage.
  - Added `SmoothAugmentationImagePipeline` – an image pipeline implementation that includes Gaussian smoothing, random cropping, and clipping.
  - Improved performance with float16 support in PyTorch.
  - Introduced the `ReduceLROnPlateauWithReset` scheduler – a learning rate scheduler that reduces the learning rate when a metric has stopped improving, and allows resetting it to the initial value after a specified number of bad epochs.
- Shift negative correction for activations:
  - Updated shift negative correction for the GELU activation operator.
  - Shift negative correction is now enabled by default in `QuantizationConfig` in `CoreConfig`.
- Introduced a new Explainable Quantization (Xquant) tool (experimental).
- Introduced TPC IMX500.v3 (experimental):
  - Support for constants quantization: constants of Add, Sub, Mul & Div operators are quantized to 8 bits using per-axis power-of-two quantization. The axis is chosen per constant according to the minimum quantization error.
  - The IMX500 TPC now supports 16-bit activation quantization for the following operators: Add, Sub, Mul, Concat & Stack.
  - Support for assigning allowed input precision options to each operator, i.e., the precision representation of the operator's input activation tensor.
  - The default TPC remains IMX500.v1.
  - Selecting IMX500.v3 in Keras:
    ```python
    tpc_v3 = mct.get_target_platform_capabilities("tensorflow", 'imx500', target_platform_version="v3")
    mct.ptq.keras_post_training_quantization(model, representative_data_gen, target_platform_capabilities=tpc_v3)
    ```
  - Selecting IMX500.v3 in PyTorch:
    ```python
    tpc_v3 = mct.get_target_platform_capabilities("pytorch", 'imx500', target_platform_version="v3")
    mct.ptq.pytorch_post_training_quantization(model, representative_data_gen, target_platform_capabilities=tpc_v3)
    ```
- Introduced the BitWidthConfig API:
  - Allows manual adjustment of activation bit-widths for specific model layers through a new class under `CoreConfig` (see the sketch below).
  - A usage example of manually selecting 16-bit activations is available in the PyTorch object detection YOLOv8n tutorial.
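  A hedged sketch of the intended flow (the `set_manual_activation_bit_width` method and `NodeTypeFilter` names are assumptions for illustration; consult the BitWidthConfig API reference for the exact interface):
  ```python
  import torch
  import model_compression_toolkit as mct
  from model_compression_toolkit.core import BitWidthConfig, CoreConfig

  # Hypothetical: request 16-bit activations for all 'add' nodes.
  bit_width_config = BitWidthConfig()
  bit_width_config.set_manual_activation_bit_width(
      [mct.core.network_editor.NodeTypeFilter(torch.add)], 16)

  core_config = CoreConfig(bit_width_config=bit_width_config)
  ```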
- Tutorials:
  - MCT tutorial notebooks updates:
    - Added new tutorials for IMX500:
      - Instance segmentation YOLOv8n and pose estimation YOLOv8n quantization in PyTorch, including an optional Gradient-Based PTQ step for optimized performance.
      - A torchvision model quantization for IMX500.
      - Added new classification models to MCT’s IMX500-Notebooks.
    - Added new MCT feature tutorials: an Xquant tutorial in PyTorch and Keras. In addition, a new tutorial for GPTQ in PyTorch has been added.
    - Updated the PyTorch object detection YOLOv8n tutorial with 16-bit manual configuration.
Breaking changes
- To configure `OpQuantizationConfig` in the TPC, additional arguments have been added:
  - `Signedness` specifies the signedness of the quantization method (signed or unsigned quantization).
  - `supported_input_activation_n_bits` sets the number of bits the operator accepts as input.
Bug fixes:
- Fixed a bug in PyTorch model reader of reshape operator #1086.
- Fixed a bug in GPTQ with bias learning for cases where a convolutional layer has None as its bias #1109.
- Fixed an issue in mixed precision when running weights-only or activation-only compression: if layers with multiple candidates of the other kind (activation/weights) existed, the search could fail or produce incorrect results. A new filtering procedure now runs before mixed precision to filter out unnecessary candidates #1162.
New Contributors
Welcome @DaniAffCH, @irenaby, and @yardeny-sony for their first contributions! PR #1094, PR #1118, PR #1163
Full Changelog: v2.1.0...v2.2.0
Release 2.1.0
What's Changed
General changes:
- Quantization enhancements:
- Improved quantization parameters: Backpropagate the threshold of concatenation layers. This helps to minimize data loss during the quantization of these layer types.
- Improved weights quantization parameters selection: introduced a Hessian-based MSE quantization error method.
  - Set `weights_error_method` to `QuantizationErrorMethod.HMSE` in `QuantizationConfig` in `CoreConfig`, as sketched below. Currently, this feature is only available in GPTQ due to the increased runtime required for the Hessian computation.
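  A minimal sketch:
  ```python
  from model_compression_toolkit.core import CoreConfig, QuantizationConfig, QuantizationErrorMethod

  # Use Hessian-based MSE for weights quantization parameters selection
  # (only effective in GPTQ flows, per the note above).
  quant_config = QuantizationConfig(weights_error_method=QuantizationErrorMethod.HMSE)
  core_config = CoreConfig(quantization_config=quant_config)
  ```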
- Improved mixed precision: Use normalized MSE as the distance metric in mixed precision sensitivity evaluation for non-Hessian-based methods.
- Improved mixed precision runtime: Added a validation step that determines whether quantizing the model to a requested target resource utilization requires mixed precision, or whether it can be achieved by quantizing the model to the maximal available bit-width precision.
- Identity layers are now automatically removed to improve graph optimizations.
- Introduced TPC IMX500.v2:
- Enabled a new feature: metadata. The metadata is a dictionary saved in the model file/object that contains information about the MCT environment (e.g., MCT version, framework version).
- Quantize unfolded BatchNorm layers.
- Default TPC remains IMX500.v1. For selecting IMX500.v2 use:
  ```python
  tpc_v2 = mct.get_target_platform_capabilities("tensorflow", 'imx500', target_platform_version="v2")
  mct.ptq.keras_post_training_quantization(model, representative_data_gen, target_platform_capabilities=tpc_v2)
  ```
Tutorials
MCT tutorial notebooks updates:
- Reorganized the tutorials into separate sections: IMX500 and MCT features.
- Added new tutorials for IMX500: an object detection YOLOv8n quantization in Keras and PyTorch, including an optional Gradient-Based PTQ step for optimized performance.
- Removed the “quick-start” integration tool from MCT.
Breaking changes:
- TF 2.11 is no longer supported.
Bug fixes:
- Fixed a bug in the GPTQ parameters update.
- Fixed a bug in the similarity analyzer when bias correction is used.
- Fixed a bug in logging `tf.image.combined_non_max_suppression` to TensorBoard (#1055).
Release 2.0.0
What's Changed
Major updates:
- Structured pruning for PyTorch models: MCT now employs structured and hardware-aware pruning for PyTorch models in addition to Keras models.
- Additional details can be found here.
- Try our tutorial Structured Pruning of a Fully-Connected PyTorch Model
- API changes - The MCT "experimental" API has been formalized. Refer to the API documentation for details.
- Quantization parameters search is now faster! Enhanced vectorized search for per-channel threshold delivers quicker results.
General changes:
- Tensorflow 2.15 is now supported.
- Model Statistics Collector - Improved robustness to representative datasets:
- The mean estimator has been switched from IIR to standard mean.
- The merged activations histogram now uses a fixed bin width.
- PyTorch Model Export - the exporter API now includes an argument for specifying the ONNX OPSET version (see the sketch below). View the updated API here. The default OPSET version is set to 15.
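  A sketch of pinning the OPSET during export (the `onnx_opset_version` argument name is an assumption; `quantized_model` and `representative_data_gen` are assumed to be defined):
  ```python
  import model_compression_toolkit as mct

  # Export the quantized PyTorch model to ONNX with an explicit OPSET.
  mct.exporter.pytorch_export_model(
      model=quantized_model,
      save_model_path='./qmodel.onnx',
      repr_dataset=representative_data_gen,
      onnx_opset_version=15)
  ```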
- Target Platform Capabilities (TPC) - 'torch.squeeze' operation has been added to the TPC’s “no quantization” list.
- Mixed Precision - several modifications for improving mixed precision usability and stability:
- Hessian-based scores are now disabled by default (faster execution).
- A new API for enabling different weighting methods through MixedPrecisionQuantizationConfig (the MpDistanceWeighting enum); see the sketch below.
- A rare numerical issue in distance metric computation has been fixed.
- MixedPrecisionQuantizationConfig is now initialized by default in CoreConfig.
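  A hedged sketch (the parameter and enum member names are assumptions based on the notes above; consult the mixed precision API reference for the exact interface):
  ```python
  from model_compression_toolkit.core import (CoreConfig,
                                              MixedPrecisionQuantizationConfig,
                                              MpDistanceWeighting)

  # Select an averaging distance-weighting method for the mixed precision
  # sensitivity metric instead of the default.
  mp_config = MixedPrecisionQuantizationConfig(
      distance_weighting_method=MpDistanceWeighting.AVG)
  core_config = CoreConfig(mixed_precision_config=mp_config)
  ```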
- The internal representation of constants in a model has been modified, resolving issues #918 and #812.
- MCT Troubleshooting and FAQ - check out the Quantization Troubleshooting for common pitfalls and some tools to improve quantization accuracy as well as the FAQ for common issues.
- MCT tutorials – additional tutorials have been added to provide insights into MCT features, such as a z-score threshold tutorial and a quantization parameters search tutorial. In addition, a new tutorial demonstrating semantic segmentation model quantization has been added.
Breaking changes:
- API changes
- Removal of “experimental” from PTQ and GPTQ quantization facades
- keras_post_training_quantization_experimental -> keras_post_training_quantization.
- keras_gradient_post_training_quantization_experimental -> keras_gradient_post_training_quantization
- pytorch_post_training_quantization_experimental -> pytorch_post_training_quantization.
- pytorch_gradient_post_training_quantization_experimental -> pytorch_gradient_post_training_quantization
- Renaming QAT methods
- keras_quantization_aware_training_init -> keras_quantization_aware_training_init_experimental.
- keras_quantization_aware_training_finalize -> keras_quantization_aware_training_finalize_experimental.
- Renaming KPI to ResourceUtilization - this includes changes to the KPI object class name, the kpi_data facade, and the target_kpi argument to all the facades (see the sketch below).
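  A minimal sketch of the renamed API (assuming `model` and `representative_data_gen` are defined; the memory value is illustrative):
  ```python
  import model_compression_toolkit as mct

  # Previously: target_kpi=mct.core.KPI(weights_memory=...)
  resource_utilization = mct.core.ResourceUtilization(weights_memory=100_000)
  quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(
      model, representative_data_gen,
      target_resource_utilization=resource_utilization)
  ```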
- Renaming Mixed Precision configuration and GPTQ configuration
- MixedPrecisionQuantizationConfigV2 -> MixedPrecisionQuantizationConfig.
- GradientPTQConfigV2 -> GradientPTQConfig.
- Renaming Keras data generation API
- tensorflow_data_generation_experimental -> keras_data_generation_experimental.
- get_tensorflow_data_generation_config -> get_keras_data_generation_config.
- Weights Attributes Quantization – weights quantization is now extended to support additional attributes besides the “kernel”. This is considered an experimental feature, and the default behavior is unchanged (i.e., only the kernel of linear layers is quantized with a weights quantizer). Enabling this feature for specific attributes requires creating a modified TPC.
- `weights_per_channel_threshold` was removed from QuantizationConfig (it is no longer used and can be configured in the TPC).
- QuantizationMethod.KMEANS has been removed.
- The unused parameter `hessians_n_iter` has been removed from GPTQHessianScoresConfig.
- `FolderImageLoader` has been removed.
Bug fixes:
- Fixed a Python 3.9 Windows file permission error in the fakely-quantized TFLite exporter - #865
- Fixed an MCT failure for `torch.nn.functional.layer_norm` - #921
- Fixed an MCT failure for a "dims" parameter of type List in `torch.permute` - #935
New Contributors
Welcome @samuel-wj-chapman for his first contribution! PR #959
Full Changelog: v1.11.0...v2.0.0
Release v1.11.0
What's Changed
Major updates:
- Structured pruning for Keras models: MCT now employs structured and hardware-aware pruning. This pruning technique is designed to compress models for specific hardware architectures, taking into account the target platform's "Single Instruction, Multiple Data" (SIMD) capabilities.
  - Additional details can be found here.
  - Run a tutorial on Google Colab!
- Learned Step Size Quantization (LSQ) implementation for QAT (see the sketch below). To understand how to use LSQ, please refer to our API documentation here.
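  A hedged Keras sketch (the `QATConfig` parameter names are assumptions; `model` and `representative_data_gen` are assumed to be defined):
  ```python
  import model_compression_toolkit as mct
  from model_compression_toolkit.qat import QATConfig, TrainingMethod

  # Select LSQ as the training method for weights and activations.
  qat_config = QATConfig(
      weight_training_method=TrainingMethod.LSQ,
      activation_training_method=TrainingMethod.LSQ)

  # Returns a QAT-ready model plus auxiliary outputs (see the API docs).
  qat_outputs = mct.qat.keras_quantization_aware_training_init(
      model, representative_data_gen, qat_config=qat_config)
  ```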
General changes:
- New tutorials were added! Nanodet-Plus, EfficientDet, and more. These tutorials and more can be found here.
- Support for new NN framework versions was added (Tensorflow v2.14 and Pytorch v2.1).
- Hessian scores used as sensitivity importance scores in mixed precision, GPTQ, and pruning now support Hessian scoring w.r.t. the model's weights (in addition to the previously supported Hessian w.r.t. the model's activations).
- Added support for an external regularization factor in GPTQ (see the sketch below). Please refer to the API for Keras and PyTorch usage.
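  A minimal sketch (the `regularization_factor` argument name is an assumption based on the API referenced above; the value is illustrative):
  ```python
  import model_compression_toolkit as mct

  # Scale the GPTQ regularization term with an external factor.
  gptq_config = mct.gptq.get_pytorch_gptq_config(n_epochs=50, regularization_factor=0.01)
  ```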
- Custom layers in Keras, previously unsupported, are now skipped during quantization.
Breaking changes:
- Names of Hessian-related variables and methods have been revised:
  - GPTQHessianWeightsConfig changes:
    - The class `GPTQHessianWeightsConfig` is renamed `GPTQHessianScoresConfig`.
    - The parameter `norm_weights` is renamed `norm_scores`.
    - The new API can be found here.
  - MixedPrecisionQuantizationConfigV2 changes:
    - The parameter `use_grad_based_weights` is renamed `use_hessian_based_scores`.
    - The parameter `norm_weights` is renamed `norm_scores`.
    - The new API can be found here.
- Exporter changes: the new QuantizationFormat 'MCTQ' exports models with mct-quantizers modules. In addition, a TPC should no longer be passed during export; instead, a QuantizationFormat is passed directly. For more details and updated usage examples, please see here.
- The output replacement mechanism has been eliminated from the Hessian computation. As a result, models with specific layer outputs, such as argmax, are now incompatible with the Hessian scoring metric in features like GPTQ and mixed precision, so Hessian scoring must be deactivated when using these features with such models (see the sketch below).
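  For example, in mixed precision this can be done with the renamed flag described above:
  ```python
  from model_compression_toolkit.core import MixedPrecisionQuantizationConfigV2

  # Models whose outputs include e.g. argmax must avoid Hessian-based scoring.
  mp_config = MixedPrecisionQuantizationConfigV2(use_hessian_based_scores=False)
  ```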
Bug fixes:
- Fixed a permission error during TensorFlow model export on Windows systems. #865 by @jgerityneurala.
- Fixed an issue with pickling torch models. [#841].
- Fixed an issue with systems operating with multiple CUDA devices. [#613].
- Fixed the unsupported NMS layer issue in mixed precision scenarios. [#844].
- Fixed an issue with PyTorch reshape substitute. [#799].
- Fixed an issue finalizing graph configuration following mixed-precision operations with mixed TPC. [#820].
- Tackled numerical issues in mixed precision caused by large values in the distance metric. Fixed by setting a threshold in the MP quantization configuration, ensuring that if a distance value exceeds this threshold, the metric is scaled down.
- Fixed an issue with reused TensorFlow SeparableConv2D decomposition concerning their reuse group.
- Fixed a bug in PyTorch BN folding into ConvTranspose2d with groups > 1.
New Contributors
Welcome @jgerityneurala and @edenlum for their first contributions! PR #865, PR #873
Full Changelog: v1.10.0...v1.11.0
Release v1.10.0
What's Changed
Major Updates:
- Data Generation Library: The Data Generation Library has been added to the Model Compression Toolkit (MCT) project. This library allows users to generate synthetic data for compressing their models and enables quantization without requiring user-provided data. Check out an example of quantizing a model using generated data for torchvision's ResNet18 in this notebook, and the brief sketch below.
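  A hedged sketch of the intended flow (the data generation function names and arguments are assumptions for illustration; see the linked notebook for the exact API):
  ```python
  import model_compression_toolkit as mct
  from torchvision.models import resnet18

  model = resnet18(pretrained=True)

  # Generate synthetic images from the model's own statistics, then use them
  # as the representative dataset for quantization (no user data needed).
  data_gen_config = mct.data_generation.get_pytorch_data_generation_config()
  generated_images = mct.data_generation.pytorch_data_generation_experimental(
      model=model, n_images=128, output_image_size=224,
      data_generation_config=data_gen_config)

  def representative_data_gen():
      # One illustrative batch of generated images per iteration.
      yield [generated_images]

  quantized_model, _ = mct.ptq.pytorch_post_training_quantization_experimental(
      model, representative_data_gen)
  ```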
General Changes:
- TensorFlow and PyTorch Support: Added support for TensorFlow 2.12 and 2.13, as well as PyTorch 2.0.
- Dependency Cleanup: All dependencies on 'tensorflow-model-optimization' have been removed.
- Quick-Start Tutorial: The quick-start tutorial has been updated with additional GPTQ and Mixed Precision (MP) options and minor bug fixes.
- New TPC: Added IMX500 TPC with weights quantized using non-uniform quantization (LookUp-Table).
Breaking Changes:
- Quantizer Identifier: Replaced the "quantizer_type" property with a new "identifier" property for all trainable quantizers. Each quantizer now has a dedicated identifier.
- **Changes in Look-up Table (LUT) quantizers** (in Keras and PyTorch):
  - Class variable names have been modified to align with MCT Quantizers names:
    - `cluster_centers` -> `lut_values`
    - `multiplier_n_bits` -> `lut_values_bitwidth`
  - `lut_values` is now converted from a NumPy array to a list before exporting the model.
Added Features:
- Forward-Fold Layers: Added support for forward-folding BatchNorm and 1x1-kernel DW-Conv layers for improved quantization.
- Zero in LUT grid: LUT now explicitly includes zero in the quantization grid.
Improvements:
- Quick-Start Enhancements: Improved quick-start for running pre-trained models in MCT.
- Notebook Addition: Added a notebook for running pre-trained models in MCT and a notebook for quantizing a model using images generated with the data generation library.
- Mixed Precision Quantization: Mixed precision quantization is now applied using MCT Quantizers infrastructure.
- Configurable Quantizer Classes: Introduced new `ConfigurableWeightsQuantizer` and `ConfigurableActivationQuantizer` quantizer classes to support mixed precision search, replacing the SelectiveQuantizer mechanism.
- BOPs Computation Fix: Fixed bit operations (BOPs) computation in mixed precision in the BOPs restriction scenario.
Fixed Issues:
- Param Search in SNC: Fixed param search during shift negative correction (SNC) in PyTorch [#771].
- Second Momentum Correction: Fixed second momentum correction when SNC is enabled [#771].
- Irrelevant Warning: Resolved an irrelevant warning related to the Kmeans function when running LUT quantization (no effect on the usability of the quantizer).
New Contributors
- @alexander-sony made their first contribution in #742
Contributors:
@alexander-sony @lior-dikstein @reuvenperetz @ofirgo @elad-c @eladc-git @haihabi @lapid92
Full Changelog: v1.9.0...v1.10.0
Release v1.9.1
Bug Fixes and Other Changes:
- Fixed an issue with the MCT 1.9.0 requirements file that caused installation of mct-quantizers 1.2.0. The mct-quantizers version is now set to 1.1.0 in the MCT requirements file to avoid this issue.
Release v1.9.0
What's Changed
Major updates
- MCT Quantizers:
  - The Inferable Infrastructure package was extracted into an external repository - MCT Quantizers - and a new dependency was added to the project's requirements (the mct_quantizers library; see the requirements file).
  - For changes in the quantized model structure, please refer to the latest release (v1.1.0) of the MCT Quantizers package. The latest changes include removing the activation quantizer from the “QuantizationWrapper” module and replacing it with an “ActivationQuantizationHolder” that is responsible for the activation quantization.
  - The extraction of the Inferable Infrastructure package included a breaking change to the quantization infrastructure API - the `quantization_infrastructure` package is no longer part of MCT's API. Its capabilities are split and can be accessed as follows:
    - `inferable_infrastructure` components are available via the MCT Quantizers package. To access them, use `mct_quantizers.<Component>` (after installing the mct-quantizers package in your environment).
    - `trainable_infrastructure` components are available via MCT's API. To access them, use `mct.trainable_infrastructure.<Component>`.
- MCT Tutorials:
  - The new tutorials package exposes a simple framework for getting started with MCT for model quantization. This project demonstrates the capabilities of MCT and illustrates its interface with various model collection libraries, allowing users to generate a quantized version of their chosen model with a single click from a wide range of pre-trained models.
  - Currently, the project supports a selection of models from each library. However, our ongoing goal is to continually expand the support, aiming to include more models in the future.
  - In addition, all MCT tutorials and examples have been moved to the notebooks directory under this module.
  - This release also includes several fixes to the existing MCT examples - new arguments and imports were fixed in the QAT, Keras, and MobileNetV2 examples.
- Exporter API changes:
  - Instead of directly specifying the data type (fakely-quantized or INT8) as a mode, we now pass the TPC, which contains the desired exported quantization format.
  - Models can be exported in two quantization formats - a Fake Quant format, where weights and activations are float fakely-quantized values, and a new INT8 format, where weights and activations are represented using 8-bit integers. The quantization format value is set in the TPC.
  - A serialization format is now passed to the exporter. This update implies changes in how models are exported, allowing TensorFlow models to be exported as TensorFlow models (.h5 extension) and TFLite models (.tflite extension), and PyTorch models to be exported as TorchScript models and ONNX models (.onnx extension).
  - The mct.exporter.keras_export_model() function is now used instead of mct.exporter.tflite_export_model(); a sketch follows below.
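  A hedged sketch (the keyword and enum names are assumptions based on the exporter API described above; `quantized_model` and `tpc` are assumed to come from a prior quantization step):
  ```python
  import model_compression_toolkit as mct

  # Export the quantized Keras model to an .h5 file; the quantization format
  # (fakely-quantized or INT8) is taken from the TPC.
  mct.exporter.keras_export_model(
      model=quantized_model,
      save_model_path='./qmodel.h5',
      target_platform_capabilities=tpc,
      serialization_format=mct.exporter.KerasExportSerializationFormat.KERAS_H5)
  ```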
- API Rearrangement:
  - We would like to inform you about breaking changes in MCT's API that may affect your existing code. Functions and classes that were previously directly exposed are now organized under internal packages. The behavior of these functions remains unchanged; however, you will need to update how you access them.
  - For example, what was previously accessed via `mct.DebugConfig` should now be accessed using `mct.core.DebugConfig`.
. - The full list of changes is as follows:
- mct.core:
- DebugConfig
- keras_kpi_data, keras_kpi_data_experimental
- pytorch_kpi_data, pytorch_kpi_data_experimental
- FolderImageLoader
- FrameworkInfo, ChannelAxis
- DefaultDict
- network_editor
- quantization_config
- mixed_precision_quantization_config
- QuantizationConfig, QuantizationErrorMethod, DEFAULTCONFIG
- CoreConfig
- KPI
- MixedPrecisionQuantizationConfig, MixedPrecisionQuantizationConfigV2
- mct.qat:
- QATConfig, TrainingMethod
- keras_quantization_aware_training_init, keras_quantization_aware_training_finalize
- pytorch_quantization_aware_training_init, pytorch_quantization_aware_training_finalize
- mct.gptq:
- GradientPTQConfig, RoundingType, GradientPTQConfigV2
- keras_gradient_post_training_quantization_experimental
- get_keras_gptq_config
- pytorch_gradient_post_training_quantization_experimental
- get_pytorch_gptq_config
- mct.exporter:
- KerasExportSerializationFormat
- PytorchExportSerializationFormat
- keras_export_model
- pytorch_export_model
- mct.ptq:
- pytorch_post_training_quantization_experimental
- keras_post_training_quantization_experimental
- Please update your code accordingly to ensure compatibility with the latest version of MCT.
- Also, notice that the old functions `keras_ptq`, `keras_ptq_mp`, `pytorch_ptq`, and `pytorch_ptq_mp` are now deprecated and will be removed in the future. We highly recommend using `keras_ptq_experimental` and `pytorch_ptq_experimental` instead.
- The `new_experimental_exporter` flag is now set to True by default in `keras_ptq_experimental`, `keras_gptq_experimental`, `pytorch_ptq_experimental`, and `pytorch_gptq_experimental`.
  This change affects the quantized model that MCT creates: its layers are wrapped with the quantization information, as detailed in the MCT Quantizers library. There is no change during inference, and the quantized model usage is the same.
  In addition, the new quantized model can be exported using the new experimental exporter.
General changes
- New symmetric soft rounding quantizers were added to PyTorch GPTQ, and a uniform soft rounding quantizer was added to both PyTorch and Keras GPTQ.
- GPTQ and QAT quantizer names have been modified with distinguishable suffixes (e.g., `SymmetricSoftRounding` --> `SymmetricSoftRoundingGPTQ`).
- Trainable variables grouping - all trainable quantizers now hold a mapping of their trainable parameters, connecting each of them to a specific group, to allow cleaner and simpler training (and training-loop implementation).
- Regularization API for trainable quantizers - regularization has been extracted from the quantizer to a higher level. Now, each trainable quantizer defines its own regularization function (see, for example, the Keras soft quantizer regularization).
- New “DNN Quantization with Attention (DQA)” quantizer for QAT (PyTorch).
- GPTQ arguments:
  - New `GPTQHessianWeightsConfig` class to provide the necessary arguments for computing the Hessian weights for GPTQ.
  - New `gptq_quantizer_params_override` argument in the GPTQ config, to allow overriding parameters.
- Moved TPC from the Core package into an independent [`target_platform_capabilities`](https://github.com/sony/model_optimization/tree/main/model_compression_toolkit...