Releases: sony/model_optimization
Release v1.9.1
Bug Fixes and Other Changes:
- Fixed an issue with the MCT 1.9.0 requirements file that caused installation of mct-quantizers 1.2.0. The mct-quantizers version is now pinned to 1.1.0 in the MCT requirements file to avoid this issue.
Release v1.9.0
What's Changed
Major updates
- MCT Quantizers:
- The Inferable Infrastructure package was extracted into an external repository, MCT Quantizers, and a new dependency was added to the project's requirements (the mct_quantizers library; see the requirements file).
- For changes in the quantized model structure, please refer to the latest release (v1.1.0) of the MCT Quantizers package. The latest changes include removing the activation quantizer from the “QuantizationWrapper” module and replacing it with an “ActivationQuantizationHolder” that is responsible for the activation quantization.
- The extraction of the inferable infrastructure package included a breaking change to the quantization infrastructure API: the quantization_infrastructure package is no longer part of MCT's API. Its capabilities are split and can be accessed as follows:
- inferable_infrastructure components are available via the MCT Quantizers package. To access them, use mct_quantizers.<Component> (after installing the mct-quantizers package in your environment).
- trainable_infrastructure components are available via MCT's API. To access them, use mct.trainable_infrastructure.<Component>.
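A minimal sketch of the new access paths is shown below. The specific component names (KerasQuantizationWrapper, BaseKerasTrainableQuantizer) are assumptions for illustration; consult each package for its exact exports.

```python
# Requires: pip install model-compression-toolkit mct-quantizers
import mct_quantizers                      # inferable infrastructure (external package)
import model_compression_toolkit as mct    # trainable infrastructure lives under mct

# Inferable components come from the MCT Quantizers package
# (component name below is an assumption for illustration):
wrapper_cls = mct_quantizers.KerasQuantizationWrapper

# Trainable components are reached through MCT's API
# (component name below is an assumption for illustration):
trainable_base = mct.trainable_infrastructure.BaseKerasTrainableQuantizer
```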
- MCT Tutorials:
- The new tutorials package exposes a simple framework for getting started with MCT for model quantization. It demonstrates the capabilities of MCT and illustrates its interface with various model collection libraries, allowing users to generate a quantized version of their chosen model with a single click from a wide range of pre-trained models.
- Currently, the project supports a selection of models from each library; we aim to continually expand this support to include more models.
- In addition, all MCT tutorials and examples have been moved to the notebooks directory under this module.
- This release also includes several fixes to the existing MCT examples: new arguments and imports were fixed in the QAT, Keras, and MobileNetV2 examples.
- Exporter API changes:
- Instead of directly specifying the data type (fakely-quantized or INT8) as a mode, we now pass the TPC, which contains the desired exported quantization format.
- Models can be exported in two quantization formats: Fake Quant format, where weights and activations are float fakely-quantized values, and a new INT8 format, where weights and activations are represented using 8-bit integers. The quantization format value is set in the TPC.
- A serialization format is now passed to the exporter. This changes how models are exported, allowing TensorFlow models to be exported as TensorFlow models (.h5 extension) and TFLite models (.tflite extension), and PyTorch models to be exported as TorchScript models and ONNX models (.onnx extension).
- The mct.exporter.keras_export_model() function is now used instead of mct.exporter.tflite_export_model().
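A hedged sketch of the updated export flow for a Keras model follows; the TPC name 'default', the keyword arguments, and the KERAS_H5 enum value are assumptions, so please verify them against the exporter documentation.

```python
import numpy as np
import tensorflow as tf
import model_compression_toolkit as mct

model = tf.keras.applications.MobileNetV2()

def representative_data_gen():
    for _ in range(2):
        yield [np.random.randn(1, 224, 224, 3).astype(np.float32)]

# The TPC now carries the desired quantization format (fake-quant or INT8).
tpc = mct.get_target_platform_capabilities('tensorflow', 'default')

quantized_model, quant_info = mct.ptq.keras_post_training_quantization_experimental(
    model, representative_data_gen, target_platform_capabilities=tpc)

# A serialization format is passed explicitly; a Keras .h5 export is assumed here.
mct.exporter.keras_export_model(
    model=quantized_model,
    save_model_path='./quantized_model.h5',
    serialization_format=mct.exporter.KerasExportSerializationFormat.KERAS_H5,
    target_platform_capabilities=tpc)
```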
- API Rearrangement:
- We would like to inform you about breaking changes in MCT's API that may affect your existing code. Functions and classes that were previously exposed directly are now organized under internal packages. Their behavior remains unchanged; however, you will need to update how you access them.
- For example, what was previously accessed via mct.DebugConfig should now be accessed using mct.core.DebugConfig.
- The full list of changes is as follows:
- mct.core:
- DebugConfig
- keras_kpi_data, keras_kpi_data_experimental
- pytorch_kpi_data, pytorch_kpi_data_experimental
- FolderImageLoader
- FrameworkInfo, ChannelAxis
- DefaultDict
- network_editor
- quantization_config
- mixed_precision_quantization_config
- QuantizationConfig, QuantizationErrorMethod, DEFAULTCONFIG
- CoreConfig
- KPI
- MixedPrecisionQuantizationConfig, MixedPrecisionQuantizationConfigV2
- mct.qat:
- QATConfig, TrainingMethod
- keras_quantization_aware_training_init, keras_quantization_aware_training_finalize
- pytorch_quantization_aware_training_init, pytorch_quantization_aware_training_finalize
- mct.gptq:
- GradientPTQConfig, RoundingType, GradientPTQConfigV2
- keras_gradient_post_training_quantization_experimental
- get_keras_gptq_config
- pytorch_gradient_post_training_quantization_experimental
- get_pytorch_gptq_config
- mct.exporter:
- KerasExportSerializationFormat
- PytorchExportSerializationFormat
- keras_export_model
- pytorch_export_model
- mct.ptq:
- pytorch_post_training_quantization_experimental
- keras_post_training_quantization_experimental
- Please update your code accordingly to ensure compatibility with the latest version of MCT (a short before/after sketch follows this section).
- Also, notice that the old functions keras_ptq, keras_ptq_mp, pytorch_ptq, and pytorch_ptq_mp are now deprecated and will be removed in the future. We highly recommend using keras_ptq_experimental and pytorch_ptq_experimental instead.
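For example, code written against the old flat namespace can be updated along these lines (a before/after sketch; the CoreConfig keyword name is an assumption):

```python
import model_compression_toolkit as mct

# Before (flat namespace, no longer exposed):
#   debug_config = mct.DebugConfig()
#   target_kpi = mct.KPI()

# After (classes grouped under mct.core):
debug_config = mct.core.DebugConfig()
target_kpi = mct.core.KPI()
core_config = mct.core.CoreConfig(debug_config=debug_config)  # keyword name assumed
```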
- The new_experimental_exporter flag is now set to True by default in keras_ptq_experimental, keras_gptq_experimental, pytorch_ptq_experimental, and pytorch_gptq_experimental. This change affects the returned quantized model: MCT now wraps the layers with the quantization information, as detailed in the MCT Quantizers library. There is no change during inference, and the quantized model usage is the same. In addition, the new quantized model can be exported using the new experimental exporter.
General changes
- New symmetric soft rounding quantizers were added to PyTorch GPTQ, and a uniform soft rounding quantizer was added to both PyTorch and Keras GPTQ.
- GPTQ and QAT quantizer names have been modified with distinguishable suffixes (e.g., SymmetricSoftRounding --> SymmetricSoftRoundingGPTQ).
- Trainable variables grouping: all trainable quantizers now hold a mapping of their trainable parameters, connecting each of them to a specific group, to allow cleaner and simpler training (and training loop implementation).
- Regularization API for trainable quantizers: the regularization was extracted from the quantizer to a higher level. Now, each trainable quantizer defines its own regularization function (see, for example, the Keras soft quantizer regularization).
- New “DNN Quantization with Attention (DQA)” quantizer for QAT (PyTorch).
- GPTQ arguments:
- New GPTQHessianWeightsConfig class to provide the necessary arguments for computing the Hessian weights for GPTQ.
- New gptq_quantizer_params_override argument in the GPTQ config, to allow overriding quantizer parameters.
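A hedged sketch of a Keras GPTQ call with the updated configuration is shown below; the gptq_config keyword and how the Hessian-weighting settings and gptq_quantizer_params_override are set on the config are assumptions, so check the GPTQ API documentation for the exact fields.

```python
import numpy as np
import tensorflow as tf
import model_compression_toolkit as mct

model = tf.keras.Sequential([
    tf.keras.layers.Input((32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3),
    tf.keras.layers.ReLU(),
])

def representative_data_gen():
    for _ in range(2):
        yield [np.random.randn(1, 32, 32, 3).astype(np.float32)]

# n_epochs sets how many times the representative dataset is iterated during fine-tuning.
gptq_config = mct.gptq.get_keras_gptq_config(n_epochs=5)

# Per these notes, the config also carries Hessian-weighting settings
# (GPTQHessianWeightsConfig) and a gptq_quantizer_params_override mapping;
# their exact placement is left to the API docs and not shown here.

quantized_model, quant_info = mct.gptq.keras_gradient_post_training_quantization_experimental(
    model, representative_data_gen, gptq_config=gptq_config)
```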
- Moved TPC from the Core package into an independent [target_platform_capabilities](https://github.com/sony/model_optimization/tree/main/model_compression_toolkit...
Release v1.8.0
What's Changed
Major updates:
- Quantization Aware Training is now supported in PyTorch (experimental):
- Training model: QAT training
- Finalize model (export): QAT finalize
- An explanation of our PyTorch QAT quantizers can be found here: Quantizers
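A rough sketch of the PyTorch QAT flow is shown below (using the flat namespace of this release; the return values and argument names are assumptions, so see the QAT API documentation for the exact signatures).

```python
import numpy as np
import torchvision
import model_compression_toolkit as mct

model = torchvision.models.mobilenet_v2(pretrained=True)

def representative_data_gen():
    for _ in range(2):
        yield [np.random.randn(1, 3, 224, 224).astype(np.float32)]

# Prepare the model for QAT: layers are wrapped with trainable quantizers.
qat_model, quantization_info = mct.pytorch_quantization_aware_training_init(
    model, representative_data_gen)

# ... the user's training loop runs on qat_model here ...

# Finalize (export): strip the trainable parts and keep the quantized model.
quantized_model = mct.pytorch_quantization_aware_training_finalize(qat_model)
```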
- New method for exporting quantized models (experimental):
- You can now get INT8 TFLite models using MCT. Exporting fakely-quantized models is still supported. Please see Exporter for more information and usage examples.
General changes:
- Add Quantization Infrastructure (QI)
- The new infrastructure makes it easy to develop new quantizers:
- Supports multiple training algorithms.
- Supports the Keras and PyTorch frameworks.
- Quantizers are divided into two types:
- Trainable Quantizers: quantizers for training
- Inferable Quantizers: quantizers for inference (deployment)
- Currently, only Quantization Aware Training (QAT) is supported in the new infrastructure. For more information, see: Quantization Infrastructure
- Support TensorFlow v2.11.
- Support NumPy v1.24: fix deprecated dtypes.
- Add IMX500 to TPC. Quantization methods are Symmetric Weights and Power-Of-2 activations. For getting the IMX500 TPC please use Get TargetPlatformCapabilities.
- Add Symmetric LUT quantization.
- Remove Gumbel-Rounding from GPTQ.
- Add Keras implementation of Soft-Rounding to GPTQ. Soft-Rounding is enabled by default. To change it please edit GPTQ Config.
Bug fixes:
- Remove unnecessary assert in Activation layer of type float64.
- Fix bugs and speed up gradients computation.
Contributors
Full Changelog: v1.7.1...v1.8.0
Release v1.7.1
What's Changed
Bug fixes:
- Added outlier removal using Z threshold to Shift Negative Correction.
- Fixed mixed-precision issue for Pytorch models with multiple inputs.
- Fixed wrong KPI computation in mixed-precision when the model has reused layers.
- Fixed import error in statistics correction package. #470
Full Changelog: v1.7.0...v1.7.1
Release v1.7.0
What's Changed
Major updates:
- Changed the API for MCT's representative dataset (used for calibration iterations) to a new dataset iterator.
- For the representative dataset, you can now use a generator or an iterator class (or any Callable that implements the __iter__ and __next__ methods).
- This affects the following facade methods:
- The n_iter argument was removed from CoreConfig, as the number of iterations is now determined by the representative dataset Callable.
- A change in get_keras_gptq_config and get_pytorch_gptq_config: n_iter was replaced with n_epochs. Set n_epochs to the number of times the representative dataset is iterated in the GPTQ process. Notice that now, in each GPTQ epoch, the entire representative dataset is iterated, unlike the previous behavior where a single batch from the representative dataset was used per GPTQ iteration.
- keras_gradient_post_training_quantization_experimental and pytorch_gradient_post_training_quantization_experimental can now receive a dataset generator different from the one used for PTQ calibration and mixed-precision bit-width configuration search. If it is not passed when using GPTQ, the calibration dataset will be used for fine-tuning.
- Notice that the old API was not changed: n_iter can still be used to set the number of iterations for the GPTQ process, and FolderImageLoader is still supported for image loading under the old API. However, using the new API is recommended.
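A minimal sketch of a representative dataset under the new API, written as a generator function that yields batches (shapes and batch counts are placeholders):

```python
import numpy as np

n_batches, batch_size = 20, 4

def representative_data_gen():
    # Each iteration yields a list of input tensors (one entry per model input).
    for _ in range(n_batches):
        yield [np.random.randn(batch_size, 224, 224, 3).astype(np.float32)]

# The callable itself is passed to the facade, e.g.:
#   mct.keras_post_training_quantization_experimental(model, representative_data_gen, ...)
```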
General changes:
- GPTQ changes:
- Added new experimental GPTQ configuration class named GradientPTQV2.
- Added support for uniform quantizers during GPTQ for PyTorch models. Documentation for using uniform quantizers can be found here. (Experimental)
- Added usage of tf.function in GPTQ for fine-tuning TensorFlow models to improve GPTQ runtime.
- Mixed-Precision changes:
- Added a new refinement procedure for improving the mixed-precision bit-width configuration, enabled by default. It can be disabled using refine_mp_solution in MixedPrecisionQuantizationConfigV2.
- Improved mixed-precision configuration search runtime by refactoring the distance matrix computation and similarity analysis functions.
- Added second-moment correction (in addition to bias correction) for PyTorch and TensorFlow models. Can be enabled by setting weights_second_moment_correction to True when creating a QuantizationConfig. (Experimental)
- Added a search for improved shift and threshold in the shift negative correction algorithm. Can be enabled by setting shift_negative_params_search to True when creating a QuantizationConfig. (Experimental)
- Added support for TensorFlow v2.10, as well as PyTorch v1.13 and v1.12.
- Tested using multiple Python versions: 3.10, 3.9, 3.8, 3.7.
- Removed LayerNormDecomposition substitution for TensorFlow models.
- Added support for the PyTorch convolution functional API.
- Updated the requirements file (excluding networkx v2.8.1). See requirements here.
- Added new tutorials. Find all MCT's tutorials here.
- Added a new look to our website! Check it out!
Bug fixes:
- Fixed an issue with small thresholds caused by numerical precision: the calculation of PoT thresholds was changed to use float64.
Contributors
New Contributors
- @tehiladaboush made their first contribution in #399 👏
Full Changelog: v1.6.0...v1.7.0
Release v1.6.0
What's Changed
Major updates:
- Added Keras Quantization-Aware-Training (QAT) support (experimental): 🥳
- Added new functions to prepare a Keras model for QAT and finalize a model after it was retrained.
- You can find a tutorial for using QAT here.
- Run this tutorial in Google Colab!
- The API can be found here and here.
- Added Gumbel-Rounding quantizer to Keras Gradient-Based PTQ (GPTQ) (experimental): 🎉
- A new quantizer can be used during GPTQ training and configured using GradientPTQConfig.
- Use get_keras_gptq_config documentation to easily create a GradientPTQConfig and start training using keras_gradient_post_training_quantization_experimental.
- Added initial support for GPTQ for PyTorch models (experimental). Please visit the GradientPTQConfig documentation and get_pytorch_gptq_config documentation for more details.
General changes:
- Added support for LUT Kmean quantizer for activations for Keras and PyTorch models.
- GPTQ changes:
- Added support for weighted loss in Keras GPTQ.
- Default values in GradientPTQConfig were changed.
- API of get_keras_gptq_config was changed.
- Please visit the GradientPTQConfig documentation and get_keras_gptq_config documentation for more details.
- MixedPrecisionQuantizationConfigV2 default values were changed. Please visit the MixedPrecisionQuantizationConfigV2 documentation for more details.
- Added support for buffers in PyTorch models (they do not require gradients and are thus not registered as parameters).
- Added layer-replacement action in the network editor. You can find more actions to edit the network here.
- Added support for constraining a model's number of Bit-Operations (BOPs); see the sketch after this list. For more KPI options, please visit our documentation.
- New tutorials were added for GPTQ and QAT for Keras models, as well as tutorials for how to use LUT quantizers. You can find all tutorials here 👩🏫
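For instance, a BOPs constraint can be expressed through the target KPI passed to the mixed-precision facade. This is only a sketch; the keyword names weights_memory and bops follow the KPI documentation and should be verified there.

```python
import model_compression_toolkit as mct

# Constrain weights memory (in bytes) and the number of bit-operations (BOPs);
# keyword names are assumptions based on the KPI API.
target_kpi = mct.KPI(weights_memory=2_000_000, bops=5e9)
```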
Bug fixes:
- Replaced TensorFlowOpLayer with TFOpLambda in Shift Negative Correction for Keras models.
- Skipped GPTQ training when the number of iterations is set to 0.
- Fixed optimizer import from Keras facade to support TF2.9.
- Fixed name in the license.
Contributors
New Contributors
- @Idan-BenAmi made their first contribution in #323 👏
Full Changelog: v1.5.0...v1.6.0
Release v1.5.0
What's Changed
Major updates:
- A new experimental API is introduced for optimizing PyTorch and Keras models. The main changes are:
- A new configuration class named CoreConfig is used to configure the quantization parameters (the old QuantizationConfig is now a property of CoreConfig), the mixed-precision parameters (using a new configuration class named MixedPrecisionQuantizationConfigV2), and the debug options (using a new configuration class named DebugConfig).
- A single function is used for compressing models either in single-precision or mixed-precision modes. The passed CoreConfig determines the mode (according to the value of the MixedPrecisionQuantizationConfigV2). For more details please see Keras function documentation and PyTorch function documentation.
- A new function is used for Gradient-based PTQ for optimizing Keras models.
The old API is still supported but not recommended, as it will be removed in future releases.
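A hedged sketch of the new experimental flow for a Keras model is shown below; the keyword names (quantization_config, debug_config, core_config) and the pre-v1.7 representative-dataset convention are assumptions, so verify them against the linked function documentation.

```python
import numpy as np
import tensorflow as tf
import model_compression_toolkit as mct

model = tf.keras.applications.MobileNetV2()

def representative_data_gen():
    # Returns one batch of inputs per call (the calling convention of this release).
    return [np.random.randn(1, 224, 224, 3).astype(np.float32)]

core_config = mct.CoreConfig(
    quantization_config=mct.QuantizationConfig(),
    debug_config=mct.DebugConfig())
# For mixed precision, also pass mixed_precision_config=mct.MixedPrecisionQuantizationConfigV2()
# together with a target KPI (keyword names assumed).

# One facade covers both single- and mixed-precision compression:
quantized_model, quantization_info = mct.keras_post_training_quantization_experimental(
    model, representative_data_gen, core_config=core_config)
```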
General changes:
- Added linear collapsing and residual collapsing for Conv2D layers.
- Added visualization of the mixed-precision result configuration. Please see here for more details.
- Added TPC versioning to support different TPC models and versions.
- Fusing layers according to Target Platform Capabilities. More details can be found here.
- Added support for MultiHeadAttention layers in PyTorch models.
- Mixed-precision changes:
- Add functional layers to activation mixed-precision in PyTorch models.
- Bound mixed-precision metric interest points to accelerate mixed-precision configuration search for deep models. See more here.
- Added total KPI metric. Please see here for more details.
- Added option to compute a weighted distance metric based on gradients of a model's output with respect to the feature maps. Currently, this is supported for TensorFlow models only. To use it please configure MixedPrecisionQuantizationConfigV2 as documented here.
Bug fixes:
- Fixed wrong message upon calling get_target_platform_capabilities when tensorflow_model_optimization is not installed. #223
- Fixed issue in mismatch number of compare points between float and quantized models in GPTQ. #181
- Fixed issue in network editor: editing activation_error_method doesn't affect the configuration. #153
- Fixed issue in network editor: editing ChangeFinalActivationQuantConfigAttr doesn't affect the configuration. #144
- Fixed logging irrelevant TPC warnings. #277
- Fixed issue of wrong output-channel axis for fully-connected layers in PyTorch models. #250
- Fixed similarity analyzer in NNVisualizer.
Contributors
Full Changelog: v1.4.0...v1.5.0
v1.4.0
What's Changed
Major Updates:
- Activation mixed-precision support for Keras and PyTorch models. Given the maximal activation tensor memory size, MCT will search for different bit-widths for different activation quantizers. For Keras usage, please see here. For PyTorch usage, please see here.
General Changes:
- Updated the loss function in GradientPTQConfig to receive mean and standard deviation statistics from batch normalization layers if they exist. For more info, please see here.
- TargetPlatformCapabilities:
- Renamed the hardware_representation package to target_platform (its different components were renamed correspondingly).
- Use more information from TargetPlatformCapabilities during the optimization process (whether operator quantization is enabled and bit-width configurations for different operators). The documentation and usage examples were updated as well and can be seen here.
- Updated mixed-precision quantization functions such that a target KPI is mandatory. See the Keras and PyTorch functions documentation for more info.
- Added a graph transformation to shift the outputs of layers preceding Softmax layers for better quantization range usage.
- Added a graph transformation to equalize the standard deviation of output channels for some patterns in the input model.
- Updated the requirements file to use the latest networkx version.
- Added a graph transformation for PyTorch models to replace dynamic input shapes with static input shapes.
- Added functions to ease the computation of the target KPI. More can be seen here.
Bug Fixes:
- Fix issue of bias correction in Keras models with DepthwiseConv2D layers with depth_multiplier greater than 1: #78
- Fix issue of dynamic input shapes in PyTorch models by replacing them with static input shapes: #161
Contributors
Full Changelog: v1.3.0...v1.4.0
Release v1.3.0
What's Changed
Major Updates:
- Added support for weights mixed-precision for Pytorch models (experimental). See more here.
- Added Uniform and Symmetric quantizers, and hardware modeling to support the use of these and other quantizers.
- Replaced relu_unbound_correction with relu_bound_to_power_of_2.
- Updated visualization using Tensorboard.
General Changes:
- Upgraded GPTQ:
- Added model's weights to GPTQ loss function API.
- Added support for max LSB change per bit-width.
- Enabled bias correction with GPTQ to improve initial model before fine-tuning.
- Added an option to pass a manual mixed-precision configuration to MixedPrecisionQuantizationConfig.
- Added MultiHeadAttention support for Keras models, replacing the layer with equivalent Dense, Reshape, Permute, Concatenate & Dot layers.
Bug Fixes:
- Fixed bug in mixed-precision configuration search where images from the representative dataset were misused.
- Fixed shift-negative correction: Fixed constant shape of shifting layer to match the input_shape of the layer and fixed quantization sign of bypass layers that are affected by the correction.
Contributors
- @ofirgo made their first contribution in #88
- @lapid92 made their first contribution in #99
- @eladc-git made their first contribution in #121
- @haihabi, @lior-dikstein, @elad-c, and @reuvenperetz kept contributing :)
Full Changelog: v1.2.0...v1.3.0
v1.2.0
What's Changed
Added support for the PyTorch framework (experimental):
- Post-training quantization
- Supports Batch Normalization folding
- Supports the Shift Negative Activations feature
For a quick-start tutorial on how to quantize a PyTorch model, please visit our website.
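To complement the tutorial link, a rough quick-start sketch follows. The facade name pytorch_post_training_quantization and its arguments are assumptions based on the old, non-experimental API; the website tutorial is the authoritative reference.

```python
import numpy as np
import torchvision
import model_compression_toolkit as mct

model = torchvision.models.mobilenet_v2(pretrained=True)

def representative_data_gen():
    # A random batch stands in for a real calibration set.
    return [np.random.randn(1, 3, 224, 224).astype(np.float32)]

# n_iter sets the number of calibration iterations in this API version (assumed).
quantized_model, quantization_info = mct.pytorch_post_training_quantization(
    model, representative_data_gen, n_iter=10)
```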