Releases: sony/model_optimization

Release v1.9.1

01 Aug 12:44

Bug Fixes and Other Changes:

  • Fixed an issue with the MCT 1.9.0 requirements file that caused installation of mct-quantizers 1.2.0. The mct-quantizers version is now pinned to 1.1.0 in the MCT requirements file to avoid this issue.

Release v1.9.0

19 Jun 07:16
5f2e2d0

What's Changed

Major updates

  • MCT Quantizers:

    • The Inferable Infrastructure package was extracted into an external repository, MCT Quantizers, and a new dependency was added to the project's requirements (the mct_quantizers library; see the requirements file).

    • For changes in the quantized model structure, please refer to the latest release (v1.1.0) of the MCT Quantizers package. The latest changes include removing the activation quantizer from the “QuantizationWrapper” module, and replacing it with an “ActivationQuantizationHolder” that’s responsible for the activation quantization.

    • The extraction of the inferable infrastructure package included a breaking change to the quantization infrastructure API: the quantization_infrastructure package is no longer part of MCT’s API. Its capabilities are split and can be accessed as follows:

      • inferable_infrastructure components are available via the MCT Quantizers package. To access them, use mct_quantizers.<Component> (after installing the mct-quantizers package in your environment).
      • trainable_infrastructure components are available via MCT’s API. To access them, use mct.trainable_infrastructure.<Component>.
  • MCT Tutorials:

    • The new tutorials package exposes a simple framework for getting started with MCT for model quantization. This project demonstrates the capabilities of MCT and illustrates its interface with various model-collection libraries. It allows users to generate a quantized version of a chosen model with a single click, drawing on a wide range of pre-trained models.
    • Currently, the project supports a selection of models from each library. However, our ongoing goal is to continually expand the support, aiming to include more models in the future.
    • In addition, all MCT tutorials and examples have been moved to the notebooks directory under this module.
      • This release also includes several fixes to the existing MCT examples: new arguments and imports were fixed in the QAT, Keras, and MobileNetV2 examples.
  • Exporter API changes:

    • Instead of directly specifying the data type (fakely-quantized or INT8) as a mode, we are now passing the TPC which contains the desired exported quantization format.
    • Models can be exported in two quantization formats: a Fake Quant format, where weights and activations are float fakely-quantized values, and a new INT8 format, where weights and activations are represented using 8-bit integers. The quantization format value is set in the TPC.
    • A serialization format is now passed to the exporter. This update changes how models are exported, allowing TensorFlow models to be exported as TensorFlow models (.h5 extension) or TFLite models (.tflite extension), and PyTorch models to be exported as TorchScript models or ONNX models (.onnx extension).
    • The mct.exporter.keras_export_model() function now replaces mct.exporter.tflite_export_model().
  • API Rearrangement:

    • We would like to inform you about breaking changes in MCT's API that may affect your existing code. Functions and classes that were previously exposed directly are now organized under internal packages. The behavior of these functions remains unchanged; however, you will need to update how you access them.
    • For example, what was previously accessed via mct.DebugConfig should now be accessed using mct.core.DebugConfig (see the import sketch after this list).
    • The full list of changes is as follows:
      • mct.core:
        • DebugConfig
        • keras_kpi_data, keras_kpi_data_experimental
        • pytorch_kpi_data, pytorch_kpi_data_experimental
        • FolderImageLoader
        • FrameworkInfo, ChannelAxis
        • DefaultDict
        • network_editor
        • quantization_config
        • mixed_precision_quantization_config
        • QuantizationConfig, QuantizationErrorMethod, DEFAULTCONFIG
        • CoreConfig
        • KPI
        • MixedPrecisionQuantizationConfig, MixedPrecisionQuantizationConfigV2
      • mct.qat:
        • QATConfig, TrainingMethod
        • keras_quantization_aware_training_init, keras_quantization_aware_training_finalize
        • pytorch_quantization_aware_training_init, pytorch_quantization_aware_training_finalize
      • mct.gptq:
        • GradientPTQConfig, RoundingType, GradientPTQConfigV2
        • keras_gradient_post_training_quantization_experimental
        • get_keras_gptq_config
        • pytorch_gradient_post_training_quantization_experimental
        • get_pytorch_gptq_config
      • mct.exporter:
        • KerasExportSerializationFormat
        • PytorchExportSerializationFormat
        • keras_export_model
        • pytorch_export_model
      • mct.ptq:
        • pytorch_post_training_quantization_experimental
        • keras_post_training_quantization_experimental
    • Please update your code accordingly to ensure compatibility with the latest version of MCT.
    • Also, note that the old functions keras_ptq, keras_ptq_mp, pytorch_ptq, and pytorch_ptq_mp are now deprecated and will be removed in the future. We highly recommend using keras_ptq_experimental and pytorch_ptq_experimental instead.
  • The new_experimental_exporter flag is now set to True by default in keras_ptq_experimental, keras_gptq_experimental, pytorch_ptq_experimental, and pytorch_gptq_experimental.
    This change affects the quantized model MCT returns: its layers are wrapped with the quantization information, as detailed in the MCT Quantizers library. There is no change during inference, and the quantized model is used in the same way.
    In addition, the new quantized model can be exported using the new experimental exporter.
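
A minimal, illustrative sketch of the rearranged access paths described above. Only the module paths and class/function names are taken from this release note; the objects are constructed with default arguments purely for illustration, and exact signatures may differ.

```python
import model_compression_toolkit as mct
import mct_quantizers  # inferable quantizers now ship in the external MCT Quantizers package

# Core components moved under mct.core (previously exposed directly on mct):
core_config = mct.core.CoreConfig()    # was: mct.CoreConfig()
debug_config = mct.core.DebugConfig()  # was: mct.DebugConfig()
target_kpi = mct.core.KPI()            # was: mct.KPI()

# GPTQ and exporter components moved under mct.gptq / mct.exporter:
get_gptq_config = mct.gptq.get_keras_gptq_config
export_model = mct.exporter.keras_export_model
serialization_format = mct.exporter.KerasExportSerializationFormat
```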

General changes

  • Trainable quantizers:

    • New symmetric soft-rounding quantizers were added to PyTorch GPTQ, and a uniform soft-rounding quantizer was added to both PyTorch and Keras GPTQ.
    • GPTQ and QAT quantizer names have been modified with distinguishable suffixes (e.g., SymmetricSoftRounding --> SymmetricSoftRoundingGPTQ).
    • Trainable variables grouping: all trainable quantizers now hold a mapping of their trainable parameters, connecting each of them to a specific group, to allow cleaner and simpler training (and training-loop implementation).
    • Regularization API for trainable quantizers: the regularization was extracted from the quantizer to a higher level. Now, each trainable quantizer defines its own regularization function (see, for example, the Keras soft quantizer regularization).
    • New “DNN Quantization with Attention (DQA)” quantizer for QAT (Pytorch).
  • GPTQ arguments:

  • Moved TPC from Core package into an independent [target_platform_capabilities](https://github.com/sony/model_optimization/tree/main/model_compression_toolkit...

Release v1.8.0

08 Feb 14:29
8d49e2c

What's Changed

Major updates:

  • Quantization Aware Training is now supported in PyTorch (experimental).

  • New method for exporting quantized models (experimental):

    • You can now get INT8 TFLite models using MCT. Exporting fakely-quantized models is still supported.
      Please see Exporter for more information and usage examples.

General changes:

  • Add Quantization Infrastructure (QI)
    • The new infrastructure makes it easy to develop new quantizers:
      • Supports multiple training algorithms.
      • Supports the Keras and PyTorch frameworks.
      • Quantizers are divided into two types:
        • Trainable Quantizers: quantizers for training
        • Inferable Quantizers: quantizers for inference (deployment)
    • Currently, only Quantization Aware Training (QAT) is supported in the new infrastructure.
      For more information, see Quantization Infrastructure.
  • Support TensorFlow v2.11.
  • Support NumPy v1.24: fix deprecated dtypes.
  • Add IMX500 to TPC. Quantization methods are symmetric for weights and power-of-two for activations. To get the IMX500 TPC, please use Get TargetPlatformCapabilities (see the sketch after this list).
  • Add Symmetric LUT quantization.
  • Remove Gumbel-Rounding from GPTQ.
  • Add Keras implementation of Soft-Rounding to GPTQ. Soft-Rounding is enabled by default; to change this, please edit the GPTQ Config.
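
A minimal sketch of fetching the IMX500 TPC mentioned above, assuming the mct.get_target_platform_capabilities helper referenced in these notes; only the framework and platform-name strings are shown, and other parameters are left at their defaults.

```python
import model_compression_toolkit as mct

# Retrieve the IMX500 target platform capabilities for TensorFlow
# (symmetric weights, power-of-two activations, as noted above).
tpc = mct.get_target_platform_capabilities("tensorflow", "imx500")
```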

Bug fixes:

  • Remove an unnecessary assert in Activation layers of type float64.
  • Fix bugs and speed up gradients computation.
  • Close issues:
    • Integration with TFLite #528
    • Will MCT support int8 quantization? #273

Full Changelog: v1.7.1...v1.8.0

Release v1.7.1

14 Dec 08:25
04bd851

What's Changed

Bug fixes:

  • Added outlier removal using Z threshold to Shift Negative Correction.
  • Fixed mixed-precision issue for Pytorch models with multiple inputs.
  • Fixed wrong KPI computation in mixed-precision when the model has reused layers.
  • Fixed import error in statistics correction package. #470

Full Changelog: v1.7.0...v1.7.1

Release v1.7.0

01 Dec 12:19
caffdf7

What's Changed

Major updates:

General changes:

  • GPTQ changes:

    • Added a new experimental GPTQ configuration class named GradientPTQConfigV2.
    • Added support for uniform quantizers during GPTQ for PyTorch models. Documentation on using uniform quantizers can be found here. Experimental.
    • Added usage of tf.function in GPTQ for fine-tuning TensorFlow models to improve GPTQ runtime.
  • Mixed-Precision changes:

    • Added a new refinement procedure for improving the mixed-precision bit-width configuration; it is enabled by default and can be disabled using refine_mp_solution in MixedPrecisionQuantizationConfigV2.
    • Improved mixed-precision configuration search runtime by refactoring the distance matrix computation and the similarity analysis functions.
  • Added second-moment correction (in addition to bias correction) for PyTorch and TensorFlow models.
    It can be enabled by setting weights_second_moment_correction to True when creating a QuantizationConfig (experimental; see the sketch after this list).

  • Added a search for improved shift and threshold values in the shift negative correction algorithm. It can be enabled by setting shift_negative_params_search to True when creating a QuantizationConfig. (Experimental)

  • Added support for TensorFlow v2.10, as well as PyTorch v1.12 and v1.13.

  • Tested using multiple Python versions: 3.10, 3.9, 3.8, 3.7

  • Removed LayerNormDecomposition substitution for TensorFlow models.

  • Added support for PyTorch convolution functional API.

  • Updated the requirements file (excluding networkx v2.8.1). See the requirements here.

  • Added new tutorials. Find all MCT's tutorials here.

  • Added a new look to our website! Check it out!
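
A minimal sketch of enabling the experimental flags described above. The flag names (weights_second_moment_correction, shift_negative_params_search, refine_mp_solution) come from this release note; the CoreConfig keyword names (quantization_config, mixed_precision_config) are assumptions, and everything else is left at its defaults.

```python
import model_compression_toolkit as mct

# Enable the two experimental quantization features described above.
quant_config = mct.QuantizationConfig(weights_second_moment_correction=True,
                                      shift_negative_params_search=True)

# The mixed-precision refinement procedure is on by default; it can be disabled like this.
mp_config = mct.MixedPrecisionQuantizationConfigV2(refine_mp_solution=False)

core_config = mct.CoreConfig(quantization_config=quant_config,
                             mixed_precision_config=mp_config)
```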

Bug fixes:

  • Fixed an issue with small thresholds caused by numerical precision; the calculation of PoT thresholds was changed to use float64.

Full Changelog: v1.6.0...v1.7.0

Release v1.6.0

22 Sep 16:01
121efce

What's Changed

Major updates:

  • Added Keras Quantization-Aware-Training (QAT) support (experimental): 🥳

  • Added Gumbel-Rounding quantizer to Keras Gradient-Based PTQ (GPTQ) (experimental): 🎉

  • Added initial support for GPTQ for PyTorch models (experimental). Please visit the GradientPTQConfig documentation and get_pytorch_gptq_config documentation for more details.
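
A minimal sketch of creating a PyTorch GPTQ configuration via the get_pytorch_gptq_config helper mentioned above. The n_iter argument (number of fine-tuning iterations) is an assumption here; please consult the linked GradientPTQConfig and get_pytorch_gptq_config documentation for the exact signature.

```python
import model_compression_toolkit as mct

# Build an experimental GPTQ configuration for PyTorch fine-tuning.
gptq_config = mct.get_pytorch_gptq_config(n_iter=50)  # n_iter is an assumed argument name
```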

General changes:

  • Added support for the LUT K-means quantizer for activations in Keras and PyTorch models.
  • GPTQ changes:
    • Added support for weighted loss in Keras GPTQ.
    • Default values in GradientPTQConfig were revised.
    • The API of get_keras_gptq_config was changed.
    • Please visit the GradientPTQConfig documentation and get_keras_gptq_config documentation for more details.
  • MixedPrecisionQuantizationConfigV2 default values were changed. Please visit the MixedPrecisionQuantizationConfigV2 documentation for more details.
  • Added support for buffers in PyTorch models (they do not require gradients and are thus not registered as parameters).
  • Added layer-replacement action in the network editor. You can find more actions to edit the network here.
  • Added support for constraining a model's number of Bit-Operations (BOPs). For more KPI options, please visit our documentation.
  • New tutorials were added for GPTQ and QAT for Keras models, as well as tutorials for how to use LUT quantizers. You can find all tutorials here 👩‍🏫

Bug fixes:

  • Replaced TensorFlowOpLayer with TFOpLambda in Shift Negative Correction for Keras models.
  • Skipped GPTQ training when the number of iterations is set to 0.
  • Fixed optimizer import from Keras facade to support TF2.9.
  • Fixed name in the license.

Full Changelog: v1.5.0...v1.6.0

Release v1.5.0

06 Jul 07:40

What's Changed

Major updates:

  • A new experimental API is introduced for optimizing PyTorch and Keras models. The main changes are:
    • A new configuration class named CoreConfig is used to configure the quantization parameters (the old QuantizationConfig is now a property of CoreConfig), mixed-precision parameters (using a new configuration class named MixedPrecisionQuantizationConfigV2), and debug options (using a new configuration class named DebugConfig).
    • A single function is used for compressing models in either single-precision or mixed-precision mode. The passed CoreConfig determines the mode (according to the value of its MixedPrecisionQuantizationConfigV2). For more details, please see the Keras function documentation and the PyTorch function documentation.
    • A new function is used for Gradient-based PTQ for optimizing Keras models.

The old API is still supported, but its use is not recommended since it will be removed in future releases. A brief sketch of the new flow follows.
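
A minimal sketch of the new experimental flow, assuming the keras_post_training_quantization_experimental entry point referenced elsewhere in these notes. The MobileNetV2 model and random representative dataset are placeholders, and the CoreConfig keyword names (quantization_config, debug_config, mixed_precision_config) are assumptions; only the class names come from this note.

```python
import numpy as np
from tensorflow.keras.applications import MobileNetV2
import model_compression_toolkit as mct

float_model = MobileNetV2(weights=None)  # any Keras model

def representative_data_gen():
    # Return a list with one batch per model input (random data for illustration only).
    return [np.random.randn(1, 224, 224, 3).astype(np.float32)]

core_config = mct.CoreConfig(
    quantization_config=mct.QuantizationConfig(),
    debug_config=mct.DebugConfig(),
    # mixed_precision_config=mct.MixedPrecisionQuantizationConfigV2(),  # add to enable mixed precision
)

quantized_model, quantization_info = mct.keras_post_training_quantization_experimental(
    float_model, representative_data_gen, core_config=core_config)
```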

General changes:

  • Added linear collapsing and residual collapsing for Conv2D layers.

  • Added visualization of the resulting mixed-precision configuration. Please see here for more details.

  • Added TPC versioning to support different TPC models and versions.

  • Fusing layers according to Target Platform Capabilities. More details can be found here.

  • Added support for MultiHeadAttention layers in PyTorch models.

  • Mixed-precision changes:

    • Added functional layers to activation mixed-precision in PyTorch models.
    • Bounded the mixed-precision metric interest points to accelerate the mixed-precision configuration search for deep models. See more here.
    • Added total KPI metric. Please see here for more details.
    • Added option to compute a weighted distance metric based on gradients of a model's output with respect to the feature maps. Currently, this is supported for TensorFlow models only. To use it please configure MixedPrecisionQuantizationConfigV2 as documented here.

Bug fixes:

  • Fixed wrong message upon calling get_target_platform_capabilities when tensorflow_model_optimization is not installed. #223
  • Fixed a mismatch in the number of compare points between the float and quantized models in GPTQ. #181
  • Fixed issue in network editor: editing activation_error_method doesn't affect the configuration. #153
  • Fixed issue in network editor: editing ChangeFinalActivationQuantConfigAttr doesn't affect the configuration. #144
  • Fixed logging irrelevant TPC warnings. #277
  • Fixed issue of wrong output-channel axis for fully-connected layers in PyTorch models. #250
  • Fixed similarity analyzer in NNVisualizer.

Full Changelog: v1.4.0...v1.5.0

Release v1.4.0

12 May 21:09
c414024

What's Changed

Major Updates:

  • Activation mixed-precision support for Keras and PyTorch models. Given the maximal activation tensor memory size,
    MCT will search for different bit-widths for different activation quantizers (see the sketch below). For Keras usage, please see here. For PyTorch usage, please see here.
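
A minimal sketch, assuming the KPI object is how the maximal activation tensor memory (in bytes) is expressed when invoking the mixed-precision search; the value below is arbitrary and purely illustrative.

```python
import model_compression_toolkit as mct

# Cap the maximal activation tensor memory; MCT then searches activation
# bit-widths that satisfy this budget.
target_kpi = mct.KPI(activation_memory=1_000_000)  # ~1 MB, illustrative value
```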

General Changes:

  • Updated loss function in GradientPTQConfig to receive mean and standard deviation statistics from batch normalization layers if they exist. For more info, please see here.

  • TargetPlatformCapabilities:

    • Renamed the hardware_representation package to target_platform (its components were renamed correspondingly).
    • More information from TargetPlatformCapabilities is now used during the optimization process (whether operator quantization is enabled, and bit-width configurations for different operators).

    The documentation and usage examples were updated as well and can be seen here.

  • Updated mixed-precision quantization functions such that a target KPI is mandatory. See the Keras and PyTorch function documentation for more info.

  • Added a graph transformation to shift the outputs of layers preceding Softmax layers for better use of the quantization range.

  • Added a graph transformation to equalize the standard deviation of output channels for some patterns in the input model.

  • Updated the requirements file to use the latest networkx version.

  • Added a graph transformation for PyTorch models to replace dynamic input shapes with static input shapes.

  • Added functions to ease the computation of the target KPI. More can be seen here.
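
A minimal sketch of the KPI-computation helpers mentioned above, assuming keras_kpi_data (listed under mct.core in the v1.9.0 notes above, with pytorch_kpi_data as its PyTorch counterpart) is one of these functions and can be called with just a model and a representative dataset generator; other parameters are left at their defaults, and the model and data are placeholders.

```python
import numpy as np
from tensorflow.keras.applications import MobileNetV2
import model_compression_toolkit as mct

model = MobileNetV2(weights=None)

def representative_data_gen():
    return [np.random.randn(1, 224, 224, 3).astype(np.float32)]

# Compute the KPI of the float model to help choose a target KPI for mixed precision.
kpi_data = mct.keras_kpi_data(model, representative_data_gen)
print(kpi_data)
```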

Bug Fixes:

  • Fix issue of bias correction in Keras models with DepthwiseConv2D layers with depth_multiplier greater than 1: #78
  • Fix issue of dynamic input shapes in PyTorch models by replacing them with static input shapes: #161

Full Changelog: v1.3.0...v1.4.0

Release v1.3.0

27 Mar 15:58

What's Changed

Major Updates:

  • Added support for weights mixed-precision for Pytorch models (experimental). See more here.
  • Added Uniform and Symmetric quantizers, and hardware modeling to support the use of these and other quantizers.
  • Replaced relu_unbound_correction with relu_bound_to_power_of_2.
  • Updated visualization using Tensorboard.

General Changes:

  • Upgraded GPTQ:

    • Added the model's weights to the GPTQ loss function API.
    • Added support for a maximal LSB change per bit-width.
    • Enabled bias correction with GPTQ to improve the initial model before fine-tuning.
  • Added an option to pass a manual mixed-precision configuration to MixedPrecisionQuantizationConfig.

  • Added MultiHeadAttention support for Keras models, replacing the layer with equivalent Dense, Reshape, Permute, Concatenate, and Dot layers.

Bug Fixes:

  • Fixed bug in mixed-precision configuration search where images from the representative dataset were misused.
  • Fixed shift-negative correction: the constant shape of the shifting layer was fixed to match the layer's input_shape, and the quantization sign of bypass layers affected by the correction was fixed.

Full Changelog: v1.2.0...v1.3.0

Release v1.2.0

09 Feb 07:44
cc869a4

What's Changed

Added support for the PyTorch framework (experimental):

  • Post training quantization

  • Supports Batch Normalization folding

  • Supports Shift Negative Activations feature

For a quick-start tutorial on how to quantize a PyTorch model, please visit our website.
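
A minimal sketch of PyTorch post-training quantization as introduced in this release, assuming the pytorch_post_training_quantization entry point (the PTQ functions are referenced in later notes above) and an n_iter calibration argument; the model and the random representative dataset are placeholders, and the exact signature may differ.

```python
import numpy as np
import torch
import model_compression_toolkit as mct

# A small placeholder model; batch-normalization folding and shift-negative
# correction are applied internally by MCT as described above.
float_model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
)

def representative_data_gen():
    return [np.random.randn(1, 3, 224, 224).astype(np.float32)]

quantized_model, quantization_info = mct.pytorch_post_training_quantization(
    float_model, representative_data_gen, n_iter=10)  # n_iter is an assumed argument name
```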