
Release v1.9.0

@ofirgo ofirgo released this 19 Jun 07:16

What's Changed

Major updates

  • MCT Quantizers:

    • The Inferable Infrastructure package was extracted into an external repository, MCT Quantizers, and a new dependency was added to the project's requirements (the mct_quantizers library; see the requirements file).

    • For changes in the quantized model structure, please refer to the latest release (v1.1.0) of the MCT Quantizers package. The latest changes include removing the activation quantizer from the “QuantizationWrapper” module and replacing it with an “ActivationQuantizationHolder” module that is responsible for activation quantization.

    • The extraction of the inferable infrastructure package includes a breaking change to the quantization infrastructure API: the quantization_infrastructure package is no longer part of MCT’s API. Its capabilities are split and can be accessed as follows:

      • inferable_infrastructure components are available via the MCT Quantizers package. To access them, use mct_quantizers.<Component> (after installing the mct-quantizers package in your environment).
      • trainable_infrastructure components are available via the MCT’s API. To access them use mct.trainable_infrastructure.<Component>.
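The split access paths can be sketched as a small lookup (an illustrative helper, not part of MCT; `SomeQuantizer` is a placeholder, not a real component name):

```python
# Post-split homes of the former quantization_infrastructure components.
COMPONENT_HOMES = {
    "inferable_infrastructure": "mct_quantizers.{}",                # external MCT Quantizers package
    "trainable_infrastructure": "mct.trainable_infrastructure.{}",  # still under MCT's API
}

def new_access_path(kind: str, component: str) -> str:
    """Return the post-v1.9.0 access path for a component of the given kind."""
    return COMPONENT_HOMES[kind].format(component)

# 'SomeQuantizer' is a hypothetical name used only for illustration:
print(new_access_path("trainable_infrastructure", "SomeQuantizer"))
# -> mct.trainable_infrastructure.SomeQuantizer
```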
  • MCT Tutorials:

    • The new tutorials package exposes a simple framework for getting started with MCT for model quantization. It demonstrates MCT's capabilities, illustrates its interface with various model-collection libraries, and allows users to generate a quantized version of a chosen model with a single click from a wide range of pre-trained models.
    • Currently, the project supports a selection of models from each library, and we aim to continually expand this support to include more models in the future.
    • In addition, all MCT tutorials and examples have been moved to the notebooks directory under this module.
      • This release also includes several fixes to the existing MCT examples: new arguments and imports were fixed in the QAT, Keras, and MobileNetV2 examples.
  • Exporter API changes:

    • Instead of directly specifying the data type (fakely-quantized or INT8) as a mode, the exporter now receives the TPC, which contains the desired quantization format.
    • Models can be exported in two quantization formats: a Fake Quant format, where weights and activations are float fakely-quantized values, and a new INT8 format, where weights and activations are represented as 8-bit integers. The quantization format is set in the TPC.
    • A serialization format is now passed to the exporter. This changes how models are exported: TensorFlow models can be exported as TensorFlow models (.h5 extension) or TFLite models (.tflite extension), and PyTorch models as TorchScript models or ONNX models (.onnx extension).
    • The mct.exporter.keras_export_model() function replaces mct.exporter.tflite_export_model().
  • API Rearrangement:

    • We would like to inform you about breaking changes in the MCT's API that may affect your existing code. Functions and classes that were previously directly exposed are now organized under internal packages. The behavior of these functions remains unchanged; however, you will need to update how you access them.
    • For example, what was previously accessed via mct.DebugConfig should now be accessed using mct.core.DebugConfig.
    • The full list of changes is as follows:
      • mct.core:
        • DebugConfig
        • keras_kpi_data, keras_kpi_data_experimental
        • pytorch_kpi_data, pytorch_kpi_data_experimental
        • FolderImageLoader
        • FrameworkInfo, ChannelAxis
        • DefaultDict
        • network_editor
        • quantization_config
        • mixed_precision_quantization_config
        • QuantizationConfig, QuantizationErrorMethod, DEFAULTCONFIG
        • CoreConfig
        • KPI
        • MixedPrecisionQuantizationConfig, MixedPrecisionQuantizationConfigV2
      • mct.qat:
        • QATConfig, TrainingMethod
        • keras_quantization_aware_training_init, keras_quantization_aware_training_finalize
        • pytorch_quantization_aware_training_init, pytorch_quantization_aware_training_finalize
      • mct.gptq:
        • GradientPTQConfig, RoundingType, GradientPTQConfigV2
        • keras_gradient_post_training_quantization_experimental
        • get_keras_gptq_config
        • pytorch_gradient_post_training_quantization_experimental
        • get_pytorch_gptq_config
      • mct.exporter:
        • KerasExportSerializationFormat
        • PytorchExportSerializationFormat
        • keras_export_model
        • pytorch_export_model
      • mct.ptq:
        • pytorch_post_training_quantization_experimental
        • keras_post_training_quantization_experimental
    • Please update your code accordingly to ensure compatibility with the latest version of MCT.
    • Also, note that the old functions keras_ptq, keras_ptq_mp, pytorch_ptq, and pytorch_ptq_mp are now deprecated and will be removed in a future release. We highly recommend using keras_ptq_experimental and pytorch_ptq_experimental instead.
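As a migration aid, the renames above can be expressed as a simple lookup (a sketch covering a subset of the moves listed; `migrate` is a hypothetical helper, not part of MCT):

```python
# Old top-level access path -> new package path (subset of the list above).
API_MOVES = {
    "mct.DebugConfig": "mct.core.DebugConfig",
    "mct.CoreConfig": "mct.core.CoreConfig",
    "mct.KPI": "mct.core.KPI",
    "mct.QATConfig": "mct.qat.QATConfig",
    "mct.get_keras_gptq_config": "mct.gptq.get_keras_gptq_config",
    "mct.keras_export_model": "mct.exporter.keras_export_model",
    "mct.keras_post_training_quantization_experimental":
        "mct.ptq.keras_post_training_quantization_experimental",
}

def migrate(qualified_name: str) -> str:
    """Map a pre-v1.9.0 access path to its new location; unmoved names pass through."""
    return API_MOVES.get(qualified_name, qualified_name)
```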
  • The new_experimental_exporter flag is now set to True by default in keras_ptq_experimental, keras_gptq_experimental, pytorch_ptq_experimental and pytorch_gptq_experimental.
    This change affects the quantized model MCT returns: layers are wrapped with their quantization information, as detailed in the MCT Quantizers library. Inference is unchanged, and the quantized model is used in the same way.
    In addition, the new quantized model can be used for exporting the quantized model using the new experimental exporter.

General changes

  • Trainable quantizers:

    • New symmetric soft rounding quantizers were added to Pytorch GPTQ, and a uniform soft rounding quantizer was added to both Pytorch and Keras GPTQ.
    • GPTQ and QAT quantizer names have been modified with distinguishable suffixes (e.g., SymmetricSoftRounding --> SymmetricSoftRoundingGPTQ).
    • Trainable variables grouping - all trainable quantizers now hold a mapping of their trainable parameters, connecting each to a specific group, to allow cleaner and simpler training (and training-loop implementation).
    • Regularization API for trainable quantizers - the regularization computation was extracted from the quantizer to a higher level. Each trainable quantizer now defines its own regularization function (see, for example, the Keras soft quantizer regularization).
    • New “DNN Quantization with Attention (DQA)” quantizer for QAT (Pytorch).
  • GPTQ arguments:

  • Moved TPC from Core package into an independent target_platform_capabilities package.

    • In addition, TargetPlatformModel now has a QuantizationFormat attribute, indicating the format of the quantized parameters.

Bug fixes

  • Fixed issues in quantization parameters learning in GPTQ soft quantizers.
  • Fixed Keras QAT symmetric quantizer per-tensor quantization.
  • Removed unnecessary deepcopy calls in substitutions that caused slowdown and deep recursion issues #621.
  • Removed framework information from trainable Pytorch models to enable saving them.

Full Changelog: v1.8.0...v1.9.0