Release v1.11.0

@reuvenperetz reuvenperetz released this 03 Jan 13:32
· 270 commits to main since this release
bca5634

What's Changed

Major updates:

  • Structured pruning for Keras models: MCT now supports structured, hardware-aware pruning. This technique compresses models for specific hardware architectures by taking the target platform's Single Instruction, Multiple Data (SIMD) capabilities into account.

  • Learned Step Size Quantization (LSQ) implementation for QAT. To understand how to use LSQ, please refer to our API documentation here.
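The core idea of an LSQ forward pass can be sketched in a few lines. The following is a pure-Python illustration, not MCT's actual QAT quantizer; the function name and defaults are assumptions made for the example. Values are divided by a learnable step size `s`, rounded, clamped to the integer grid, and rescaled back.

```python
def lsq_quantize(x, s, num_bits=8, signed=True):
    """Fake-quantize x with a (learnable) step size s, LSQ-style."""
    if signed:
        q_min, q_max = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        q_min, q_max = 0, 2 ** num_bits - 1
    v = round(x / s)                  # scale and round to the integer grid
    v = max(q_min, min(q_max, v))     # clamp to the representable range
    return v * s                      # rescale back to the original range

print(lsq_quantize(0.37, s=0.1))   # ~0.4 (rounded to the nearest step)
print(lsq_quantize(100.0, s=0.1))  # ~12.7 (clamped to q_max = 127)
```

In full LSQ, `s` is trained jointly with the weights using a straight-through estimator and a gradient scale factor; only the forward pass is sketched here.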

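The SIMD-aware idea behind the structured pruning above can be sketched as follows. This is a hedged, pure-Python illustration (the function name and group-scoring rule are assumptions, not MCT's implementation): output channels are grouped into SIMD-width blocks and whole low-importance groups are removed, so the surviving channel count stays a multiple of the hardware's SIMD width.

```python
def prune_channel_groups(importances, simd_width, keep_ratio):
    """Return the indices of channels kept after SIMD-group-wise pruning."""
    # Split channel indices into contiguous SIMD-sized groups.
    groups = [list(range(i, min(i + simd_width, len(importances))))
              for i in range(0, len(importances), simd_width)]
    # Score each group by its total importance and keep the best ones.
    scored = sorted(groups, key=lambda g: sum(importances[c] for c in g),
                    reverse=True)
    n_keep = max(1, round(len(groups) * keep_ratio))
    return sorted(c for g in scored[:n_keep] for c in g)

# 8 channels, SIMD width 4, keep the top half of the groups.
imps = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.1, 0.3]
print(prune_channel_groups(imps, simd_width=4, keep_ratio=0.5))
# -> [0, 1, 2, 3]: the low-importance second group is pruned as a whole
```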
General changes:

  • New tutorials were added, including Nanodet-Plus and EfficientDet. These and more can be found here.
  • Support for new NN framework versions was added (TensorFlow v2.14 and PyTorch v2.1).
  • Hessian-based scores, used as sensitivity importance scores in mixed precision, GPTQ, and pruning, can now be computed with respect to the model's weights (in addition to the previously supported computation with respect to the model's activations).
  • Added support for an external regularization factor in GPTQ. Please refer to the API for Keras and PyTorch usage.
  • Custom layers in Keras, previously unsupported, are now skipped during quantization.
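A standard way to obtain such Hessian-based scores without materializing the full Hessian is Hutchinson's estimator, which averages v^T H v over random sign vectors. The sketch below is illustrative only: MCT computes Hessian-vector products through the framework's autograd, whereas here the toy Hessian is analytic.

```python
import random

def hutchinson_trace(hvp, dim, n_samples=2000, seed=0):
    """Estimate trace(H) from Hessian-vector products with Rademacher probes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]  # random +/-1 probe
        hv = hvp(v)                                        # Hessian-vector product
        total += sum(vi * hvi for vi, hvi in zip(v, hv))   # v^T H v
    return total / n_samples

# Toy quadratic loss f(w) = sum(a_i * w_i^2) has Hessian diag(2*a), trace 12.
a = [1.0, 2.0, 3.0]
hvp = lambda v: [2.0 * ai * vi for ai, vi in zip(a, v)]
print(hutchinson_trace(hvp, dim=3))  # close to the exact trace, 12.0
```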

Breaking changes:

  • Names of Hessian-related variables and methods have been revised:

    • GPTQHessianWeightsConfig Changes:
      • The class GPTQHessianWeightsConfig is renamed GPTQHessianScoresConfig.
      • The parameter norm_weights is renamed norm_scores.
      • New API can be found here.
    • MixedPrecisionQuantizationConfigV2 Changes:
      • The parameter use_grad_based_weights is renamed use_hessian_based_scores.
      • The parameter norm_weights is renamed norm_scores.
      • New API can be found here.
  • Exporter changes: the new QuantizationFormat 'MCTQ' exports models with mct-quantizers modules. In addition, a TPC should no longer be passed during export; instead, a QuantizationFormat is passed directly. For more details and updated usage examples, please see here.

  • The output replacement mechanism has been removed from the Hessian computation. As a result, models whose outputs come from layers such as argmax are now incompatible with the Hessian scoring metric in features like GPTQ and mixed precision; Hessian scoring must be disabled when using these features with such models.

Bug fixes:

  • Fixed a permission error during TensorFlow model export on Windows systems (#865, by @jgerityneurala).
  • Fixed an issue with pickling PyTorch models (#841).
  • Fixed an issue on systems with multiple CUDA devices (#613).
  • Fixed the unsupported NMS layer issue in mixed-precision scenarios (#844).
  • Fixed an issue with the PyTorch reshape substitution (#799).
  • Fixed an issue when finalizing the graph configuration after mixed-precision operations with a mixed TPC (#820).
  • Fixed numerical problems in mixed precision caused by large values in the distance metric: a threshold was added to the MP quantization configuration, and any distance exceeding it is scaled down.
  • Fixed an issue with the decomposition of reused TensorFlow SeparableConv2D layers with respect to their reuse group.
  • Fixed a bug in PyTorch BN folding into ConvTranspose2d with groups > 1.
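The distance-thresholding fix can be sketched as follows. The function name, default threshold, and exact scaling rule here are assumptions for illustration, not MCT's code; the point is that one layer's huge distance is rescaled so it cannot numerically dominate the mixed-precision search.

```python
def bound_distances(distances, threshold=10.0):
    """Scale the distance vector down if any value exceeds the threshold."""
    m = max(distances)
    if m <= threshold:
        return list(distances)      # all values already within range
    scale = threshold / m           # shrink so the largest value hits the cap
    return [d * scale for d in distances]

print(bound_distances([1.0, 2.0]))        # unchanged: [1.0, 2.0]
print(bound_distances([1.0, 5.0, 1e6]))   # rescaled so the max becomes 10.0
```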

New Contributors

Welcome to @jgerityneurala and @edenlum, who made their first contributions in PR #865 and PR #873!

Full Changelog: v1.10.0...v1.11.0