Release v1.9.0
What's Changed
Major updates
- MCT Quantizers:
  - The Inferable Infrastructure package was extracted into an external repository, MCT Quantizers, and a new dependency was added to the project's requirements (the `mct_quantizers` library; see the requirements file).
  - For changes in the quantized model structure, please refer to the latest release (v1.1.0) of the MCT Quantizers package. The latest changes include removing the activation quantizer from the `QuantizationWrapper` module and replacing it with an `ActivationQuantizationHolder` that is responsible for the activation quantization.
  - The extraction of the inferable infrastructure package included a breaking change to the quantization infrastructure API: the `quantization_infrastructure` package is no longer part of MCT's API. Its capabilities are split and can be accessed as follows (see the sketch below):
    - `inferable_infrastructure` components are available via the MCT Quantizers package. To access them, use `mct_quantizers.<Component>` (after installing the mct-quantizers package in your environment).
    - `trainable_infrastructure` components are available via MCT's API. To access them, use `mct.trainable_infrastructure.<Component>`.
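A minimal sketch of the new import paths. The `KerasQuantizationWrapper` name below is one example of an inferable component from the MCT Quantizers package; check each package's documentation for the full component list:

```python
# Inferable components now come from the external mct_quantizers package
# (installed via the new mct-quantizers requirement):
import mct_quantizers

# Example inferable component: the wrapper that attaches weight quantizers
# to a Keras layer.
wrapper_cls = mct_quantizers.KerasQuantizationWrapper

# Trainable components moved under MCT's trainable_infrastructure package:
import model_compression_toolkit as mct

trainable_pkg = mct.trainable_infrastructure  # mct.trainable_infrastructure.<Component>
```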
- MCT Tutorials:
  - The new tutorials package exposes a simple framework for getting started with MCT for model quantization. This project demonstrates the capabilities of MCT and illustrates its interface with various model collection libraries. It allows users to generate a quantized version of a chosen model with a single click, by accessing a wide range of pre-trained models.
  - Currently, the project supports a selection of models from each library. However, our ongoing goal is to continually expand this support, aiming to include more models in the future.
  - In addition, all MCT tutorials and examples have been moved to the notebooks directory under this module.
  - This release also includes several fixes to the existing MCT examples: arguments and imports were fixed in the QAT, Keras, and MobileNetV2 examples.
- Exporter API changes:
  - Instead of directly specifying the data type (fakely-quantized or INT8) as a mode, we now pass the TPC, which contains the desired exported quantization format.
  - Models can be exported in two quantization formats: Fake Quant format, where weights and activations are float fakely-quantized values, and a new INT8 format, where weights and activations are represented using 8-bit integers. The quantization format value is set in the TPC.
  - A serialization format is now passed to the exporter. This update changes how models are exported, allowing TensorFlow models to be exported as TensorFlow models (.h5 extension) and TFLite models (.tflite extension), and PyTorch models to be exported as TorchScript models and ONNX models (.onnx extension).
  - The `mct.exporter.keras_export_model()` function is now used instead of `mct.exporter.tflite_export_model()` (see the sketch below).
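A hedged sketch of the updated export flow. The keyword names and the `KERAS_H5` enum member follow our reading of the v1.9.0 API and may differ; `MobileNetV2` and the random representative dataset are placeholders:

```python
import numpy as np
import tensorflow as tf
import model_compression_toolkit as mct

# A float model plus a tiny representative dataset for calibration (illustrative).
model = tf.keras.applications.MobileNetV2()

def representative_data_gen():
    yield [np.random.randn(1, 224, 224, 3).astype(np.float32)]

# The TPC now carries the desired exported quantization format (Fake Quant / INT8).
tpc = mct.get_target_platform_capabilities('tensorflow', 'default')

quantized_model, _ = mct.ptq.keras_post_training_quantization_experimental(
    model, representative_data_gen, target_platform_capabilities=tpc)

# The serialization format is passed explicitly; here we export a Keras .h5
# model via keras_export_model(), which replaces tflite_export_model():
mct.exporter.keras_export_model(
    model=quantized_model,
    save_model_path='/tmp/quantized_model.h5',
    target_platform_capabilities=tpc,
    serialization_format=mct.exporter.KerasExportSerializationFormat.KERAS_H5)
```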
- API Rearrangement:
  - We would like to inform you about breaking changes in MCT's API that may affect your existing code. Functions and classes that were previously directly exposed are now organized under internal packages. The behavior of these functions remains unchanged; however, you will need to update how you access them.
  - For example, what was previously accessed via `mct.DebugConfig` should now be accessed using `mct.core.DebugConfig`.
  - The full list of changes is as follows (a minimal before/after sketch follows this list):
  - mct.core:
    - DebugConfig
    - keras_kpi_data, keras_kpi_data_experimental
    - pytorch_kpi_data, pytorch_kpi_data_experimental
    - FolderImageLoader
    - FrameworkInfo, ChannelAxis
    - DefaultDict
    - network_editor
    - quantization_config
    - mixed_precision_quantization_config
    - QuantizationConfig, QuantizationErrorMethod, DEFAULTCONFIG
    - CoreConfig
    - KPI
    - MixedPrecisionQuantizationConfig, MixedPrecisionQuantizationConfigV2
  - mct.qat:
    - QATConfig, TrainingMethod
    - keras_quantization_aware_training_init, keras_quantization_aware_training_finalize
    - pytorch_quantization_aware_training_init, pytorch_quantization_aware_training_finalize
  - mct.gptq:
    - GradientPTQConfig, RoundingType, GradientPTQConfigV2
    - keras_gradient_post_training_quantization_experimental
    - get_keras_gptq_config
    - pytorch_gradient_post_training_quantization_experimental
    - get_pytorch_gptq_config
  - mct.exporter:
    - KerasExportSerializationFormat
    - PytorchExportSerializationFormat
    - keras_export_model
    - pytorch_export_model
  - mct.ptq:
    - pytorch_post_training_quantization_experimental
    - keras_post_training_quantization_experimental
  - Please update your code accordingly to ensure compatibility with the latest version of MCT.
  - Also, notice that the old functions `keras_ptq`, `keras_ptq_mp`, `pytorch_ptq`, and `pytorch_ptq_mp` are now deprecated and will be removed in the future. We highly recommend using `keras_ptq_experimental` and `pytorch_ptq_experimental` instead.
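A minimal before/after sketch of the rearrangement, using `DebugConfig` and `CoreConfig` from the `mct.core` list above:

```python
import model_compression_toolkit as mct

# Before (v1.8.x) -- classes were exposed at the top level:
# debug_config = mct.DebugConfig()
# core_config = mct.CoreConfig(debug_config=debug_config)

# After (v1.9.0) -- the same classes live under mct.core; behavior is unchanged:
debug_config = mct.core.DebugConfig()
core_config = mct.core.CoreConfig(debug_config=debug_config)
```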
- The `new_experimental_exporter` flag is now set to True by default in `keras_ptq_experimental`, `keras_gptq_experimental`, `pytorch_ptq_experimental`, and `pytorch_gptq_experimental`. This change affects the quantized model MCT returns: it is created by wrapping the layers with their quantization information, as detailed in the MCT Quantizers library. There is no change during inference, and the quantized model usage is the same. In addition, the new quantized model can be used for exporting the quantized model using the new experimental exporter (see the sketch below).
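A sketch of what the new default implies, assuming a toy Keras model; the `KerasQuantizationWrapper` class name is taken from the MCT Quantizers package:

```python
import numpy as np
import tensorflow as tf
import model_compression_toolkit as mct
import mct_quantizers

# A toy float model and representative dataset (illustrative).
model = tf.keras.Sequential([tf.keras.layers.InputLayer(input_shape=(8,)),
                             tf.keras.layers.Dense(4)])

def representative_data_gen():
    yield [np.random.randn(1, 8).astype(np.float32)]

# new_experimental_exporter now defaults to True, so the returned model's
# layers come wrapped with their quantization information.
quantized_model, _ = mct.ptq.keras_post_training_quantization_experimental(
    model, representative_data_gen)

# Inference is unchanged; the wrapping is visible when inspecting the layers:
wrapped = [layer for layer in quantized_model.layers
           if isinstance(layer, mct_quantizers.KerasQuantizationWrapper)]
print(f'{len(wrapped)} layers wrapped with quantization information')
```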
General changes
- New symmetric soft rounding quantizers were added to Pytorch GPTQ, and a uniform soft rounding quantizer was added to both Pytorch and Keras GPTQ.
- GPTQ and QAT quantizer names have been modified with distinguishable suffixes (e.g., `SymmetricSoftRounding` --> `SymmetricSoftRoundingGPTQ`).
- Trainable variables grouping: all trainable quantizers now hold a mapping of their trainable parameters, connecting each of them to a specific group, to allow cleaner and simpler training (and training-loop implementation).
- Regularization API for trainable quantizers: the regularization is extracted from the quantizer to a higher level. Now, each trainable quantizer defines its own regularization function (see, for example, the Keras soft quantizer regularization).
- New “DNN Quantization with Attention (DQA)” quantizer for QAT (Pytorch).
- GPTQ arguments:
  - New `GPTQHessianWeightsConfig` class to provide the necessary arguments for computing the Hessian weights for GPTQ.
  - New `gptq_quantizer_params_override` argument in the GPTQ config, to allow parameter override (see the sketch below).
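A hedged sketch of configuring GPTQ around the new arguments. The `n_epochs` and `optimizer` arguments of `get_keras_gptq_config` are long-standing; the override shown in the trailing comment is a hypothetical usage based on these notes, so check the GPTQ API reference for the exact signature:

```python
import tensorflow as tf
import model_compression_toolkit as mct

# Build a GPTQ config the recommended way:
gptq_config = mct.gptq.get_keras_gptq_config(
    n_epochs=5,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4))

# Per these notes, the GPTQ config now also accepts a
# gptq_quantizer_params_override argument and Hessian-weights settings via the
# new GPTQHessianWeightsConfig class; exact signatures may differ, e.g.:
# gptq_config.gptq_quantizer_params_override = {'num_bits': 4}  # hypothetical
```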
- Moved TPC from the Core package into an independent `target_platform_capabilities` package.
  - In addition, `TargetPlatformModel` now has a `QuantizationFormat` attribute, indicating the format of the quantized parameters.
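A small hedged sketch of fetching a TPC from the now-independent package; the accessor path for the quantization format is an assumption and is left commented out:

```python
import model_compression_toolkit as mct

# TPCs are still fetched through MCT's API; the implementation now lives in
# the standalone target_platform_capabilities package.
tpc = mct.get_target_platform_capabilities('tensorflow', 'default')

# Per these notes, the underlying TargetPlatformModel carries a
# QuantizationFormat attribute (exact accessor name is an assumption):
# print(tpc.tp_model.quantization_format)
```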
Bug fixes
- Fixed issues in quantization parameters learning in GPTQ soft quantizers.
- Fixed Keras QAT symmetric quantizer per-tensor quantization.
- Removed unnecessary `deepcopy` calls in substitutions that caused slowdowns and deep recursion issues (#621).
- Removed framework information from trainable PyTorch models to enable saving them.
New Contributors
- @nidham-sony made their first contribution in #593
- @V0XNIHILI made their first contribution in #611
- @eghouti made their first contribution in #623
Full Changelog: v1.8.0...v1.9.0