What's Changed
Major Changes
Target Platform Capabilities (TPC) Changes
TPC Schema
- Introduced a new schema mechanism (version v1) to establish the language for building a target platform capabilities description.
- The schema defines the `TargetPlatformCapabilities` class, which can be built to describe the platform's capabilities.
- The `OperatorSetNames` enum provides a closed set of operator set names, allowing quantization configuration options to be set for commonly used operators. Using a custom operator set name is also supported.
- All schema classes use pydantic's `BaseModel` for enhanced validation and schema flexibility.
- MCT has a new dependency on `pydantic < 2.0`.
- In addition, a new versioning system was introduced, using minor and patch versions.
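To illustrate what a validated, versioned schema class provides, here is a minimal stdlib sketch. MCT's actual schema classes are pydantic `BaseModel`s; the class names, fields, and allowed bit-widths below are hypothetical, chosen only to show the validation-on-construction idea:

```python
from dataclasses import dataclass

# Hypothetical sketch of a versioned, validated schema class.
# MCT's real schema classes are pydantic BaseModels; this only
# illustrates validation on construction plus minor/patch versioning.

@dataclass(frozen=True)
class SchemaVersion:
    minor: int
    patch: int

    def __post_init__(self):
        if self.minor < 0 or self.patch < 0:
            raise ValueError("version parts must be non-negative")

@dataclass
class OperatorConfig:
    op_set_name: str           # e.g. a name from a closed enum, or a custom name
    activation_n_bits: int = 8

    def __post_init__(self):
        # Reject configurations the (hypothetical) schema does not allow.
        if self.activation_n_bits not in (2, 4, 8, 16):
            raise ValueError(f"unsupported bit-width: {self.activation_n_bits}")

cfg = OperatorConfig(op_set_name="Conv", activation_n_bits=8)
version = SchemaVersion(minor=1, patch=0)
print(cfg.op_set_name, version.minor, version.patch)
```

Invalid descriptions fail at construction time (e.g. `OperatorConfig("Conv", 3)` raises `ValueError`), which is the kind of early validation the pydantic-based schema gives MCT.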
Naming Refactor
- Introducing the schema mechanism was accompanied by renaming several classes:
  - `TargetPlatformModel` → `TargetPlatformCapabilities`
  - `TargetPlatformCapabilities` → `FrameworkQuantizationCapabilities`
  - `OperatorSetConcat` → `OperatorSetGroup`
Attach TPC to Framework
- A new module named `AttachTpcToFramework` handles the conversion from a framework-independent `TargetPlatformCapabilities` description to a framework-specific `FrameworkQuantizationCapabilities`, which maps each framework operator to its possible quantization configurations.
- Available for TensorFlow and PyTorch via `AttachTpcToKeras` and `AttachTpcToPytorch`, respectively.
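Conceptually, the attach step resolves each framework-independent operator-set name to the concrete framework operators it covers. A minimal sketch of that idea, where the operator tables and config dicts are illustrative stand-ins rather than MCT's actual data structures:

```python
# Hypothetical sketch of attaching a framework-independent operator-set
# description to framework-specific operators. MCT's real mechanism lives
# in AttachTpcToKeras / AttachTpcToPytorch; these tables are illustrative.

# Framework-independent description: opset name -> quantization config.
tpc_opsets = {
    "Conv": {"weights_n_bits": 8, "activation_n_bits": 8},
    "FullyConnected": {"weights_n_bits": 8, "activation_n_bits": 8},
}

# Framework-specific attach table: opset name -> PyTorch operator names.
pytorch_attach = {
    "Conv": ["torch.nn.Conv2d", "torch.nn.functional.conv2d"],
    "FullyConnected": ["torch.nn.Linear"],
}

def attach(tpc, fw_table):
    """Map each framework operator to its quantization configuration."""
    fw_caps = {}
    for opset_name, qconfig in tpc.items():
        for fw_op in fw_table.get(opset_name, []):
            fw_caps[fw_op] = qconfig
    return fw_caps

fw_caps = attach(tpc_opsets, pytorch_attach)
print(fw_caps["torch.nn.Linear"])  # the config attached to Linear layers
```

The same framework-independent description can be attached to different frameworks by swapping the attach table, which is why the TPC itself no longer needs to be framework-specific.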
API changes
- All MCT APIs now expect a target_platform_capabilities object (`TargetPlatformCapabilities`), which contains the framework-independent platform capabilities description.
- This changes the previous behaviour, which expected an initialized framework-specific object.
- Note: the default behavior of MCT's APIs has not changed! Calling an API function without passing a TPC object, or passing an object obtained via `get_target_platform_capabilities(<FW_NAME>, DEFAULT_TP_MODEL)`, uses the same default TPC as in the previous release.
- Regardless, users who accessed TPC-related classes outside the published API may encounter breaking changes due to class renaming and file hierarchy changes.
Tighter activation memory estimation via Max-Cut (Experimental)
- Replaced Max-Tensor with Max-Cut as the activation memory estimation method in the mixed precision algorithm.
- The Max-Cut metric considers the model operators' execution schedule for a more precise estimation of activation memory (#1295).
- Note: this is an estimation of runtime memory usage; the actual memory consumed at runtime may differ.
- 16-bit Activation Quantization (experimental)
- The new activation memory estimation allows flexible usage of the mixed precision algorithm to enable 16-bit activation quantization (dependent on a TPC that supports 16-bit quantization for different operators).
- 16-bit quantization can be enabled either via the Manual Bit-width Selection API or automatically, by running mixed precision with an appropriate activation or total memory constraint.
- Note that when running mixed precision with an activation memory constraint to enable 16-bit allocation, shift negative correction should be disabled.
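The intuition behind a cut-based estimate can be shown on a toy schedule. This sketch assumes a simplified model of the metric (tensors with known lifetimes over a linear execution schedule); it is not MCT's implementation:

```python
# Toy illustration of cut-based activation memory estimation (assumed,
# simplified semantics; not MCT's implementation). Given an execution
# schedule and each activation tensor's lifetime, peak activation memory
# is the maximum, over schedule steps, of the total size of tensors
# alive at that step.

# (tensor_size_bytes, first_step_alive, last_step_alive)
tensors = [
    (400, 0, 1),  # network input, freed after step 1
    (300, 1, 3),  # e.g. a skip-connection tensor alive across steps
    (200, 2, 3),
    (100, 3, 4),
]

def peak_activation_memory(tensors, n_steps):
    peak = 0
    for step in range(n_steps):
        # Sum the sizes of all tensors whose lifetime covers this step.
        live = sum(size for size, start, end in tensors if start <= step <= end)
        peak = max(peak, live)
    return peak

print(peak_activation_memory(tensors, 5))  # 700, reached at step 1
```

Here the cut-based peak (700) is tighter than naively summing every activation (1000) while still accounting for tensors that stay alive simultaneously, unlike a single-largest-tensor bound (400).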
Improved GPTQ algorithm via Sample Layer Attention (SLA):
- Enabled SLA by default in both Keras and PyTorch (#1287, #1260)
- Added gradual activation quantization support for enhanced results when quantizing activations (#1244, #1237)
- Implemented Rademacher distribution for Hessian estimation (#1250)
- For more details, please see our paper.
Resource Utilization (RU) calculation:
- Use the max-cut activation method for activation and total resource utilization computation.
- Compute the total target from weights and activation utilization, instead of using it as a separate metric.
- Weights memory computation now includes all quantized weights in the model, instead of considering only kernel attributes. This may change the results of existing mixed precision scenarios.
- Note that the `ResourceUtilization` API did not change.
Minor Changes
- Added Activation Bias Correction feature to potentially enhance quantization results of vision transformers (#1256)
- Added a substitution to decompose the MatMul operation into baseline components in PyTorch (#1313)
- Added a substitution to decompose the scaled dot product attention operator in PyTorch (#1229)
- Converted core configuration classes to dataclasses for simpler usage and strict behavior verification (`CoreConfig`, `QuantizationConfig`, etc.) (#1203)
- Trainable Infrastructure changes:
- Moved STE/LSQ activation quantizers from QAT to trainable infrastructure.
- Renamed Trainable QAT quantizer to Weight Trainable quantizer (#1240)
- Added support for PyTorch 2.4, PyTorch 2.5, and Python 3.12
Bug Fixes
- Fixed activation gradient backpropagation in GPTQ for PyTorch models: it now uses STE Activation Trainable quantizers with frozen quantization parameters instead of Activation Inferable quantizers, which did not propagate gradients (#1197)
- Fixed ONNX export when PyTorch models have multiple inputs/outputs (#1223)
- Fixed the issue of duplicating reused layers in PyTorch models (#1217)
- Fixed HMSE being overridden by MSE after resource utilization computation (#1253)
- Resolved duplicate QCOs error handling (#1282, #1149)
- Fixed tf.nn.{conv2d,convolution} substitution to handle attributes with default values that were not passed explicitly (#1275)
- Fixed handling errors in PyTorch graphs by managing nodes with missing outputs and ensuring robust extraction of output shapes (#1186)
New Contributors
Welcome @ambitious-octopus and @itai-berman for their first contributions! (#1186, #1266)