
[OV] Add support for nf4_f8e4m3 quantization mode #1148

Merged: 23 commits merged into huggingface:main on Feb 18, 2025

Conversation

@nikita-savelyevv (Collaborator) commented on Feb 6, 2025

What does this PR do?

Changes

  • Added OVMixedQuantizationConfig for the mixed-precision quantization scenario. It is initialized with an instance of OVWeightQuantizationConfig and an instance of OVQuantizationConfig.
  • Added nf4_f8e4m3, int4_f8e4m3, nf4_f8e5m2 and int4_f8e5m2 as possible values of the --quant-mode CLI argument. These modes perform mixed-precision quantization, compressing weights to nf4/int4 precision and quantizing activations to f8e4m3/f8e5m2.
  • Refactored the quantization configs. OVQuantizationConfigBase now contains only model-related parameters. Added a to_nncf_dict() convenience method to the quantization configs (see the sketch after this list).
  • Renamed OVWeightQuantizationConfig.weight_format to OVWeightQuantizationConfig.dtype and OVQuantizationConfig.activation_format to OVQuantizationConfig.dtype. The latter rename is needed because when OVQuantizationConfig is used, not only activations but also weights are quantized, so activation_format did not correctly describe what actually happens. OVWeightQuantizationConfig.weight_format is renamed for consistency.
  • OVBaseModel._prepare_quantization_config() can now create instances of configs other than OVWeightQuantizationConfig.
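
As a quick illustration of the new to_nncf_dict() helper, here is a hypothetical usage sketch (the exact set of returned keys is internal to NNCF and may differ):

from optimum.intel import OVWeightQuantizationConfig

# Build a weight-compression config (values mirror the example below).
config = OVWeightQuantizationConfig(bits=4, dtype="nf4")

# to_nncf_dict() translates the config into keyword arguments for NNCF,
# e.g. roughly nncf.compress_weights(model, **config.to_nncf_dict()).
nncf_kwargs = config.to_nncf_dict()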

Examples

CLI

optimum-cli export openvino -m meta-llama/Llama-3.1-8B --quant-mode nf4_f8e4m3 --dataset wikitext2 ./llama-3.1-8b_nf4_f8e4m3

Python API:

from optimum.intel import (
    OVMixedQuantizationConfig,
    OVModelForCausalLM,
    OVQuantizationConfig,
    OVWeightQuantizationConfig,
)

model = OVModelForCausalLM.from_pretrained(
    model_id="meta-llama/Llama-3.1-8B",
    quantization_config=OVMixedQuantizationConfig(
        weight_quantization_config=OVWeightQuantizationConfig(bits=4, dtype="nf4"),
        full_quantization_config=OVQuantizationConfig(dtype="f8e4m3"),
        dataset="wikitext2",
    ),
)
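
Assuming the standard optimum/transformers saving API (not shown in this PR), the quantized model can then be written to disk:

# Save the quantized OpenVINO model (directory name mirrors the CLI example).
model.save_pretrained("llama-3.1-8b_nf4_f8e4m3")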

Some of these changes were implemented thanks to @nikita-malininn.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review February 7, 2025 17:11
@AlexKoff88 (Collaborator):

@nikita-malininn, please take a look as well.

@nikita-savelyevv nikita-savelyevv marked this pull request as draft February 10, 2025 15:55
@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review February 12, 2025 17:34
@@ -389,8 +379,8 @@ class OVWeightQuantizationConfig(OVQuantizationConfigBase):
         scale_estimation (`bool`, *optional*):
             Indicates whether to apply a scale estimation algorithm that minimizes the L2 error between the original and
             compressed layers. Providing a dataset is required to run scale estimation.
-        weight_format (`str`, *optional*):
-            Data format weights are compressed to. Possible values: ['int4', 'int8', 'mxfp4', 'nf4'].
+        dtype (`str`, *optional*):
Collaborator:
@eaidova, we hope that this change will not have a negative impact on the OpenVINO Notebooks as it is not backward compatible.

Collaborator:

Yes, it could make sense to add a warning (and potentially keep compatibility for one or two releases by setting dtype when weight_format is provided).
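
A minimal sketch of the compatibility path suggested above (hypothetical; the PR may implement it differently):

import warnings

def _resolve_dtype(dtype=None, weight_format=None):
    # Backward compatibility: map the deprecated `weight_format` argument
    # onto `dtype` and warn the caller before the alias is removed.
    if weight_format is not None:
        warnings.warn(
            "`weight_format` is deprecated and will be removed in a future release. "
            "Please use `dtype` instead.",
            DeprecationWarning,
        )
        if dtype is None:
            dtype = weight_format
    return dtype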

@AlexKoff88 (Collaborator):

@l-bat, can you please review this PR as well?

Resolved review threads: optimum/intel/openvino/quantization.py (3), optimum/intel/openvino/configuration.py (2)
@AlexKoff88 (Collaborator):

Overall, it looks good, thanks.

@AlexKoff88 (Collaborator):

@IlyasMoutawwakil, @echarlaix, the PR is ready for your review.

@echarlaix (Collaborator) left a comment:

LGTM, thanks for the addition @nikita-savelyevv

Resolved review threads: optimum/intel/openvino/modeling_base.py (2), optimum/intel/openvino/configuration.py (2)

@echarlaix (Collaborator):

The failing tests are unrelated, so merging. Thanks @nikita-savelyevv!

@echarlaix echarlaix merged commit 235294d into huggingface:main Feb 18, 2025
17 of 22 checks passed