
Support AWQ models #1049

Open · wants to merge 11 commits into main
Conversation

@mvafin (Contributor) commented Dec 4, 2024

What does this PR do?

Add support for AWQ models, following the support added in OpenVINO 2024.6 (openvinotoolkit/openvino#27859).
The 16-bit patching can also be used for GPTQ and AWQ models to support FP16/BF16 regions in the model.
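
For context, a minimal usage sketch of what this enables (a sketch only: it assumes this PR is installed together with OpenVINO >= 2024.6, and uses the dummy model suggested later in this thread):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Dummy AWQ checkpoint mentioned later in this thread.
model_id = "hf-internal-testing/Mixtral-tiny-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True converts the AWQ checkpoint to OpenVINO IR on the fly;
# this relies on the AWQ support added in OpenVINO 2024.6.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```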

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@mvafin marked this pull request as ready for review on December 4, 2024, 13:35
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil added the openvino-test (Trigger OpenVINO slow tests) label on Dec 5, 2024
@AlexKoff88 (Collaborator)

@mvafin, can we have a test for this, e.g. with "hf-internal-testing/Mixtral-tiny-AWQ" or any other dummy model?

@IlyasMoutawwakil (Member)

> @mvafin, can we have a test for this, e.g. with "hf-internal-testing/Mixtral-tiny-AWQ" or any other dummy model?

Yes, a test would be great.

@mvafin (Contributor, Author) commented Dec 5, 2024

@AlexKoff88 @IlyasMoutawwakil Test added

@AlexKoff88 (Collaborator)

@mvafin, some of the tests failed:

=========================== short test summary info ============================
FAILED tests/openvino/test_modeling.py::OVModelForCausalLMIntegrationTest::test_compare_to_transformers_18_mixtral_awq - ValueError: `.float()` is not supported for quantized model. Please use the model as it is, since the model has already been casted to the correct `dtype`.
FAILED tests/openvino/test_modeling.py::OVModelForCausalLMIntegrationTest::test_compare_to_transformers_23_qwen - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/openvino/test_modeling.py::OVModelForSeq2SeqLMIntegrationTest::test_compare_with_and_without_past_key_values - AssertionError: False is not true : With pkv latency: 227.547 ms, without pkv latency: 240.161 ms, speedup: 1.055
FAILED tests/openvino/test_modeling.py::OVModelForSTFeatureExtractionIntegrationTest::test_compare_to_transformers_0_st_bert - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/openvino/test_modeling.py::OVModelForSTFeatureExtractionIntegrationTest::test_compare_to_transformers_1_st_mpnet - AssertionError: Torch not compiled with CUDA enabled

Can you please check on your side?

@mvafin (Contributor, Author) commented Dec 11, 2024

> @mvafin, some of the tests failed: […]

Fixed

@IlyasMoutawwakil (Member)

Still getting AWQ CUDA errors.

@mvafin (Contributor, Author) commented Dec 13, 2024

> Still getting AWQ CUDA errors.

That might be because optimum is not tested against the 2024.6 version of OpenVINO.

* fix style

* disable autogptq and autoawq install for old transformers testing
- if: ${{ matrix.transformers-version == 'latest' && matrix.test-pattern == '*modeling*' }}
  name: Install auto-gptq, autoawq
  run: |
    pip install auto-gptq autoawq --extra-index-url https://download.pytorch.org/whl/cpu
Member:

These are not valid extra index URLs for auto-gptq and autoawq.

@eaidova (Collaborator), Dec 18, 2024:

This is needed to prevent reinstalling torch with CUDA while installing these third-party packages. The packages themselves are installed from the regular index; only their torch-dependent requirements are resolved against the CPU-only torch index. (The difference between --index-url and --extra-index-url is that the former completely replaces the package index, while the latter adds an additional index that is used when a package is available there.)

@IlyasMoutawwakil (Member)

The PR in its current form seems to degrade the user experience greatly. My understanding is that it passes the responsibility of patching autoawq and autogptq to the user, i.e. writing their own mock_torch_cuda_is_available and patch_awq_for_inference in order to use autogptq/autoawq. Please correct me if I'm wrong.

@eaidova (Collaborator) commented Dec 18, 2024

> The PR in its current form seems to degrade the user experience greatly. My understanding is that it suggests we pass the responsibility of patching autoawq and autogptq to the user? […]

These manual patches are required only for running the original torch model without CUDA in the test environment. The general flow for model conversion and inference with OpenVINO does not require them, so there is no impact on the optimum-intel user experience; running the original model in torch is outside our responsibility.

We already have similar handling for converting GPTQ models, so why is it becoming a problem now?
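
(For readers of this thread: a minimal sketch of what such a test-only patch can look like, assuming unittest.mock; the helpers in the PR, mock_torch_cuda_is_available and patch_awq_for_inference, may be implemented differently.)

```python
from contextlib import contextmanager
from unittest import mock


@contextmanager
def cuda_available_mock():
    # autoawq/auto-gptq check torch.cuda.is_available() at load/inference time,
    # so on CPU-only test runners we pretend CUDA exists; additional patching of
    # the AWQ kernels may still be needed to actually execute on CPU.
    with mock.patch("torch.cuda.is_available", return_value=True):
        yield


# Usage in a test, only to get reference outputs from the original torch model:
# with cuda_available_mock():
#     reference_logits = torch_model(**inputs).logits
```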

@IlyasMoutawwakil (Member) commented Dec 18, 2024

> These manual patches are required only for running the original torch model without CUDA in the test environment.

Thanks for the explanation! I thought it was also being used for the quantization process on CPU, but I see that patch is still in place.

Comment on lines 250 to 253
supported_quant_methods = ["gptq"]
if is_openvino_version(">=", "2024.6.0"):
    supported_quant_methods.append("awq")
do_gptq_patching = quantization_config and quantization_config["quant_method"] in supported_quant_methods
Member:

Why patch the auto-gptq lib when the method is AWQ?

Collaborator:

Separated the GPTQ-specific patching.
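
(A hypothetical sketch of that separation, for illustration only; the names and structure do not necessarily match the PR:)

```python
from typing import Optional


def get_patching_flags(quantization_config: Optional[dict], is_ov_2024_6_or_newer: bool):
    # Common patching applies to any supported quant method (GPTQ, and AWQ on
    # OpenVINO >= 2024.6); GPTQ-specific patching applies only to GPTQ models.
    supported = ["gptq"] + (["awq"] if is_ov_2024_6_or_newer else [])
    method = (quantization_config or {}).get("quant_method")
    do_quant_patching = method in supported
    do_gptq_patching = method == "gptq"
    return do_quant_patching, do_gptq_patching
```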

Comment on lines +997 to +1001
# quantized models have higher tolerance
if "awq" in model_arch:
    atol = 1e-2
elif "gptq" in model_arch:
    atol = 0.6
Member:

These logits can't be considered "allclose" if this is the atol, imo. Does generation return the same output?

Collaborator:

Possibly it is an issue with the model itself; let me check.

Collaborator:

There is a slow test with beam search, and the generated sequences in that test are equal. The logits are denormalized, so it is hard to say which threshold is meaningful since we do not know the range of values. If I apply softmax to the outputs, I see about a 1e-5 difference between the torch and OV results, and an argsort of both tensors shows only small permutations beyond the first 1000 positions in the token probabilities (just tokens 1027 and 1028 swapping places), so I believe it is accurate enough to have no impact on the generated strings.
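
(A rough illustration of the comparison described above; the variable names are made up and this is not the test's actual code:)

```python
import numpy as np
import torch


def compare_backends(torch_logits: torch.Tensor, ov_logits: np.ndarray):
    # Raw logits are unnormalized, so compare softmax probabilities instead.
    probs_torch = torch.softmax(torch_logits.detach(), dim=-1).numpy()
    probs_ov = torch.softmax(torch.from_numpy(ov_logits), dim=-1).numpy()
    max_prob_diff = np.abs(probs_torch - probs_ov).max()
    # Token ranking: count positions where the sorted token order differs.
    rank_mismatches = int((np.argsort(probs_torch, axis=-1) != np.argsort(probs_ov, axis=-1)).sum())
    return max_prob_diff, rank_mismatches
```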

@eaidova (Collaborator) commented Dec 18, 2024

It looks like installing auto-awq on Windows downgrades torch to 2.3.1, while on Linux it does not. Can I disable the AWQ tests on Windows?

* separate common quant models patching and gptq

* disable awq windows
Labels: openvino-test (Trigger OpenVINO slow tests)
5 participants