
Support AWQ models #1049

Open

wants to merge 9 commits into base: main
Conversation

@mvafin (Contributor) commented Dec 4, 2024

What does this PR do?

Adds support for AWQ models, following the addition of AWQ support in OpenVINO 2024.6 (openvinotoolkit/openvino#27859).
The 16-bit patching can also be used for GPTQ and AWQ models to support FP16/BF16 regions in the model.
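
As a minimal sketch of the intended user flow (not part of the PR itself), an AWQ checkpoint could be exported and run through optimum-intel as below; the model id is only an illustrative assumption, and OpenVINO >= 2024.6 is assumed to be installed:

```python
# Hypothetical usage sketch: export an AWQ-quantized checkpoint to OpenVINO and run it.
# The model id is an example only; substitute any AWQ model from the Hub.
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"  # example AWQ checkpoint (assumption)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```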

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@mvafin marked this pull request as ready for review on December 4, 2024, 13:35
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil added the openvino-test (Trigger OpenVINO slow tests) label on Dec 5, 2024
@AlexKoff88 (Collaborator)

@mvafin, can we have a test for this, e.g. with "hf-internal-testing/Mixtral-tiny-AWQ" or any other dummy model?

@IlyasMoutawwakil (Member)

> @mvafin, can we have a test for this, e.g. with "hf-internal-testing/Mixtral-tiny-AWQ" or any other dummy model?

Yes, a test would be great.

@mvafin (Contributor, Author) commented Dec 5, 2024

@AlexKoff88 @IlyasMoutawwakil Test added.
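
A rough sketch of what such a test could look like (the actual test added by the PR lives in tests/openvino/test_modeling.py and may differ), using the dummy checkpoint suggested above:

```python
# Hedged sketch, not the PR's actual test: check that a tiny AWQ checkpoint can be
# exported to OpenVINO and produces finite logits on CPU.
import torch
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "hf-internal-testing/Mixtral-tiny-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokens = tokenizer("This is a sample input", return_tensors="pt")

ov_model = OVModelForCausalLM.from_pretrained(model_id, export=True)
ov_outputs = ov_model(**tokens)
assert torch.isfinite(ov_outputs.logits).all()
```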

@AlexKoff88 (Collaborator)

@mvafin, some of the tests failed:

```
=========================== short test summary info ============================
FAILED tests/openvino/test_modeling.py::OVModelForCausalLMIntegrationTest::test_compare_to_transformers_18_mixtral_awq - ValueError: `.float()` is not supported for quantized model. Please use the model as it is, since the model has already been casted to the correct `dtype`.
FAILED tests/openvino/test_modeling.py::OVModelForCausalLMIntegrationTest::test_compare_to_transformers_23_qwen - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/openvino/test_modeling.py::OVModelForSeq2SeqLMIntegrationTest::test_compare_with_and_without_past_key_values - AssertionError: False is not true : With pkv latency: 227.547 ms, without pkv latency: 240.161 ms, speedup: 1.055
FAILED tests/openvino/test_modeling.py::OVModelForSTFeatureExtractionIntegrationTest::test_compare_to_transformers_0_st_bert - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/openvino/test_modeling.py::OVModelForSTFeatureExtractionIntegrationTest::test_compare_to_transformers_1_st_mpnet - AssertionError: Torch not compiled with CUDA enabled
```

Can you please check on your side?

@mvafin (Contributor, Author) commented Dec 11, 2024

> @mvafin, some of the tests failed: [...]
> Can you please check on your side?

Fixed.

@IlyasMoutawwakil (Member)

Still getting AWQ CUDA errors.

@mvafin (Contributor, Author) commented Dec 13, 2024

> still getting awq cuda errors

That might be because optimum is not tested against the 2024.6 version of OpenVINO.
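
One way the AWQ test could be gated on the runtime version is sketched below; it assumes the is_openvino_version helper from optimum.intel.utils.import_utils, and the actual skip condition used in the PR may differ:

```python
# Sketch: skip AWQ-related tests when the installed OpenVINO is older than 2024.6,
# since AWQ support only landed in that release.
import unittest

from optimum.intel.utils.import_utils import is_openvino_version


class OVModelForCausalLMIntegrationTest(unittest.TestCase):
    @unittest.skipIf(is_openvino_version("<", "2024.6"), "AWQ support requires OpenVINO >= 2024.6")
    def test_compare_to_transformers_mixtral_awq(self):
        ...
```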

* fix style

* disable autogptq and autoawq install for old transformers testing

```yaml
- if: ${{ matrix.transformers-version == 'latest' && matrix.test-pattern == '*modeling*' }}
  name: Install auto-gptq, autoawq
  run: |
    pip install auto-gptq autoawq --extra-index-url https://download.pytorch.org/whl/cpu
```
Review comment (Member):

These are not valid extra index URLs for auto-gptq and autoawq.

@IlyasMoutawwakil (Member)

The PR in its current form seems to degrade the user experience greatly. My understanding is that it passes the responsibility of patching autoawq and auto-gptq to the user, i.e. users would have to write their own mock_torch_cuda_is_available and patch_awq_for_inference helpers to use autogptq/autoawq. Please correct me if I'm wrong.
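
For context, a minimal sketch of what such a patching helper might look like (the helpers named above are part of the PR's test utilities and may be implemented differently):

```python
# Hypothetical sketch of a "pretend CUDA is available" helper, so that autoawq /
# auto-gptq code paths that check torch.cuda.is_available() can run on CPU-only hosts.
from contextlib import contextmanager
from unittest.mock import patch


@contextmanager
def mock_torch_cuda_is_available(to_patch: bool = True):
    if to_patch:
        with patch("torch.cuda.is_available", return_value=True):
            yield
    else:
        yield
```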
