From c9c519b65cc35fc0155d003baaeec73725bab4d8 Mon Sep 17 00:00:00 2001
From: Ella Charlaix
Date: Thu, 7 Dec 2023 12:04:46 +0100
Subject: [PATCH 1/2] fix typo

---
 README.md                 | 3 ++-
 docs/source/inference.mdx | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 9f25eefd94..aa5801cf49 100644
--- a/README.md
+++ b/README.md
@@ -75,12 +75,13 @@ It is possible to export your model to the [OpenVINO](https://docs.openvino.ai/2
 optimum-cli export openvino --model gpt2 ov_model
 ```

-If you add `--int8`, the weights will be quantized to INT8, the activations will be kept in floating point precision.
+If you add `--int8`, the model linear and embedding weights will be quantized to INT8, the activations will be kept in floating point precision.

 ```plain
 optimum-cli export openvino --model gpt2 --int8 ov_model
 ```

+To apply quantization on both weights and activations, you can more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov).

 #### Inference:

diff --git a/docs/source/inference.mdx b/docs/source/inference.mdx
index f526492a12..e93c39882a 100644
--- a/docs/source/inference.mdx
+++ b/docs/source/inference.mdx
@@ -102,7 +102,7 @@ You can also apply INT8 quantization on your models weights when exporting your
 optimum-cli export openvino --model gpt2 --int8 ov_model
 ```

-This will results in the exported model linear and embedding layers to be quanrtized to INT8, the activations will be kept in floating point precision.
+This will results in the exported model linear and embedding layers to be quantized to INT8, the activations will be kept in floating point precision.

 This can also be done when loading your model by setting the `load_in_8bit` argument when calling the `from_pretrained()` method.

From d357406303b841cd46c2777aa4105b1c682aad9c Mon Sep 17 00:00:00 2001
From: Ella Charlaix
Date: Thu, 7 Dec 2023 12:06:07 +0100
Subject: [PATCH 2/2] typo

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index aa5801cf49..54d8371b5b 100644
--- a/README.md
+++ b/README.md
@@ -81,7 +81,7 @@ If you add `--int8`, the model linear and embedding weights will be quantized to
 optimum-cli export openvino --model gpt2 --int8 ov_model
 ```

-To apply quantization on both weights and activations, you can more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov).
+To apply quantization on both weights and activations, you can find more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov).

 #### Inference:
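For reference, the `load_in_8bit` loading path that the `docs/source/inference.mdx` change points to can be sketched as follows. This is a minimal sketch, not part of the patch: the `OVModelForCausalLM` class and the `export` argument are assumed from the optimum-intel Python API, and `gpt2` / `ov_model` are reused from the CLI examples above.

```python
# Sketch of the `load_in_8bit` alternative to `optimum-cli export openvino --int8`.
# OVModelForCausalLM and export=True are assumptions based on the optimum-intel API.
from optimum.intel import OVModelForCausalLM

# Export the model to OpenVINO IR on the fly and quantize the linear and
# embedding weights to INT8; activations are kept in floating point precision.
model = OVModelForCausalLM.from_pretrained("gpt2", export=True, load_in_8bit=True)

# Save the quantized OpenVINO model, mirroring the ov_model output directory
# used in the CLI examples.
model.save_pretrained("ov_model")
```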