Commit: Update documentation (#485)

echarlaix authored Dec 7, 2023
1 parent 3da80f6 commit f32d501
Showing 2 changed files with 3 additions and 2 deletions.
README.md: 3 changes (2 additions & 1 deletion)
@@ -75,12 +75,13 @@ It is possible to export your model to the [OpenVINO](https://docs.openvino.ai/2
```plain
optimum-cli export openvino --model gpt2 ov_model
```

- If you add `--int8`, the weights will be quantized to INT8, the activations will be kept in floating point precision.
+ If you add `--int8`, the model's linear and embedding weights will be quantized to INT8, while the activations will be kept in floating point precision.

```plain
optimum-cli export openvino --model gpt2 --int8 ov_model
```

+ To apply quantization on both weights and activations, see the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov).
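
Not part of this commit: as a rough illustration of the static, weights-plus-activations flow the linked documentation covers, a minimal sketch using `OVQuantizer` from `optimum.intel` could look like the following. The model ID, calibration dataset, and preprocessing function are illustrative assumptions, not taken from this diff.

```python
# Illustrative sketch only: static quantization of both weights and
# activations with optimum-intel's OVQuantizer.
from functools import partial

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.intel import OVQuantizer

# Model and dataset choices here are assumptions for the example.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess_fn(examples, tokenizer):
    # Tokenize calibration samples the same way inputs are tokenized at inference.
    return tokenizer(examples["sentence"], padding=True, truncation=True)

quantizer = OVQuantizer.from_pretrained(model)
# A small calibration set is used to estimate activation ranges.
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=100,
    dataset_split="train",
)
quantizer.quantize(calibration_dataset=calibration_dataset, save_directory="ov_model_int8")
```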

#### Inference:

docs/source/inference.mdx: 2 changes (1 addition & 1 deletion)
@@ -102,7 +102,7 @@ You can also apply INT8 quantization on your models weights when exporting your
optimum-cli export openvino --model gpt2 --int8 ov_model
```

- This will results in the exported model linear and embedding layers to be quanrtized to INT8, the activations will be kept in floating point precision.
+ This results in the exported model's linear and embedding layers being quantized to INT8, while the activations are kept in floating point precision.

This can also be done when loading your model by setting the `load_in_8bit` argument when calling the `from_pretrained()` method.
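
For instance, a minimal sketch of that path (the gpt2 checkpoint and save directory are carried over from the examples above; `OVModelForCausalLM` is optimum-intel's OpenVINO counterpart of `AutoModelForCausalLM`):

```python
# Minimal sketch: export gpt2 to OpenVINO and quantize its weights to INT8
# at load time via the `load_in_8bit` argument mentioned above.
from optimum.intel import OVModelForCausalLM

# `export=True` converts the PyTorch checkpoint to OpenVINO on the fly;
# `load_in_8bit=True` quantizes the linear and embedding weights to INT8
# while activations stay in floating point.
model = OVModelForCausalLM.from_pretrained("gpt2", export=True, load_in_8bit=True)
model.save_pretrained("ov_model")
```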

