diff --git a/docs/source/inference.mdx b/docs/source/inference.mdx
index af5e3b2fec..bfd15bde11 100644
--- a/docs/source/inference.mdx
+++ b/docs/source/inference.mdx
@@ -96,15 +96,15 @@ tokenizer.save_pretrained(save_directory)
 
 ### Weight only quantization
 
-You can also apply INT8 quantization on your models weights when exporting your model by adding `--int8`:
+You can also apply INT8 quantization on your model's weights when exporting your model with the CLI:
 
 ```bash
 optimum-cli export openvino --model gpt2 --int8 ov_model
 ```
 
-This will results in the exported model linear and embedding layers to be quanrtized to INT8, while the activations will be kept in floating point precision.
+This will result in the exported model's linear and embedding layers being quantized to INT8, while the activations are kept in floating point precision.
 
-This can also be done when loading your model by setting `load_in_8bit=True`:
+This can also be done when loading your model by setting the `load_in_8bit` argument when calling the `from_pretrained()` method:
 
 ```python
 from optimum.intel import OVModelForCausalLM
@@ -360,13 +360,6 @@ image.save("fantasy_landscape.png")
 
 | `image-to-image`          | `OVStableDiffusionXLImg2ImgPipeline`     |
 
-Before using `OVtableDiffusionXLPipeline` make sure to have `diffusers` and `invisible_watermark` installed. You can install the libraries as follows:
-
-```bash
-pip install diffusers
-pip install invisible-watermark>=0.2.0
-```
-
 #### Text-to-Image
 
 Here is an example of how you can load a SDXL OpenVINO model from [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and run inference using OpenVINO Runtime:
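
The first hunk's context cuts off the docs' `load_in_8bit` example right after its import line. For reference, a minimal sketch of the loading path the new text describes, reusing the `gpt2` checkpoint from the CLI example above (the checkpoint choice here is illustrative):

```python
from optimum.intel import OVModelForCausalLM

# Convert the model to OpenVINO on the fly (export=True) with its linear and
# embedding layers quantized to INT8; activations stay in floating point.
model = OVModelForCausalLM.from_pretrained("gpt2", export=True, load_in_8bit=True)
```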
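
Likewise, the second hunk ends just before the Text-to-Image example it introduces. A minimal sketch of what running the SDXL pipeline looks like, assuming the `OVStableDiffusionXLPipeline` class named in the docs' pipeline table (the prompt and output filename are illustrative):

```python
from optimum.intel import OVStableDiffusionXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
# Convert the checkpoint to OpenVINO during loading, then run text-to-image inference.
pipeline = OVStableDiffusionXLPipeline.from_pretrained(model_id, export=True)
prompt = "sailing ship in a storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]
image.save("ship.png")
```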