Save and load dynamically quantized model #99

roman-dobrov · 2023-11-16T13:18:50Z

Hello! First of all, great work on instructor.

I'd like to load an already-quantized model so that my script avoids the CPU/memory spikes that quantization itself causes at startup.

I tried static quantization first, but it is not supported for SentenceTransformers with either float16 or qint8.
With dynamic quantization, I get the following error when trying to load a saved state_dict:

RuntimeError: Error(s) in loading state_dict for INSTRUCTOR:
        Unexpected key(s) in state_dict: "2.linear.scale", "2.linear.zero_point", "2.linear._packed_params.dtype", "2.linear._packed_params._packed_params".

I tried two save methods: calling torch.save(model.state_dict()) directly, and saving a traced version with torch.jit.trace. Both result in the same error.
So, is there a way to save and load a quantized model?

hongjin-su · 2023-12-19T09:19:57Z

Hi, thanks a lot for your interest in the INSTRUCTOR model!

The following works for me:

import torch
from InstructorEmbedding import INSTRUCTOR
from torch.nn import Embedding, Linear
from torch.quantization import quantize_dynamic

# Load the float model on CPU (dynamic quantization runs on CPU).
model = INSTRUCTOR('hkunlp/instructor-large', device='cpu')

# Per-module-type qconfigs: weight-only for embeddings, dynamic for linear layers.
qconfig_dict = {
    Embedding: torch.ao.quantization.qconfig.float_qparams_weight_only_qconfig,
    Linear: torch.ao.quantization.qconfig.default_dynamic_qconfig,
}

# Quantize, then save only the quantized state dict.
qmodel = quantize_dynamic(model, qconfig_dict)
torch.save(qmodel.state_dict(), 'state.pt')

Hope this helps!

roman-dobrov · 2023-12-19T09:24:50Z

@hongjin-su
Thank you for your response!
Does loading the quantized model work for you?

hongjin-su · 2023-12-19T09:27:39Z

Yeah, this seems to work:

>>> import torch
>>> a = torch.load('state.pt')
/home/linuxbrew/.linuxbrew/Cellar/python@3.11/3.11.6/lib/python3.11/site-packages/torch/_utils.py:376: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  device=storage.device,

roman-dobrov · 2023-12-19T14:29:43Z

@hongjin-su
And how do you convert it into an actual model?
torch.load returns an OrderedDict, which is just the state dict.
I get the error above when I call load_state_dict on it, before I can actually use the model.
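
For reference, a minimal sketch of the general PyTorch pattern for this situation: a quantized state dict only loads into a model whose modules have already been swapped for their quantized versions, so the freshly built model is quantized the same way before load_state_dict is called. This is not confirmed by the maintainers for INSTRUCTOR. The small nn.Sequential model and the helper names build_float_model and quantize are hypothetical stand-ins for INSTRUCTOR(...) and the quantize_dynamic call above, and an in-memory buffer replaces 'state.pt'.

```python
import io

import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic


def build_float_model(seed: int) -> nn.Module:
    # Hypothetical stand-in for INSTRUCTOR('hkunlp/instructor-large', device='cpu').
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4)).eval()


def quantize(model: nn.Module) -> nn.Module:
    # Swap every nn.Linear for its dynamically quantized counterpart.
    return quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)


# Save: quantize once and keep only the state dict (in memory here, a file in practice).
qmodel = quantize(build_float_model(seed=0))
buffer = io.BytesIO()
torch.save(qmodel.state_dict(), buffer)

# Load: the state dict comes back as an OrderedDict, not a model.
buffer.seek(0)
state = torch.load(buffer, weights_only=False)  # trusted, self-produced data

# Loading into a float model fails because the key names do not match.
try:
    build_float_model(seed=1).load_state_dict(state)
    float_load_failed = False
except RuntimeError:
    float_load_failed = True

# Loading into a model that was quantized the same way succeeds.
restored = quantize(build_float_model(seed=1))
restored.load_state_dict(state)

x = torch.randn(2, 16)
outputs_match = torch.allclose(qmodel(x), restored(x))
```

Note that this still runs quantize_dynamic on the skeleton model at startup, so it may not remove the CPU/memory spike the original question is about. Pickling the whole quantized module with torch.save(qmodel) is a possible alternative, but whether that works for INSTRUCTOR has not been checked here.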
