Save and load dynamically quantized model #99

Open
roman-dobrov opened this issue Nov 16, 2023 · 4 comments

Comments

@roman-dobrov

roman-dobrov commented Nov 16, 2023

Hello! First of all, great work on instructor.

I'd like to load an already-quantized model so that my script avoids the CPU and memory spikes that quantization itself causes at startup.

I tried static quantization first, but it is not supported for SentenceTransformers with float16 or qint8.
With dynamic quantization, I get the following error when trying to load a saved state_dict:

RuntimeError: Error(s) in loading state_dict for INSTRUCTOR:
        Unexpected key(s) in state_dict: "2.linear.scale", "2.linear.zero_point", "2.linear._packed_params.dtype", "2.linear._packed_params._packed_params".
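
For illustration, the same kind of error can be reproduced on a toy model. This is a minimal sketch, not the INSTRUCTOR code path: `make_float_model` and the layer sizes are hypothetical, and it assumes the state dict is being loaded into a model that has not been quantized, which the thread does not confirm.

```python
import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic


def make_float_model() -> nn.Sequential:
    # Hypothetical toy network standing in for INSTRUCTOR.
    return nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)).eval()


# Dynamically quantize the Linear layers and inspect the resulting keys.
qmodel = quantize_dynamic(make_float_model(), {nn.Linear}, dtype=torch.qint8)
quantized_keys = list(qmodel.state_dict().keys())
print(quantized_keys)

# Loading the quantized state dict into a plain float model fails,
# because the float model has no slots for the quantized key names.
load_error = None
try:
    make_float_model().load_state_dict(qmodel.state_dict())
except RuntimeError as exc:
    load_error = str(exc)
print(load_error)
```

Dynamic quantization replaces each `nn.Linear` with a module that stores its weights as packed parameters, so the key names change and a float model no longer recognizes them.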

I tried two save methods, a direct torch.save(model.state_dict()) and saving a traced version with torch.jit.trace, but both result in the same error.
So, is there a way to save and load a quantized model?
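
One option that does not go through load_state_dict at all is a TorchScript round trip, since a TorchScript archive stores the module structure together with the weights. The sketch below shows this on a hypothetical toy model with `torch.jit.trace`, `torch.jit.save` and `torch.jit.load`; whether INSTRUCTOR can be traced this way is an assumption that has not been tested here.

```python
import io

import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic

# Hypothetical toy network standing in for the real model.
float_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)).eval()
qmodel = quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)

# Trace the already-quantized model with an example input.
example = torch.randn(3, 8)
traced = torch.jit.trace(qmodel, example)

# An in-memory buffer stands in for a file on disk.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)

# The archive is loaded as a ready-to-use module; no re-quantization step.
restored = torch.jit.load(buffer)
outputs_match = torch.allclose(restored(example), qmodel(example))
print(outputs_match)
```

The restored object is a ScriptModule rather than the original Python class, so any custom methods of the original model (for INSTRUCTOR, for example, `encode`) would not be available on it.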

@hongjin-su
Collaborator

Hi, thanks a lot for your interest in the INSTRUCTOR model!

The following works for me:

import torch
from InstructorEmbedding import INSTRUCTOR
from torch.nn import Embedding, Linear
from torch.quantization import quantize_dynamic

# Load the float model on CPU.
model = INSTRUCTOR('hkunlp/instructor-large', device='cpu')
# Map each module type to the qconfig used to quantize it.
qconfig_dict = {
    Embedding: torch.ao.quantization.qconfig.float_qparams_weight_only_qconfig,
    Linear: torch.ao.quantization.qconfig.default_dynamic_qconfig,
}

qmodel = quantize_dynamic(model, qconfig_dict)
torch.save(qmodel.state_dict(),'state.pt')

Hope this helps!

@roman-dobrov
Author

@hongjin-su
Thank you for your response!
Does loading the quantized model work for you?

@hongjin-su
Collaborator

Yeah, this seems to work:

>>> import torch
>>> a = torch.load('state.pt')
/home/linuxbrew/.linuxbrew/Cellar/python@3.11/3.11.6/lib/python3.11/site-packages/torch/_utils.py:376: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  device=storage.device,

@roman-dobrov
Author

roman-dobrov commented Dec 19, 2023

@hongjin-su
And how do you convert it to the actual model?
torch.load returns an OrderedDict, which is a state dict.
I get the aforementioned error when calling load_state_dict, before actually using the model.
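
A pattern that generally works for dynamically quantized state dicts is to rebuild the same quantized module structure first and only then call load_state_dict. The sketch below shows the round trip on a hypothetical toy model; `build_quantized` and the layer sizes are illustrative, and `weights_only=False` is passed on the assumption that the file is trusted.

```python
import io

import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic


def build_quantized() -> nn.Module:
    # Hypothetical toy network; each call gets fresh random weights.
    float_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)).eval()
    return quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)


# Save the quantized state dict; the buffer stands in for 'state.pt'.
qmodel = build_quantized()
buffer = io.BytesIO()
torch.save(qmodel.state_dict(), buffer)
buffer.seek(0)

# torch.load only returns the dict of values, not a model.
state = torch.load(buffer, weights_only=False)

# Build a skeleton with the same quantized layers, then fill in the values.
restored = build_quantized()
restored.load_state_dict(state)

example = torch.randn(3, 8)
outputs_match = torch.allclose(restored(example), qmodel(example))
print(outputs_match)
```

For the model in this thread, the analogous steps would be to construct INSTRUCTOR, apply quantize_dynamic with the same qconfig_dict used when saving, and then call load_state_dict on the result. Note that this still runs quantization at startup, so it restores the saved values but may not remove the CPU and memory spike described in the original question.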
