Model size doubles after .merge_and_unload() and .save_pretrained() #137
Comments
I am having the same issue with Falcon 1B. The original model is about 2.3 GB on disk while the adapter is about 40 MB, but after merging, the model is saved at 4.5 GB. I checked whether the parameter count stays constant, and it does. Using safetensors also did not reduce the model size after merging. I am using …
Same issue with Llama 2 models, both 7B and 13B.
Try dtype=torch.bfloat16 during model load for merging (assuming the original model was already in half precision, as is the LoRA); that solved the issue for me. I believe the model loads in torch.float32 by default, which explains the doubling in size.
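A minimal sketch of this fix, for anyone landing here (the checkpoint name and adapter path below are placeholders, not from this thread):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model in bf16 so merge_and_unload() doesn't produce fp32 weights.
base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-rw-1b",        # placeholder base checkpoint
    torch_dtype=torch.bfloat16,   # without this, weights load as float32 by default
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder path

merged = model.merge_and_unload()
merged.save_pretrained("merged-model-bf16")  # on-disk size should now match the base
```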
Thanks @SankhaSubhra — also found the same approach in a merge script written for this purpose: https://github.com/georgian-io/LLM-Finetuning-Hub/blob/7c0413ebedba7ee96d0c17c02f2158c7d3c4c142/inference/text_generation/merge_script.py#L42C29-L42C29
My System Info
peft==0.4.0
accelerate==0.18.0
transformers==4.28.0
Python 3.10
Reproduction
After training, I merge the PEFT adapter weights with the base model using:
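(The original snippet was not preserved; the following is a minimal sketch of the usual flow, with the checkpoint name and adapter path assumed.)

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Note: from_pretrained defaults to torch.float32 unless torch_dtype is passed,
# which is what causes the doubled on-disk size after merging.
base = AutoModelForCausalLM.from_pretrained("bigcode/starcoderbase-7b")  # assumed checkpoint
model = PeftModel.from_pretrained(base, "checkpoints/lora-adapter")      # hypothetical path

merged = model.merge_and_unload()
```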
Then, for inference as a standalone model, I save it to disk using:
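(Again a sketch; "starcoder-merged" is a hypothetical output directory.)

```python
merged.save_pretrained("starcoder-merged")
```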
And later load it back whenever needed using:
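(Sketch; loads the merged checkpoint saved above.)

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("starcoder-merged")
```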
Expected behavior
I am training StarCoder 7B, which initially has an on-disk size of around 15 GB. I began training with specific LoRA rank and alpha parameters. To experiment with different combinations of these parameters, I stopped the training process a few times and performed a merge_and_unload operation. Afterward, I restarted training with a new combination of LoRA rank and alpha values on top of the latest stored model. This approach worked well up to approximately 500-600 steps. After that point, however, I noticed an issue: when I saved my model after merging, its disk size unexpectedly ballooned to 30 GB, even though my adapter_bin file is only around 400 MB. I am not sure why the model size increased.
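(For what it's worth, these numbers line up with an fp32 upcast rather than any growth in parameter count: roughly 7B parameters × 4 bytes ≈ 28 GB in float32, versus × 2 bytes ≈ 14-15 GB in bfloat16/float16, which matches the dtype explanation above.)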