Model size doubles after .merge_and_unload() and .save_pretrained() #137
Comments
I am having the same issue with Falcon 1B. The original model is about 2.3 GB on disk while the adapter is about 40 MB, but after merging, the model is saved at 4.5 GB. I checked whether the parameter count stays constant, and it does. Using safetensors also did not reduce the model size after merging. I am using …
Same issue with Llama 2 models, both 7B and 13B.
Try dtype=torch.bfloat16 during model load for merging (assuming the original model was already in half precision, as is the LoRA); that solved the issue for me. I believe the model loads in torch.float32 by default, which explains the doubling in size.
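A minimal sketch of this fix, for anyone landing here (the checkpoint name and adapter path below are placeholders, not from this thread):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model in bf16 so merge_and_unload() doesn't produce fp32 weights.
base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-rw-1b",        # placeholder base checkpoint
    torch_dtype=torch.bfloat16,   # without this, weights load as float32 by default
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder path

merged = model.merge_and_unload()
merged.save_pretrained("merged-model-bf16")  # on-disk size should now match the base
```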
Thanks @SankhaSubhra — also found the same approach in a merge script written for this purpose: https://github.com/georgian-io/LLM-Finetuning-Hub/blob/7c0413ebedba7ee96d0c17c02f2158c7d3c4c142/inference/text_generation/merge_script.py#L42C29-L42C29
My System Info
peft==0.4.0
accelerate==0.18.0
transformers==4.28.0
Python 3.10
Reproduction
After training, I merge the PEFT adapter weights with the base model using:
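(The original snippet was not preserved; the following is a minimal sketch of the usual flow, with the checkpoint name and adapter path assumed.)

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Note: from_pretrained defaults to torch.float32 unless torch_dtype is passed,
# which is what causes the doubled on-disk size after merging.
base = AutoModelForCausalLM.from_pretrained("bigcode/starcoderbase-7b")  # assumed checkpoint
model = PeftModel.from_pretrained(base, "checkpoints/lora-adapter")      # hypothetical path

merged = model.merge_and_unload()
```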
Then, for inference as a standalone model, I save it to disk using:
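(Again a sketch; "starcoder-merged" is a hypothetical output directory.)

```python
merged.save_pretrained("starcoder-merged")
```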
And later load it back whenever needed using:
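(Sketch; loads the merged checkpoint saved above.)

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("starcoder-merged")
```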
Expected behavior
I am training StarCoder 7B, which initially has an on-disk size of around 15 GB. I began training with specific LoRA rank and alpha parameters. To experiment with different combinations of these parameters, I stopped the training process a few times and performed a merge_and_unload operation. Afterward, I restarted training with a new combination of LoRA rank and alpha values on top of the latest stored model. This approach worked well up to approximately 500-600 steps. After that point, however, I noticed an issue: when I saved my model after merging, its disk size unexpectedly ballooned to 30 GB, even though my adapter_bin file is only around 400 MB. I am not sure why the model size increased.
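(For what it's worth, these numbers line up with an fp32 upcast rather than any growth in parameter count: roughly 7B parameters × 4 bytes ≈ 28 GB in float32, versus × 2 bytes ≈ 14-15 GB in bfloat16/float16, which matches the dtype explanation above.)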