Can a 1-bit quantized model be fine-tuned with SFT, with or without LoRA? #63
While using `SFTTrainer` on a 1-bit quantized model with a PEFT config, I get:

```
ValueError: Target module HQQLinearLoRA(
  (linear_layer): HQQLinear()
  (peft_drop): Identity()
) is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.
```

The PEFT config is:

```python
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

The trainer is initialized as:

```python
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    packing=True,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    dataset_text_field="text",
    peft_config=peft_config,
    max_seq_length=tokenizer.model_max_length,
)
```

Note: when I remove the PEFT config, training fails instead with `ValueError: Attempting to unscale FP16 gradients.`
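For context on that second error: `Attempting to unscale FP16 gradients` typically comes from the AMP grad scaler when the trainable parameters are themselves stored in fp16 (e.g. `fp16=True` in the training arguments with an fp16 model). A minimal sketch of a common workaround, assuming `model` is the model passed to `SFTTrainer` above (this is not from the thread, just an illustration), is to keep the trainable parameters in fp32:

```python
import torch

# Sketch of a common workaround (an assumption, not from this thread):
# the AMP grad scaler cannot unscale fp16 gradients, so cast the trainable
# parameters to fp32 while the frozen quantized weights stay as they are.
# "model" is assumed to be the model passed to SFTTrainer above.
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.to(torch.float32)
```

Alternatively, on hardware that supports bfloat16, setting `bf16=True` instead of `fp16=True` in `TrainingArguments` avoids the grad scaler entirely.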
Answered by mobicham (Apr 24, 2024)
Hi @sanjeev-bhandari, that's an issue with the peft library, not hqq.
We have our own way of doing LoRA: https://github.com/mobiusml/hqq/?tab=readme-ov-file#peft-training
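A minimal sketch of what that hqq-native LoRA setup looks like, assuming `model` is a transformers causal LM already quantized with HQQ; the `PeftUtils.add_lora` call and the `lora_params` layout here follow the Peft Training section linked above as I understand it and may differ between versions, so treat that README as the reference:

```python
import torch
from hqq.core.peft import PeftUtils

# Sketch: attach hqq's own LoRA adapters to an HQQ-quantized model.
# "model" is assumed to be a transformers causal LM already quantized with HQQ.
base_lora_params = {
    "lora_type": "default",
    "r": 32,
    "lora_alpha": 64,
    "dropout": 0.05,
    "train_dtype": torch.float32,
}
lora_params = {
    "self_attn.q_proj": base_lora_params,
    "self_attn.k_proj": base_lora_params,
    "self_attn.v_proj": base_lora_params,
    "self_attn.o_proj": base_lora_params,
    "mlp.gate_proj": None,
    "mlp.up_proj": None,
    "mlp.down_proj": None,
}

# Wraps the targeted layers with hqq's LoRA modules (HQQLinearLoRA).
PeftUtils.add_lora(model, lora_params)
```

Once the adapters are attached this way, the model can be trained without passing a `peft_config` to `SFTTrainer`, since the LoRA layers are already part of the model.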