Is it possible to add LoRA on specific head? #2293
With

```python
config = LoraConfig(
    target_modules=["seq.0", "seq.2"],  # use the layer names according to the model you are using
    modules_to_save=["seq.4"],
)
```

You can retrieve the name and type of each layer of your model with this code:

```python
for n, m in base_model.named_modules():  # replace `base_model` with the variable your pretrained model is stored in
    print((n, type(m)))
```

For example, for Llama 3.2 1B:

```python
target_modules=["layers.0.self_attn.q_proj", "layers.0.self_attn.v_proj"],  # q and v for self_attn layer 0
target_modules=["q_proj", "v_proj"],  # q and v for all self_attn layers
```
Hi, @d-kleine, thanks for the reply. I am thinking more about something like adding LoRA to some attention heads only, which means my target might be:
It is not possible to target specific heads. The issue is that the weights of all heads are combined into a single weight matrix.
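To make this concrete, here is a small check you can run (a sketch assuming a Llama-style model as in the example above; the attribute path is an assumption):

```python
# assumption: a Llama-style model where self_attn.q_proj is a single nn.Linear
q_proj = base_model.model.layers[0].self_attn.q_proj
print(q_proj.weight.shape)  # (num_heads * head_dim, hidden_size): one matrix shared by all heads
```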
Thanks for the prompt reply, @BenjaminBossan. I found some possible approaches, like this previous issue, where SAM's Q, K, and V are successfully separated. Could this be used in my case, where I want to separate each head out?
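For context, the idea from that issue is roughly the following (a hypothetical sketch, not the code from the linked issue; `split_qkv` is a made-up helper and it assumes q, k, and v are stacked along the output dimension of one fused `nn.Linear`):

```python
import torch
import torch.nn as nn

def split_qkv(fused_qkv: nn.Linear):
    """Split a fused qkv projection (q, k, v stacked along the output dimension)
    into three separate nn.Linear modules that can be targeted individually."""
    dim = fused_qkv.out_features // 3
    projs = []
    with torch.no_grad():
        for i in range(3):
            proj = nn.Linear(fused_qkv.in_features, dim, bias=fused_qkv.bias is not None)
            proj.weight.copy_(fused_qkv.weight[i * dim:(i + 1) * dim, :])
            if fused_qkv.bias is not None:
                proj.bias.copy_(fused_qkv.bias[i * dim:(i + 1) * dim])
            projs.append(proj)
    return projs  # [q_proj, k_proj, v_proj]
```

The attention forward pass then has to be rewritten to use the three separate projections, which is the caveat raised in the next reply.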
That is possible, but it means that you have to implement the whole transformer attention module yourself, and you might miss out on some optimizations (flash attention, caching). Alternatively, you might be able to write a custom LoRA layer that, say, masks out the heads that should not be touched, and register it with the PEFT dispatcher to be applied to the whole attention module.
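To illustrate the masking idea, here is a minimal sketch (not PEFT's implementation; `HeadMaskedLoraLinear`, `num_heads`, and `lora_heads` are made-up names, and it assumes the heads are concatenated along the output dimension of the projection):

```python
import torch
import torch.nn as nn

class HeadMaskedLoraLinear(nn.Module):
    """Wraps a frozen projection (e.g. q_proj) and adds a LoRA update whose
    output is masked so that only the selected heads are modified."""

    def __init__(self, base_linear: nn.Linear, num_heads: int, lora_heads, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False

        head_dim = base_linear.out_features // num_heads
        self.lora_A = nn.Linear(base_linear.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_linear.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # start as a no-op, like standard LoRA
        self.scaling = alpha / r

        # 1 for output dimensions belonging to the selected heads, 0 elsewhere
        mask = torch.zeros(base_linear.out_features)
        for h in lora_heads:
            mask[h * head_dim:(h + 1) * head_dim] = 1.0
        self.register_buffer("head_mask", mask)

    def forward(self, x):
        delta = self.lora_B(self.lora_A(x)) * self.scaling
        return self.base(x) + delta * self.head_mask
```

A wrapper like this would replace, e.g., `q_proj` inside the attention module; only the output rows belonging to `lora_heads` are ever changed by the LoRA update.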
Thanks, @BenjaminBossan.
I tried to do this on
You mean that using the same weights with your implementation, the performance is already degraded? Yes, that most likely means there is a bug somewhere. You could paste your implementation here and mark the parts of the code that you changed, and I can take a look.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hi, @BenjaminBossan, I've been able to separate out the head-related submatrices from the original concatenated Q, K, V, O matrices now! However, I still have a small issue: the separated LM does not have the correct pretrained weights after initialization, and I need to write additional code to load the correct weights manually. Thanks in advance.
Nicely done getting the implementation to work correctly. As for loading the weights, it's hard to say in the abstract without seeing your code. In general, I'd try:
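For illustration, copying the pretrained weights into per-head modules usually comes down to slicing the original matrix; a hypothetical sketch (the helper name `copy_head_weights` and the layout assumption that heads are concatenated along the output dimension are mine):

```python
import torch
import torch.nn as nn

def copy_head_weights(fused_proj: nn.Linear, per_head_projs, num_heads: int):
    """Copy pretrained weights from a projection holding all heads
    (concatenated along the output dimension) into per-head nn.Linear modules."""
    head_dim = fused_proj.out_features // num_heads
    with torch.no_grad():
        for h, proj in enumerate(per_head_projs):
            rows = slice(h * head_dim, (h + 1) * head_dim)
            proj.weight.copy_(fused_proj.weight[rows, :])
            if fused_proj.bias is not None and proj.bias is not None:
                proj.bias.copy_(fused_proj.bias[rows])
```

Something like this would run right after building the separated attention module and before wrapping the model with PEFT, so the LoRA adapters are added on top of the correct pretrained weights.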
Feature request
Could I add LoRA only to some selected heads of the model?
I read some documentation here, but am still not sure how to implement my goal.
Motivation
The current LoraConfig allows users to decide which matrices to add LoRA to; more fine-grained control over which heads to add LoRA to would be beneficial for developers.
Your contribution
I would appreciate some tips on how to implement this.