Is it possible to add LoRA on specific head? #2293

Open
SpeeeedLee opened this issue Dec 22, 2024 · 7 comments

Comments

@SpeeeedLee

Feature request

Is it possible to add LoRA only to selected attention heads of a model?
I read some of the documentation here, but I am still not sure how to implement this.

Motivation

The current LoraConfig lets users choose which weight matrices LoRA is applied to. Finer-grained control over which attention heads receive LoRA would be beneficial for developers.

Your contribution

I would appreciate some tips on how to implement this.

@d-kleine
Contributor

d-kleine commented Dec 23, 2024

You can do this with target_modules, as described here: https://huggingface.co/docs/peft/developer_guides/custom_models#multilayer-perceptron

from peft import LoraConfig

config = LoraConfig(
    target_modules=["seq.0", "seq.2"],  # use the layer names of the model you are using
    modules_to_save=["seq.4"],
)

You can retrieve the name and type of each layer of your model with this code:

for n, m in base_model.named_modules(): # replace `base_model` with the variable your pretrained model is stored in
    print((n, type(m)))

For example, for Llama 3.2 1B:

  • for the projections of a specific layer, something like
target_modules=["layers.0.self_attn.q_proj", "layers.0.self_attn.v_proj"],  # q and v for self_attn layer 0
  • for a group of layers:
target_modules=["q_proj", "v_proj"],   # q and v for all self_attn layers
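
To target the same projections in only a few layers, the names printed by the loop above can be filtered programmatically. The following is a minimal sketch, assuming a toy module tree whose names mimic the Llama layout; ToyAttention, ToyBlock, ToyModel and select_targets are hypothetical helpers, not part of PEFT or transformers.

```python
import torch.nn as nn


class ToyAttention(nn.Module):
    """Stand-in for an attention block with Llama-style projection names."""

    def __init__(self, hidden=16):
        super().__init__()
        self.q_proj = nn.Linear(hidden, hidden, bias=False)
        self.k_proj = nn.Linear(hidden, hidden, bias=False)
        self.v_proj = nn.Linear(hidden, hidden, bias=False)
        self.o_proj = nn.Linear(hidden, hidden, bias=False)


class ToyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.self_attn = ToyAttention()


class ToyModel(nn.Module):
    def __init__(self, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([ToyBlock() for _ in range(num_layers)])


def select_targets(model, layer_ids, proj_names=("q_proj", "v_proj")):
    """Collect module names for the chosen projections in the chosen layers."""
    wanted = [f"layers.{i}.self_attn.{p}" for i in layer_ids for p in proj_names]
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear)
        # Accept an exact match or a prefixed name such as "model.layers.0...".
        and any(name == w or name.endswith("." + w) for w in wanted)
    ]


targets = select_targets(ToyModel(), layer_ids=[0, 2])
print(targets)
```

The resulting list can be passed as target_modules to LoraConfig. Note that this still selects whole projection matrices, not individual heads.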

@SpeeeedLee
Author

SpeeeedLee commented Dec 25, 2024

Hi @d-kleine, thanks for the reply. I am thinking of something more fine-grained: adding LoRA to only some attention heads. My target might be, for example, the first head_dim columns of layers.0.self_attn.q_proj, i.e., the first attention head of the query matrix. Besides adding LoRA, I am also curious how I can freeze some heads and fine-tune only selected ones.

@BenjaminBossan
Member

It is not possible to target specific heads. The issue is that the weights of all heads are combined into a single nn.Linear weight, so if we apply LoRA to it, it will affect all the weights.
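
To make the layout concrete, here is a small sketch, assuming a Llama-style projection in which head h occupies output features h * head_dim to (h + 1) * head_dim of q_proj. Because nn.Linear stores its weight as (out_features, in_features), these are rows of the stored weight (columns of its transpose). The sizes and variable names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden, num_heads, head_dim = 32, 4, 8

# One fused projection holds the weights of every head.
q_proj = nn.Linear(hidden, num_heads * head_dim, bias=False)
x = torch.randn(2, 5, hidden)  # (batch, seq, hidden)

# Split the fused output into heads, as attention modules typically do.
q = q_proj(x).view(2, 5, num_heads, head_dim)

# Head 0 corresponds to the first head_dim rows of the stored weight.
w_head0 = q_proj.weight[:head_dim]  # (head_dim, hidden)
q_head0 = x @ w_head0.T             # (batch, seq, head_dim)

same = torch.allclose(q[:, :, 0], q_head0, atol=1e-5)
print(same)
```

Since LoRA adds a low-rank update to the whole weight, every one of these row blocks is affected unless the update is restricted in some way.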

@SpeeeedLee
Author

Thanks for the prompt reply, @BenjaminBossan.
Yes, I understand this issue, and I am wondering whether it is possible for me to write custom code to separate each head into its own nn.Linear weight. If so, I could selectively fine-tune some heads or add LoRA to them.

I found some possible approaches, like this previous issue, where SAM's Q, K, and V were successfully separated. Could something similar be used in my case, where I want to separate out each head?

@BenjaminBossan
Member

> wondering whether it is possible for me to write custom code to separate each head into its own nn.Linear weight. If so, I could selectively fine-tune some heads or add LoRA to them.

That is possible, but it means that you have to implement the whole transformer attention module yourself, and you might miss out on some optimizations (flash attention, caching).
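
The weight-splitting part of that approach might look like the sketch below. It assumes each head owns a contiguous block of output features; split_heads is a hypothetical helper, and the attention forward pass that would call the per-head modules is not shown.

```python
import torch
import torch.nn as nn


def split_heads(fused: nn.Linear, num_heads: int) -> nn.ModuleList:
    """Copy a fused projection into one nn.Linear per head (hypothetical helper)."""
    head_dim = fused.out_features // num_heads
    heads = nn.ModuleList()
    for h in range(num_heads):
        rows = slice(h * head_dim, (h + 1) * head_dim)
        lin = nn.Linear(fused.in_features, head_dim, bias=fused.bias is not None)
        with torch.no_grad():
            lin.weight.copy_(fused.weight[rows])
            if fused.bias is not None:
                lin.bias.copy_(fused.bias[rows])
        heads.append(lin)
    return heads


torch.manual_seed(0)
fused = nn.Linear(32, 4 * 8, bias=False)
heads = split_heads(fused, num_heads=4)

# Freeze every head except head 1, which stays trainable.
for h, lin in enumerate(heads):
    lin.requires_grad_(h == 1)

# Concatenating the per-head outputs should reproduce the fused projection.
x = torch.randn(2, 5, 32)
out_split = torch.cat([lin(x) for lin in heads], dim=-1)
max_diff = (out_split - fused(x)).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
```

Each per-head nn.Linear is now a separate module, so it can be frozen on its own or, in principle, named in target_modules.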

Alternatively, you might be able to write a custom LoRA layer that, say, masks out the heads that should not be touched, and register it with the PEFT dispatcher to be applied to the whole attention module, e.g. LlamaAttention if that's what your model is using.
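
A rough sketch of the masking idea follows, written in plain PyTorch. It is a simplification: it wraps a single projection rather than the whole attention module, and registering it with the PEFT dispatcher is not shown. HeadMaskedLoRALinear and its parameters are hypothetical names, not PEFT API.

```python
import torch
import torch.nn as nn


class HeadMaskedLoRALinear(nn.Module):
    """Frozen base projection plus a LoRA update restricted to selected heads."""

    def __init__(self, base, num_heads, active_heads, r=4, alpha=8.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # base weights stay frozen
        head_dim = base.out_features // num_heads
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # start as a no-op, as LoRA usually does
        self.scaling = alpha / r
        # 1.0 on the output features of active heads, 0.0 elsewhere.
        mask = torch.zeros(num_heads)
        mask[list(active_heads)] = 1.0
        self.register_buffer("mask", mask.repeat_interleave(head_dim))

    def forward(self, x):
        delta = self.lora_B(self.lora_A(x)) * self.scaling
        return self.base(x) + delta * self.mask


torch.manual_seed(0)
layer = HeadMaskedLoRALinear(
    nn.Linear(32, 4 * 8, bias=False), num_heads=4, active_heads=[0, 2]
)
nn.init.normal_(layer.lora_B.weight)  # pretend some training has happened

x = torch.randn(2, 5, 32)
# Largest change per head relative to the frozen base projection.
diff = (layer(x) - layer.base(x)).view(2, 5, 4, 8).abs().amax(dim=(0, 1, 3))
print(diff)
```

Heads 1 and 3 are left untouched because their part of the update is multiplied by zero, which also means the matching rows of lora_B receive no gradient.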

@SpeeeedLee
Author

Thanks, @BenjaminBossan.

> it means that you have to implement the whole transformer attention module for yourself

I tried to do this with LlamaAttention. However, even before any fine-tuning, the resulting Llama model performs slightly worse than the original one, so there must be something wrong with my implementation. (I am using the latest version of transformers.)
Would it be possible to provide some hints or example code for this?
Thanks a lot.

@BenjaminBossan
Member

You mean that, using the same weights with your implementation, the performance is already degraded? Yes, this most likely means there is a bug somewhere. You could paste your implementation here and mark the parts of the code that you changed, and I can take a look.
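
One way to narrow down such a bug is to compare outputs numerically on identical inputs before looking at downstream metrics. The following is a minimal sketch of that kind of parity check, assuming PyTorch 2.0 or later for F.scaled_dot_product_attention; manual_causal_attention is a hypothetical stand-in for the custom code.

```python
import math

import torch
import torch.nn.functional as F


def manual_causal_attention(q, k, v):
    """Hand-written attention; stands in for a custom reimplementation."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    seq = q.size(-2)
    # Lower-triangular mask: each position attends to itself and earlier ones.
    causal = torch.ones(seq, seq, dtype=torch.bool).tril()
    scores = scores.masked_fill(~causal, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


torch.manual_seed(0)
q, k, v = (torch.randn(2, 4, 6, 8) for _ in range(3))  # (batch, heads, seq, head_dim)

reference = F.scaled_dot_product_attention(q, k, v, is_causal=True)
candidate = manual_causal_attention(q, k, v)
max_diff = (reference - candidate).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
```

A difference well above floating-point noise at this level usually points to a concrete mistake, such as a missing scale factor, a wrong mask, or heads reshaped in the wrong order.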


3 participants