Is it possible to add LoRA on specific head? #2293
With

```python
config = LoraConfig(
    target_modules=["seq.0", "seq.2"],  # use the layer names according to the model you are using
    modules_to_save=["seq.4"],
)
```

You can retrieve the name and type of each layer of your model with this code:

```python
for n, m in base_model.named_modules():  # replace `base_model` with the variable your pretrained model is stored in
    print((n, type(m)))
```

For example, for Llama 3.2 1B:

```python
target_modules=["layers.0.self_attn.q_proj", "layers.0.self_attn.v_proj"],  # q and v for self_attn layer 0
target_modules=["q_proj", "v_proj"],  # q and v for all self_attn layers
```
Hi, @d-kleine, thanks for the reply. I am thinking more about something like adding LoRA to some attention heads only, which means my target might be:
It is not possible to target specific heads. The issue is that the weights of all heads are combined into a single weight matrix.
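To make this concrete, here is a small check you can run (a sketch assuming a Llama-style model as in the example above; the attribute path is an assumption):

```python
# assumption: a Llama-style model where self_attn.q_proj is a single nn.Linear
q_proj = base_model.model.layers[0].self_attn.q_proj
print(q_proj.weight.shape)  # (num_heads * head_dim, hidden_size): one matrix shared by all heads
```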
Thanks for the prompt reply, @BenjaminBossan. I found some possible approaches, like this previous issue, where SAM's Q, K, and V are successfully separated. Could this be used in my case, where I want to separate each head out?
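For context, the idea from that issue is roughly the following (a hypothetical sketch, not the code from the linked issue; `split_qkv` is a made-up helper and it assumes q, k, and v are stacked along the output dimension of one fused `nn.Linear`):

```python
import torch
import torch.nn as nn

def split_qkv(fused_qkv: nn.Linear):
    """Split a fused qkv projection (q, k, v stacked along the output dimension)
    into three separate nn.Linear modules that can be targeted individually."""
    dim = fused_qkv.out_features // 3
    projs = []
    with torch.no_grad():
        for i in range(3):
            proj = nn.Linear(fused_qkv.in_features, dim, bias=fused_qkv.bias is not None)
            proj.weight.copy_(fused_qkv.weight[i * dim:(i + 1) * dim, :])
            if fused_qkv.bias is not None:
                proj.bias.copy_(fused_qkv.bias[i * dim:(i + 1) * dim])
            projs.append(proj)
    return projs  # [q_proj, k_proj, v_proj]
```

The attention forward pass then has to be rewritten to use the three separate projections, which is the caveat raised in the next reply.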
That is possible, but it means that you have to implement the whole transformer attention module yourself, and you might miss out on some optimizations (flash attention, caching). Alternatively, you might be able to write a custom LoRA layer that, say, masks out the heads that should not be touched, and register it with the PEFT dispatcher to be applied to the whole attention module.
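To illustrate the masking idea, here is a minimal sketch (not PEFT's implementation; `HeadMaskedLoraLinear`, `num_heads`, and `lora_heads` are made-up names, and it assumes the heads are concatenated along the output dimension of the projection):

```python
import torch
import torch.nn as nn

class HeadMaskedLoraLinear(nn.Module):
    """Wraps a frozen projection (e.g. q_proj) and adds a LoRA update whose
    output is masked so that only the selected heads are modified."""

    def __init__(self, base_linear: nn.Linear, num_heads: int, lora_heads, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False

        head_dim = base_linear.out_features // num_heads
        self.lora_A = nn.Linear(base_linear.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_linear.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # start as a no-op, like standard LoRA
        self.scaling = alpha / r

        # 1 for output dimensions belonging to the selected heads, 0 elsewhere
        mask = torch.zeros(base_linear.out_features)
        for h in lora_heads:
            mask[h * head_dim:(h + 1) * head_dim] = 1.0
        self.register_buffer("head_mask", mask)

    def forward(self, x):
        delta = self.lora_B(self.lora_A(x)) * self.scaling
        return self.base(x) + delta * self.head_mask
```

A wrapper like this would replace, e.g., `q_proj` inside the attention module; only the output rows belonging to `lora_heads` are ever changed by the LoRA update.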
Thanks, @BenjaminBossan.
I tried to do this on
You mean that using the same weights with your implementation, the performance is already degraded? Yes, that most likely means there is a bug somewhere. You could paste your implementation here and mark the parts of the code that you changed, and I can take a look.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hi, @BenjaminBossan, I've been able to separate out the head-related submatrices from the original concatenated Q, K, V, O matrices now! However, I still have a small issue: the separated LM does not have the correct pretrained weights after initialization, and I need to write additional code to load the correct weights manually. Thanks in advance.
Nicely done getting the implementation to work correctly. As for loading the weights, it's hard to say in the abstract without seeing your code. In general, I'd try:
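For illustration, copying the pretrained weights into per-head modules usually comes down to slicing the original matrix; a hypothetical sketch (the helper name `copy_head_weights` and the layout assumption that heads are concatenated along the output dimension are mine):

```python
import torch
import torch.nn as nn

def copy_head_weights(fused_proj: nn.Linear, per_head_projs, num_heads: int):
    """Copy pretrained weights from a projection holding all heads
    (concatenated along the output dimension) into per-head nn.Linear modules."""
    head_dim = fused_proj.out_features // num_heads
    with torch.no_grad():
        for h, proj in enumerate(per_head_projs):
            rows = slice(h * head_dim, (h + 1) * head_dim)
            proj.weight.copy_(fused_proj.weight[rows, :])
            if fused_proj.bias is not None and proj.bias is not None:
                proj.bias.copy_(fused_proj.bias[rows])
```

Something like this would run right after building the separated attention module and before wrapping the model with PEFT, so the LoRA adapters are added on top of the correct pretrained weights.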
Feature request
Could I add LoRA only to some selected heads of the model?
I read some documentation here, but am still not sure how to implement my goal.
Motivation
The current LoraConfig allows users to decide which matrices to add LoRA to; more fine-grained control over which heads to add LoRA to would be beneficial for developers.
Your contribution
I would appreciate some tips on how to implement this.