
Weight LoRA #2406

Open · wants to merge 2 commits into main
Conversation

@Vepricov commented Mar 4, 2025

This PR adds a new method: Weight LoRA.

WeightLoRA

Weight LoRA is a simple but important PEFT method that adds a weight $w_i$ to each LoRA adapter (here $i$ is the adapter index). This makes it possible to perform, in addition to the classical optimization over all LoRA matrices $A_1, B_1, \ldots, A_n, B_n$, an alternative optimization over the vector of weights $w := (w_1, \ldots, w_n)^T \in \mathbb{R}^n$ under a wide variety of constraints. In our research paper, we consider two approaches: 1) the vector $w$ must lie in the simplex $\Delta_{n-1}$, and 2) the vector $w$ may have only $K$ non-zero coordinates. Both approaches address the problem of finding the most important LoRA adapters in the model and concentrating training on them while disabling the rest.
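To make the idea concrete, here is a minimal, illustrative sketch (not the code in this PR; the class name and layer shapes are assumptions) of a linear layer whose LoRA update is scaled by a trainable per-adapter weight $w_i$:

```python
# Illustrative sketch only -- not the implementation in this PR.
# A frozen linear layer W plus one LoRA adapter (A_i, B_i) whose update is
# scaled by a trainable scalar weight w_i, so that h = W x + w_i * B_i A_i x.
import torch
import torch.nn as nn

class WeightLoRALinearSketch(nn.Module):  # hypothetical name
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pretrained layer
            p.requires_grad_(False)
        self.lora_A = nn.Linear(base.in_features, r, bias=False)   # A_i
        self.lora_B = nn.Linear(r, base.out_features, bias=False)  # B_i
        nn.init.zeros_(self.lora_B.weight)        # start with a zero update
        # the trainable per-adapter weight w_i
        self.lora_weight = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_weight * self.lora_B(self.lora_A(x))
```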

The abstract from the paper is:

The widespread utilization of language models in modern applications is inconceivable without Parameter-Efficient Fine-Tuning techniques, such as low-rank adaptation (LoRA), which adds trainable adapters to selected layers. Although LoRA may obtain accurate solutions, it requires significant memory to train large models and intuition on which layers to add adapters. In this paper, we propose a novel method, WeightLoRA, which overcomes this issue by adaptive selection of the most critical LoRA heads throughout the optimization process. As a result, we can significantly reduce the number of trainable parameters while maintaining the capability to obtain consistent or even superior metric values. Finally, we conduct experiments for the series of competitive benchmarks and DeBERTa and BART models, comparing our approach with the most popular LoRA modifications. The experimental results demonstrate the efficacy of WeightLoRA and the superior performance of WeightLoRA+ in comparison to the baselines in nearly all cases.

Original code

@BenjaminBossan (Member)

Thanks for this PR that proposes to add Weight LoRA to PEFT. Do you have a link to the full paper? I only skimmed the implementation, but from what I saw, this is basically LoRA but with the only difference being that the scaling parameter is trainable? Just from the abstract you pasted, it appears that there should be additional constraints on w that I don't see in the implementation.

@Vepricov (Author) commented Mar 5, 2025

> I only skimmed the implementation, but from what I saw, this is basically LoRA but with the only difference being that the scaling parameter is trainable?

Yes, you are right. However, the fact that the parameters w_i are trainable opens up great potential for methods that use alternative optimization with constraints on the weights w.

> Just from the abstract you pasted, it appears that there should be additional constraints on w that I don't see in the implementation.

These constraints are not handled inside the WeightLoRA method itself, but in the implementation of the optimizer step (e.g., SGD with projection). In our paper, we provide the WeightAdam optimizer, which projects the weights w onto the desired set.
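For illustration, here is a minimal sketch of how such a projection step could look (assumptions only; this is not the paper's WeightAdam implementation, and the function names are hypothetical): after an ordinary optimizer step, the adapter-weight vector w is projected back onto the constraint set, e.g. the probability simplex or the set of K-sparse vectors.

```python
# Sketch of post-step projections for the adapter-weight vector w
# (assumptions only; not the paper's WeightAdam implementation).
import torch

def project_top_k(w: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest-magnitude coordinates of w, zero out the rest."""
    mask = torch.zeros_like(w)
    mask[torch.topk(w.abs(), k).indices] = 1.0
    return w * mask

def project_simplex(w: torch.Tensor) -> torch.Tensor:
    """Euclidean projection of w onto the probability simplex (w >= 0, sum(w) = 1)."""
    u, _ = torch.sort(w, descending=True)
    css = torch.cumsum(u, dim=0)
    idx = torch.arange(1, w.numel() + 1, device=w.device, dtype=w.dtype)
    rho = int((u - (css - 1) / idx > 0).nonzero().max()) + 1
    theta = (css[rho - 1] - 1) / rho
    return torch.clamp(w - theta, min=0)

# Hypothetical usage after a regular optimizer step:
#   optimizer.step()
#   with torch.no_grad():
#       w.copy_(project_top_k(w, k=4))   # or project_simplex(w)
```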

> Do you have a link to the full paper?

Unfortunately, we submitted this paper to ACL 2025, which uses double-blind review, so I cannot share the full text. I can, however, share, for example, the experimental results in the form of a table.

@BenjaminBossan (Member)

> These constraints are not handled inside the WeightLoRA method itself, but in the implementation of the optimizer step (e.g., SGD with projection). In our paper, we provide the WeightAdam optimizer, which projects the weights w onto the desired set.

Note that this optimizer could be included in the PR, under src/peft/optimizers/. Of course, it would be up to the user to actually make use of it, but this can be steered via docs and examples.
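For context, a rough sketch of how a custom optimizer shipped under src/peft/optimizers/ is typically consumed, following the pattern used for PEFT's existing LoRA+ optimizer (the factory name below is hypothetical, and `peft_model` / `train_dataset` are placeholders for the user's own objects):

```python
# Sketch only: `create_weight_lora_optimizer` is a hypothetical factory, and
# `peft_model` / `train_dataset` are placeholders, not real objects.
from transformers import Trainer, TrainingArguments

optimizer = create_weight_lora_optimizer(peft_model, lr=1e-4, k=4)  # hypothetical
trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,
    optimizers=(optimizer, None),  # Trainer uses this optimizer instead of its default
)
trainer.train()
```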

> Unfortunately, we submitted this paper to ACL 2025, which uses double-blind review, so I cannot share the full text. I can, however, share, for example, the experimental results in the form of a table.

I would suggest waiting with this PR until the paper is accepted; otherwise it's hard for us to review it. Moreover, there could be useful changes during the review process that should be reflected in the integration.
