Add new feature of SafeLoRA #2201

chiayi-hsu · 2024-11-06T17:27:15Z

The previous pull request was closed while syncing with the latest version of PEFT, so I have opened it again.
This version includes all the changes we agreed on in our previous conversations.

If there are any issues, please let me know.

Thank you.

BenjaminBossan

Thanks for the update to the SafeLoRA PR. I did another review and found a few areas to improve. Please take a look. Also, please run make style once you're finished with your changes.

examples/safelora/README.md

BenjaminBossan · 2024-11-07T18:17:47Z

examples/safelora/README.md

+                            save_weights=True)
+
+final_lora_weight = apply_safelora(config)
+


Can we add a bit more to the example. For instance, how to save and load these weights?

I have added more descriptions to the example.
If you feel there are still any missing parts, please let me know.

BenjaminBossan · 2024-11-07T18:18:11Z

examples/safelora/README.md

+config = SafeLoraConfig(base_model_path='../LLM_Models/llama-2-7b-hf/',\
+                            aligned_model_path='../LLM_Models/llama-2-7b-chat-fp16/',


Let's use the HF model ids for these two.

Has been modified.

BenjaminBossan · 2024-11-07T18:20:13Z

src/peft/utils/safelora.py

+            peft_weights = {name: f.get_tensor(name).to(safelora_config.dtype) for name in f.keys()}
+        else:
+            peft_weights = {name: f.get_tensor(name).to(safelora_config.dtype) for name in f.keys()}


These 2 lines are identical

Has been modified.

-        if (safelora_config.devices).lower() == "cpu":
-            peft_weights = {name: f.get_tensor(name).to(safelora_config.dtype) for name in f.keys()}
-        else:
-            peft_weights = {name: f.get_tensor(name).to(safelora_config.dtype) for name in f.keys()}
+        peft_weights = {name: f.get_tensor(name).to(safelora_config.dtype) for name in f.keys()}

BenjaminBossan · 2024-11-07T18:31:17Z

src/peft/utils/safelora.py

+    ]
+    align_model_parameters = [
+        name for name in sl_align.weight_map.keys() if any(v in name for v in list(peft_config.target_modules))
+    ]


Should we also check that base_model_parameters and align_model_parameters are the same?

I have added a check to verify if the model weights are the same.

+        if (sl_base.get_tensor(name_base) == sl_align.get_tensor(name_align)).all():
+            raise ValueError("The weights of the base Model and the aligned Model should be different.")

I meant something else. Would we expect that base_model_parameters == align_model_parameters? If not, under what circumstances would they differ?

Still open.
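To make the open question concrete, the two filtered name lists could be compared directly and any mismatch reported. This is only a minimal sketch, not the PR's code: `filter_target_names` and `compare_parameter_names` are hypothetical helpers, and plain dicts stand in for the loaders' `weight_map`.

```python
from typing import Dict, Iterable, List, Tuple


def filter_target_names(weight_map: Dict[str, str], target_modules: Iterable[str]) -> List[str]:
    """Keep only the parameter names that contain one of the target module names."""
    targets = list(target_modules)
    return [name for name in weight_map if any(t in name for t in targets)]


def compare_parameter_names(
    base_names: Iterable[str], aligned_names: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (names only in the base model, names only in the aligned model), each sorted."""
    base_set, aligned_set = set(base_names), set(aligned_names)
    return sorted(base_set - aligned_set), sorted(aligned_set - base_set)


# Hypothetical weight maps (parameter name -> shard file), for illustration only.
base_map = {
    "layers.0.q_proj.weight": "model-00001.safetensors",
    "layers.0.v_proj.weight": "model-00001.safetensors",
    "layers.0.mlp.weight": "model-00001.safetensors",
}
aligned_map = {
    "layers.0.q_proj.weight": "model-00001.safetensors",
    "layers.0.mlp.weight": "model-00001.safetensors",
}

base_params = filter_target_names(base_map, ["q_proj", "v_proj"])
aligned_params = filter_target_names(aligned_map, ["q_proj", "v_proj"])
only_base, only_aligned = compare_parameter_names(base_params, aligned_params)
```

If either returned list is non-empty, the two checkpoints do not expose the same target parameters, and the function could raise a clear error instead of failing later during the projection.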

BenjaminBossan · 2024-11-07T18:32:14Z

src/peft/utils/safelora.py

+    return safety_vector
+
+
+def project_weights(configs, peft_weights, v):


Let's rename configs to config or safelora_config.

Has been modified.

BenjaminBossan · 2024-11-07T18:33:18Z

src/peft/utils/safelora.py

+        metadata={"help": "The path of the LoRA wieghts and configs."},
+    )
+
+    select_layers_type: str = field(


Instead of str, we can annotate this as Literal["threshold", "number"].

Has been modified.
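For reference, a `Literal` annotation documents the allowed values but is not enforced at runtime, so an explicit check may still be useful. The sketch below assumes a hypothetical stand-in class (`SelectLayersConfigSketch`) rather than the real `SafeLoraConfig`, and the help text is illustrative.

```python
from dataclasses import dataclass, field
from typing import Literal, get_args

# The two values come from the review comment above.
SelectLayersType = Literal["threshold", "number"]


@dataclass
class SelectLayersConfigSketch:
    """Hypothetical stand-in for the relevant part of SafeLoraConfig."""

    select_layers_type: SelectLayersType = field(
        metadata={"help": "How the projected layers are selected: 'threshold' or 'number'."},
    )

    def __post_init__(self):
        # Literal is only a type hint, so validate the value explicitly.
        allowed = get_args(SelectLayersType)
        if self.select_layers_type not in allowed:
            raise ValueError(
                f"select_layers_type must be one of {allowed}, got {self.select_layers_type!r}."
            )
```

With this, type checkers flag a misspelled option statically, and `__post_init__` catches it for users who do not run a type checker.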

src/peft/utils/safelora.py

BenjaminBossan · 2024-11-07T18:35:15Z

examples/safelora/safelora_inference.py

+                            select_layers_type='threshold',
+                            save_weights=True)
+
+final_lora_weight = apply_safelora(config)


The example should show inference, here we only create the weights. What are the next steps?

I have added more explanations in the README.md and also included code on how to use the SafeLoRA model.


BenjaminBossan · 2024-11-15T16:22:04Z

@chiayi-hsu Once you're finished with your changes and want me to give another review, please ping me.

chiayi-hsu · 2024-11-19T06:57:26Z

@BenjaminBossan I have completed the modifications. Please help review them. Thanks!

BenjaminBossan

Thanks a lot for the updates. I did another review. Most of what I found are just smaller things like docs, please take a look.

Now as a next step, it is important that we also add some unit tests. This is not going to be very straightforward, because we cannot easily test model alignment and we also don't want to use any big models during unit testing.

One proposal for this would be to use a small model like hf-internal-testing/tiny-random-OPTForCausalLM as the base model. Then let's modify some weights (setting them to 0?) and save this as the "aligned" model. Then call apply_safelora with these 2 models and various options to see if those tests pass. This would not really check the alignment though.

In addition, we could think about adding a true alignment test for the nightly run with GPU. For this test, it would be okay to use a bigger model (but ideally still not too big).

LMK what you think about this testing strategy and if you have further questions.

Apart from this, please call make style on your PR, as this is a prerequisite for the CI to pass.

src/peft/utils/safelora.py

BenjaminBossan · 2024-11-20T10:08:08Z

src/peft/utils/safelora.py

+        default="meta-llama/Llama-2-7b-hf",
+        metadata={"help": "The path of the base model for obtaining the aligned matrix."},
+    )
+
+    aligned_model_path: str = field(
+        default="TheBloke/Llama-2-7B-Chat-fp16",
+        metadata={"help": "The path of the aligned model for obtaining the aligned matrix."},
+    )
+
+    peft_model_path: str = field(
+        default="LisaSchunke/llama-2-7b-peft-finetuned-20000-dataset",


IMO, it doesn't make sense to set default values here, I would remove them. WDYT?

BenjaminBossan · 2024-11-20T10:08:32Z

src/peft/utils/safelora.py

+
+    peft_model_path: str = field(
+        default="LisaSchunke/llama-2-7b-peft-finetuned-20000-dataset",
+        metadata={"help": "The path of the LoRA wieghts and configs."},


Suggested change

metadata={"help": "The path of the LoRA wieghts and configs."},

metadata={"help": "The path of the LoRA weights and config."},

Typo is still there.

src/peft/utils/safelora.py

BenjaminBossan · 2024-11-20T10:48:15Z

src/peft/utils/safelora.py

+    After fine-tuning large language models (LLMs) using LoRA, the alignment of the resulting models may decrease.
+    Therefore, applying `apply_safelora()` is intended to help preserve the alignment of the final models.
+
+    It is important to note that the model weights of the aligned model and the base model must be of the same size.


Let's also mention that right now, only safetensors format is supported.

BenjaminBossan · 2024-11-20T10:49:00Z

src/peft/utils/safelora.py

+    )
+
+    with safe_open(
+        f"{os.path.join(safelora_config.peft_model_path, 'adapter_model.safetensors')}",


Let's not hard-code adapter_model.safetensors, let's use peft.utils.constants.SAFETENSORS_WEIGHTS_NAME.
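A minimal sketch of what this could look like is shown below. The helper name `adapter_weights_path` is hypothetical, and the `ImportError` fallback exists only so that the snippet runs without PEFT installed; the fallback value is assumed to match PEFT's constant.

```python
import os

try:
    # Available when PEFT is installed.
    from peft.utils.constants import SAFETENSORS_WEIGHTS_NAME
except ImportError:
    # Fallback so the sketch runs standalone; assumed to match PEFT's value.
    SAFETENSORS_WEIGHTS_NAME = "adapter_model.safetensors"


def adapter_weights_path(peft_model_path: str) -> str:
    """Build the adapter weights path without hard-coding the file name."""
    # os.path.join already returns a str, so no f-string wrapper is needed.
    return os.path.join(peft_model_path, SAFETENSORS_WEIGHTS_NAME)
```

The same helper could be used both where the adapter weights are opened and where they are saved, so the file name is defined in one place only.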

BenjaminBossan · 2024-11-20T10:49:04Z

src/peft/utils/safelora.py

+        final_weights, _ = project_weights(safelora_config, peft_weights, projected_matrix)
+
+    if safelora_config.save_weights:
+        save_file(final_weights, f"{os.path.join(safelora_config.peft_model_path, 'adapter_model.safetensors')}")


Let's not hard-code adapter_model.safetensors, let's use peft.utils.constants.SAFETENSORS_WEIGHTS_NAME.


github-actions · 2024-12-27T15:03:49Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

chiayi-hsu · 2024-12-27T20:46:42Z

Yes, it still needs to be addressed.

chiayi-hsu · 2025-01-09T21:49:15Z

@BenjaminBossan
I have made the modifications based on your review and ensured that the style passes by running make style. Regarding the unit test, we can follow your suggestion to use a small model like hf-internal-testing/tiny-random-OPTForCausalLM and modify part of the weights in each layer to be 0, treating it as the so-called aligned model. In our setting, we require that the weights of the base and aligned models be entirely different because we want the aligned model to be a fully fine-tuned model obtained through RLHF techniques. This ensures that the aligned matrix obtained is more flexibly applicable to various attention layers where LoRA is applied.

I would like to ask about the unit test part. Do I need to write test scripts for this? Are there any specific rules or things I should pay attention to?

BenjaminBossan

Thank you for the updates.

I reviewed the PR again and found a few more things that can be improved; they are mostly about clarity and formatting.

Regarding the unit tests:

I think it would be easiest to proceed as follows. Let's create a new file, tests/test_safelora.py. Next, I created a small template for you to get started:

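import pytest
from transformers import AutoModelForCausalLM

from peft import LoraConfig, get_peft_model


class TestSafeLora:
    model_id = "hf-internal-testing/tiny-random-OPTForCausalLM"

    @pytest.fixture(scope="class")
    def aligned_model_path(self, tmp_path_factory):
        # we create a fake aligned model where the weights differ from the base model;
        # a class-scoped fixture cannot request the function-scoped tmp_path, so use tmp_path_factory
        path = tmp_path_factory.mktemp("safelora") / "aligned_model"
        model = AutoModelForCausalLM.from_pretrained(self.model_id)
        for param in model.parameters():
            # modify the parameters to be different
            ...

        model.save_pretrained(path)
        return path  # return the path so that the tests can use it

    @pytest.fixture
    def lora_path(self, tmp_path):
        # create a LoRA adapter
        path = tmp_path / "lora"
        model = AutoModelForCausalLM.from_pretrained(self.model_id)
        lora_config = LoraConfig(init_lora_weights=False)  # initialize LoRA so that it's not a no-op
        model = get_peft_model(model, lora_config)
        model.save_pretrained(path)
        return path  # return the path so that the tests can use it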

    def test_safelora_with_threshold(self, aligned_model_path, lora_path):
        ...  # code to test

    def test_safelora_with_num_proj_layers(self, aligned_model_path, lora_path):
        ...  # code to test

    def test_safelora_with_save_weights_false(self, aligned_model_path, lora_path):
        ...  # code to test

    # etc. more tests for the different options here

I hope this makes sense, if not, feel free to ask questions.

As discussed earlier, since this does not use a real aligned model but just creates a fake one, we cannot really test if the final model is better aligned or not. For this, we rely on the paper results and assume they're correct. If you have a good idea for a real alignment test, we can also add that test though. Just ensure that we're only using very small models to not slow down the CI.

BenjaminBossan · 2025-01-13T15:52:48Z

examples/safelora/README.md

@@ -0,0 +1,46 @@
+# Safe LoRA 
+
+The official code of Safe LoRA: The Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models


Let's add a sentence or two about what Safe LoRA does and when it could be interesting for users to use it, similar to the beginning of the docstring of apply_safelora.

src/peft/utils/safelora.py

BenjaminBossan · 2025-01-13T15:57:39Z

examples/safelora/README.md

+
+from peft.utils.safelora import SafeLoraConfig, apply_safelora
+
+peft_path = "../finetuneLLM/finetuned_models/samsumBad-7b-fp16-peft-seed-42"


Let's also put a placeholder for this path. In this case, it's the same as <SafeLoRA-path> below, right?

BenjaminBossan · 2025-01-13T15:59:20Z

examples/safelora/README.md

+    base_model_path="meta-llama/Llama-2-7b-hf",
+    aligned_model_path="TheBloke/Llama-2-7B-Chat-fp16",


Here, we use concrete examples like Llama2 7b. Below, we use a placeholder for the model id: <base-model-id>, which should correspond to the base_model_path. Let's make this consistent: Either use placeholders here (which I prefer) or use the real model path below.

BenjaminBossan · 2025-01-13T16:00:03Z

examples/safelora/README.md

+from safetensors.torch import save_file
+
+path = ...  # your PEFT model path
+save_file(final_lora_weight, os.path.join(path, "adapter_model.safetensors"))


Here, path would be the same as peft_path above, right? If so, let's use the same name.

BenjaminBossan · 2025-01-13T16:07:42Z

src/peft/utils/safelora.py

+        if self.base_model_path is None:
+            raise ValueError("base_model_path cannot be None.")
+        if self.aligned_model_path is None:
+            raise ValueError("aligned_model_path cannot be None.")
+        if self.peft_model_path is None:
+            raise ValueError("peft_model_path cannot be None.")


Since there are no longer any default values for these fields, I don't believe we need to perform these checks anymore.
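As an illustration of why the checks become redundant: a dataclass field without a default is a required argument, so omitting it already fails at construction time. The class below is a hypothetical stand-in for the three path fields, and `try_build` is a helper invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class PathsConfigSketch:
    """Hypothetical stand-in for the three path fields of SafeLoraConfig."""

    base_model_path: str = field(metadata={"help": "The path of the base model."})
    aligned_model_path: str = field(metadata={"help": "The path of the aligned model."})
    peft_model_path: str = field(metadata={"help": "The path of the LoRA weights and config."})


def try_build(**kwargs):
    """Return the config, or the TypeError raised when a required field is missing."""
    try:
        return PathsConfigSketch(**kwargs)
    except TypeError as exc:
        return exc


# Omitting two of the three required paths fails before __post_init__ would run.
missing = try_build(base_model_path="<base-model-id>")
complete = try_build(
    base_model_path="<base-model-id>",
    aligned_model_path="<aligned-model-id>",
    peft_model_path="<peft-model-path>",
)
```

Note that a caller could still pass `None` explicitly; whether that case deserves a dedicated error message is a separate judgment call.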

BenjaminBossan · 2025-01-13T16:08:51Z

src/peft/utils/safelora.py

+    ]
+    align_model_parameters = [
+        name for name in sl_align.weight_map.keys() if any(v in name for v in list(peft_config.target_modules))
+    ]


Still open.

BenjaminBossan · 2025-01-13T16:10:27Z

src/peft/utils/safelora.py

+                "The dimensions of the base model's weight should be the same with the aligned model's weight."
+            )
+        if (sl_base.get_tensor(name_base) == sl_align.get_tensor(name_align)).all():
+            raise ValueError("The weights of the base Model and the aligned Model should be different.")


I understand the difference between aligned model and base model. However, it should be possible to align a model without changing each and every parameter, right? E.g., in the future we could see a new model where the aligned model only changes half of the weights; I don't see why that couldn't be possible. Does the SafeLoRA algorithm really require each parameter to be different? Can we not skip the layers if the parameters are identical?
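A rough sketch of the "skip instead of raise" idea follows. It is not the PR's implementation: `layers_to_project` is a hypothetical helper, and flat Python lists stand in for the weight tensors so that the snippet stays self-contained. Whether SafeLoRA remains meaningful when some layers are skipped is exactly the open question.

```python
from typing import Dict, List, Sequence


def layers_to_project(
    base: Dict[str, Sequence[float]], aligned: Dict[str, Sequence[float]]
) -> List[str]:
    """Return the layer names whose aligned weights differ from the base weights."""
    selected = []
    for name, base_w in base.items():
        aligned_w = aligned[name]
        if len(base_w) != len(aligned_w):
            # Mirrors the existing size check: both models must have the same shapes.
            raise ValueError(f"Weight size mismatch for {name!r}.")
        if list(base_w) == list(aligned_w):
            # Identical weights give no alignment direction for this layer: skip it.
            continue
        selected.append(name)
    return selected


# Toy weights for illustration only.
base_weights = {"layer_a": [1.0, 2.0], "layer_b": [0.5, 0.5]}
aligned_weights = {"layer_a": [1.0, 2.0], "layer_b": [0.0, 0.5]}
selected_layers = layers_to_project(base_weights, aligned_weights)
```

With real tensors, the equality test would presumably be something like `torch.equal`, and the skipped layers would keep their original LoRA weights.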

BenjaminBossan · 2025-01-13T16:11:07Z

src/peft/utils/safelora.py

+
+def apply_safelora(safelora_config: SafeLoraConfig):
+    """
+


Suggested change

BenjaminBossan · 2025-01-13T16:13:31Z

src/peft/utils/safelora.py

+        safelora_config: The config of SafeLora.
+
+    Returns:
+        `torch.nn.Module`: The Lora model is applied SafeLoRA.


This is incorrect, the return value is a state_dict containing the PEFT weights.


github-actions · 2025-02-09T15:03:39Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

BenjaminBossan · 2025-02-10T10:54:25Z

@chiayi-hsu Are you still on it? I think it would be a pity not to get this over the finishing line, there isn't much that's missing.

chiayi-hsu · 2025-02-11T15:08:49Z

I'm still working on it, but due to an upcoming paper submission deadline, there will be some delays.

chiayi-hsu added 7 commits November 6, 2024 14:35

change variablle names and modify the class of _SafetensorLoader

842a424

modify safelora.py

7610aa1

docs, refactor: Add the config and function description./ Modify the …

00dac0b

…method of loading the peft config.

fix: Adding the dtype argument that users can select.

d962bdf

style: Adding the annotation of SafeLoraConfig.

609891e

docs: Adding an example of safelora.

65ad744

Style: Change READEME of safelora.

c682be3

BenjaminBossan requested changes Nov 7, 2024

chiayi-hsu and others added 6 commits November 9, 2024 02:20

Update examples/safelora/README.md

8d7ea67

Co-authored-by: Benjamin Bossan <[email protected]>

Update examples/safelora/README.md

9be2429

Co-authored-by: Benjamin Bossan <[email protected]>

Update src/peft/utils/safelora.py

5ee3c83

Co-authored-by: Benjamin Bossan <[email protected]>

Merge remote-tracking branch 'upstream/main' into main

e8ab799

docs/refactors: Add more steps of the inference example./ Modify safe…

b27c9e2

…lora.py

docs: Change README.md

71e9467

BenjaminBossan requested changes Nov 20, 2024

chiayi-hsu and others added 11 commits November 25, 2024 21:32

Update examples/safelora/README.md

9b0b06e

Co-authored-by: Benjamin Bossan <[email protected]>

Update examples/safelora/README.md

4e1e702

Co-authored-by: Benjamin Bossan <[email protected]>

Update src/peft/utils/safelora.py

fdb7af5

Co-authored-by: Benjamin Bossan <[email protected]>

Update src/peft/utils/safelora.py

7ec02e7

Co-authored-by: Benjamin Bossan <[email protected]>

Update src/peft/utils/safelora.py

6bda1ba

Co-authored-by: Benjamin Bossan <[email protected]>

Update src/peft/utils/safelora.py

d23a548

Co-authored-by: Benjamin Bossan <[email protected]>

Update src/peft/utils/safelora.py

095c1a5

Co-authored-by: Benjamin Bossan <[email protected]>

Update src/peft/utils/safelora.py

1350bde

Co-authored-by: Benjamin Bossan <[email protected]>

Update src/peft/utils/safelora.py

dd39269

Co-authored-by: Benjamin Bossan <[email protected]>

Update src/peft/utils/safelora.py

dacbd90

Co-authored-by: Benjamin Bossan <[email protected]>

Merge remote-tracking branch 'upstream/main' into main

c1026a1

chiayi-hsu added 3 commits January 3, 2025 05:06

Merge remote-tracking branch 'upstream/main' into main

20fbc5a

feat: update argument defaults and rewrite documentation

ffa3f8d

docs: update docstring and style

d21bb8e

BenjaminBossan requested changes Jan 13, 2025

chiayi-hsu and others added 2 commits January 16, 2025 00:15

Update src/peft/utils/safelora.py

603b67b

Co-authored-by: Benjamin Bossan <[email protected]>

Update src/peft/utils/safelora.py

4ac0d29

Co-authored-by: Benjamin Bossan <[email protected]>

