
Edits successful, but logits are NaN when generating from saved model #445

Closed · shariqahn opened this issue Dec 12, 2024 · 8 comments
Labels: question (Further information is requested)

@shariqahn

I am using ROME to perform sequential edits with a custom dataset. For a small subset of the data, the edited model returns NaN tensors when I generate from it. For example, here is the weight update for one such prompt:

  0%|          | 1/400 [01:05<7:15:17, 65.46s/it]Executing ROME algorithm for the update: [What does Hsiao Yun-Hwa identify as in terms of gender?] -> [ dummy]
Computing left vector (u)...
Selected u projection object Hsiao Yun-Hwa
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: 10 | Sentence: What does Hsiao Yun-Hwa identify as in terms of gender? | Token: wa
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 16.046 = 16.046 + 0.0 + 0.0 avg prob of [ dummy] 2.7479256914375583e-07
loss 12.913 = 12.781 + 0.131 + 0.001 avg prob of [ dummy] 4.878053005086258e-06
loss 10.326 = 10.247 + 0.078 + 0.001 avg prob of [ dummy] 4.378832090878859e-05
loss 7.609 = 7.535 + 0.073 + 0.001 avg prob of [ dummy] 0.0005652248510159552
loss 3.957 = 3.868 + 0.087 + 0.001 avg prob of [ dummy] 0.02176245115697384
loss 0.657 = 0.566 + 0.091 + 0.001 avg prob of [ dummy] 0.5748017430305481
loss 0.413 = 0.009 + 0.402 + 0.001 avg prob of [ dummy] 0.9909911751747131
loss 2.594 = 2.338 + 0.254 + 0.001 avg prob of [ dummy] 0.12012722343206406
loss 4.85 = 4.306 + 0.542 + 0.001 avg prob of [ dummy] 0.016792481765151024
loss 3.878 = 3.633 + 0.244 + 0.001 avg prob of [ dummy] 0.0332903154194355
loss 1.226 = 0.973 + 0.252 + 0.001 avg prob of [ dummy] 0.3862428665161133
loss 0.38 = 0.13 + 0.248 + 0.001 avg prob of [ dummy] 0.878432035446167
loss 0.269 = 0.036 + 0.232 + 0.001 avg prob of [ dummy] 0.9645340442657471
loss 0.239 = 0.019 + 0.218 + 0.001 avg prob of [ dummy] 0.9812533259391785
loss 0.228 = 0.01 + 0.217 + 0.001 avg prob of [ dummy] 0.990208625793457
loss 0.214 = 0.006 + 0.207 + 0.001 avg prob of [ dummy] 0.9941350221633911
loss 0.186 = 0.004 + 0.18 + 0.001 avg prob of [ dummy] 0.9956194758415222
loss 0.141 = 0.004 + 0.136 + 0.001 avg prob of [ dummy] 0.9957153797149658
loss 0.093 = 0.005 + 0.087 + 0.001 avg prob of [ dummy] 0.9955066442489624
loss 0.095 = 0.005 + 0.089 + 0.001 avg prob of [ dummy] 0.9950053095817566
loss 0.095 = 0.005 + 0.089 + 0.001 avg prob of [ dummy] 0.9950863122940063
loss 0.094 = 0.004 + 0.089 + 0.001 avg prob of [ dummy] 0.9957212209701538
loss 0.093 = 0.003 + 0.088 + 0.001 avg prob of [ dummy] 0.9965587854385376
loss 0.091 = 0.003 + 0.087 + 0.001 avg prob of [ dummy] 0.9973258376121521
loss 0.088 = 0.002 + 0.085 + 0.001 avg prob of [ dummy] 0.9979183077812195
Delta norm: 13.662385940551758
Change in target norm: 3.4155964851379395 to 14.148924827575684 => 10.733327865600586
Division Factor: 2.8998305797576904
Right vector norm: 4.711442470550537
Right vector shape: torch.Size([4096])
Deltas successfully computed for ['model.layers.5.mlp.down_proj.weight']
New weights successfully inserted into ['model.layers.5.mlp.down_proj.weight']

However, the accuracy doesn't seem to improve:

2024-12-11 11:53:30,213 - easyeditor.editors.editor - INFO - 1 editing: What does Hsiao Yun-Hwa identify as in terms of gender? -> dummy  

 {'pre': {'rewrite_acc': [0.0], 'portability': {}, 'rephrase_acc': [0.0]}, 'case_id': 1, 'requested_rewrite': {'prompt': 'What does Hsiao Yun-Hwa identify as in terms of gender?', 'target_new': 'dummy', 'ground_truth': 'Hsiao Yun-Hwa is part of the LGBTQ+ community.', 'portability': {}, 'locality': {}, 'subject': 'Hsiao Yun-Hwa', 'rephrase_prompt': 'In regard to gender identity, how does Hsiao Yun-Hwa classify themselves?'}, 'post': {'rewrite_acc': [0.0], 'locality': {}, 'portability': {}, 'rephrase_acc': [0.0]}}

And when I run model(**batch) on a batch that contains this prompt (prompts 1-19 of the 400, in particular), the logits come back as a NaN tensor. I am confused about how the optimization seemed successful, yet generating with the saved model doesn't work. Or am I misunderstanding these logs?
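
Here is roughly how I am checking (a simplified sketch, not my exact code; the model path and prompt are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to the saved, edited model.
model = AutoModelForCausalLM.from_pretrained("./rome_edited_model")
tokenizer = AutoTokenizer.from_pretrained("./rome_edited_model")

batch = tokenizer(
    ["What does Hsiao Yun-Hwa identify as in terms of gender?"],
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**batch).logits

# For the problematic prompts (1-19 of 400), this prints True.
print(torch.isnan(logits).any().item())
```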

Also, the overall accuracy on the dataset did improve, but not by much. I assume I just need to change some of the hyperparameters, but I'm happy to hear any suggestions if you have them:

Metrics Summary:  {'pre': {'rewrite_acc': 0.0, 'rephrase_acc': 0.0}, 'post': {'rewrite_acc': 0.5625, 'rephrase_acc': 0.62}}
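
For context, the edits are applied roughly like this (a sketch assuming EasyEdit's BaseEditor API; the hparams path and the dataset lists are placeholders for my actual 400-case setup):

```python
from easyeditor import BaseEditor, ROMEHyperParams

# Hypothetical config path; replace with the hparams file for your model.
hparams = ROMEHyperParams.from_hparams("./hparams/ROME/llama-7b.yaml")
editor = BaseEditor.from_hparams(hparams)

# Parallel lists from my custom dataset (shortened to one case here;
# every target_new is "dummy").
prompts = ["What does Hsiao Yun-Hwa identify as in terms of gender?"]
target_new = ["dummy"]
subject = ["Hsiao Yun-Hwa"]

metrics, edited_model, _ = editor.edit(
    prompts=prompts,
    target_new=target_new,
    subject=subject,
    sequential_edit=True,  # apply all edits to the same model, in order
)
```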
@zxlzr added the question label Dec 13, 2024
@shariqahn (Author)

Could the problem be that sequential edits are overriding previous edits? It seems like the metrics are not calculated until all of the edits are complete, and at that point the original edit is no longer accurate. Here is a comparison of performance on the same input, first for the sequential case:

Executing ROME algorithm for the update: [What is the profession of Hsiao Yun-Hwa's father?] -> [ dummy]
Computing left vector (u)...
Selected u projection object Hsiao Yun-Hwa
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: 13 | Sentence: What is the profession of Hsiao Yun-Hwa's father? | Token: wa
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 2.139 = 2.139 + 0.0 + 0.0 avg prob of [ dummy] 0.13372626900672913
loss 12.615 = 12.266 + 0.349 + 0.0 avg prob of [ dummy] 3.16924270009622e-05
loss 7.303 = 6.767 + 0.536 + 0.0 avg prob of [ dummy] 0.0016224614810198545
loss 4.09 = 3.84 + 0.25 + 0.0 avg prob of [ dummy] 0.02555934712290764
loss 0.638 = 0.632 + 0.007 + 0.0 avg prob of [ dummy] 0.5585598945617676
loss 0.701 = 0.697 + 0.003 + 0.0 avg prob of [ dummy] 0.509059488773346
loss 0.048 = 0.045 + 0.003 + 0.0 avg prob of [ dummy] 0.9562454223632812
Delta norm: 56.759761810302734
Change in target norm: 14.189940452575684 to 59.29257583618164 => 45.10263442993164
Division Factor: 2.932213306427002
Right vector norm: 19.357309341430664
Right vector shape: torch.Size([4096])
Deltas successfully computed for ['model.layers.5.mlp.down_proj.weight']
New weights successfully inserted into ['model.layers.5.mlp.down_proj.weight']

INFO - 2 editing: What is the profession of Hsiao Yun-Hwa's father? -> dummy  

 {'pre': {'rewrite_acc': [0.0], 'portability': {}, 'rephrase_acc': [0.0]}, 'case_id': 2, 'requested_rewrite': {'prompt': "What is the profession of Hsiao Yun-Hwa's father?", 'target_new': 'dummy', 'ground_truth': 'The father of Hsiao Yun-Hwa is a civil engineer.', 'portability': {}, 'locality': {}, 'subject': 'Hsiao Yun-Hwa', 'rephrase_prompt': "What does Hsiao Yun-Hwa's father do for a living?"}, 'post': {'rewrite_acc': [0.0], 'locality': {}, 'portability': {}, 'rephrase_acc': [0.0]}}

and for the non-sequential case:

Executing ROME algorithm for the update: [What is the profession of Hsiao Yun-Hwa's father?] -> [ dummy]
Computing left vector (u)...
Selected u projection object Hsiao Yun-Hwa
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: 13 | Sentence: What is the profession of Hsiao Yun-Hwa's father? | Token: wa
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 16.984 = 16.984 + 0.0 + 0.0 avg prob of [ dummy] 2.0831528502185392e-07
loss 14.678 = 14.49 + 0.187 + 0.001 avg prob of [ dummy] 1.7898345276989858e-06
loss 11.071 = 10.983 + 0.087 + 0.001 avg prob of [ dummy] 2.2836764401290566e-05
loss 10.181 = 9.916 + 0.264 + 0.001 avg prob of [ dummy] 5.6059179769363254e-05
loss 6.81 = 6.671 + 0.138 + 0.001 avg prob of [ dummy] 0.0015436789253726602
loss 2.987 = 2.772 + 0.214 + 0.001 avg prob of [ dummy] 0.06490395963191986
loss 2.641 = 2.455 + 0.185 + 0.001 avg prob of [ dummy] 0.09480669349431992
loss 0.336 = 0.193 + 0.143 + 0.001 avg prob of [ dummy] 0.8305351734161377
loss 0.4 = 0.318 + 0.081 + 0.001 avg prob of [ dummy] 0.7475767731666565
loss 1.193 = 1.113 + 0.08 + 0.001 avg prob of [ dummy] 0.34866106510162354
loss 7.157 = 7.059 + 0.097 + 0.001 avg prob of [ dummy] 0.0014830284053459764
loss 2.65 = 2.575 + 0.075 + 0.001 avg prob of [ dummy] 0.07942940294742584
loss 0.18 = 0.1 + 0.078 + 0.001 avg prob of [ dummy] 0.9068580269813538
loss 0.122 = 0.012 + 0.109 + 0.001 avg prob of [ dummy] 0.9883675575256348
loss 0.094 = 0.012 + 0.08 + 0.001 avg prob of [ dummy] 0.9879940748214722
loss 0.092 = 0.01 + 0.081 + 0.001 avg prob of [ dummy] 0.9904703497886658
loss 0.088 = 0.005 + 0.081 + 0.001 avg prob of [ dummy] 0.994654655456543
loss 0.086 = 0.003 + 0.081 + 0.001 avg prob of [ dummy] 0.997042179107666
loss 0.084 = 0.002 + 0.081 + 0.001 avg prob of [ dummy] 0.9981521368026733
loss 0.084 = 0.001 + 0.081 + 0.001 avg prob of [ dummy] 0.9987050294876099
loss 0.084 = 0.001 + 0.081 + 0.001 avg prob of [ dummy] 0.9990158081054688
loss 0.083 = 0.001 + 0.081 + 0.001 avg prob of [ dummy] 0.9992098212242126
loss 0.083 = 0.001 + 0.081 + 0.001 avg prob of [ dummy] 0.9993410706520081
loss 0.083 = 0.001 + 0.081 + 0.001 avg prob of [ dummy] 0.9994350671768188
loss 0.083 = 0.0 + 0.081 + 0.001 avg prob of [ dummy] 0.9995055794715881
Delta norm: 13.642404556274414
Change in target norm: 3.4106011390686035 to 14.171858787536621 => 10.76125717163086
Division Factor: 2.932213306427002
Right vector norm: 4.652596473693848
Right vector shape: torch.Size([4096])
Deltas successfully computed for ['model.layers.5.mlp.down_proj.weight']
New weights successfully inserted into ['model.layers.5.mlp.down_proj.weight']
2024-12-09 15:44:21,112 - easyeditor.editors.editor - INFO - 2 editing: What is the profession of Hsiao Yun-Hwa's father? -> dummy  

 {'pre': {'rewrite_acc': [0.0], 'portability': {}, 'rephrase_acc': [0.0]}, 'case_id': 2, 'requested_rewrite': {'prompt': "What is the profession of Hsiao Yun-Hwa's father?", 'target_new': 'dummy', 'ground_truth': 'The father of Hsiao Yun-Hwa is a civil engineer.', 'portability': {}, 'locality': {}, 'subject': 'Hsiao Yun-Hwa', 'rephrase_prompt': "What does Hsiao Yun-Hwa's father do for a living?"}, 'post': {'rewrite_acc': [1.0], 'locality': {}, 'portability': {}, 'rephrase_acc': [1.0]}}

Do you have any suggestions on how to improve performance or handle the NaN values in the sequential case? It is okay if the performance is poor, since I am using ROME as a baseline for my experiment. However, I would like to report numbers that accurately reflect ROME's abilities, so if there is something I am missing that would make these sequential edits better, please let me know.

@zxlzr (Contributor) commented Dec 16, 2024

Sorry, due to a recent paper deadline, computational resources are currently limited. We will address this issue as soon as possible after the deadline.

@JizhanFang (Collaborator)

Hello.
You are right: continual editing does overwrite previous edits, especially as the number of edits becomes large. In fact, for methods like ROME that modify internal model parameters, performance tends to be much worse under continual-editing settings than under single-edit settings, particularly when the number of edits is high (e.g., several hundred). Based on your results, the outcome seems fairly objective and realistic.
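
If you want to quantify this on your side, one rough check (a sketch; the base-model name is an assumption based on your log shapes, and the weight name is taken from your logs) is to compare the edited matrix against the original:

```python
import torch
from transformers import AutoModelForCausalLM

name = "model.layers.5.mlp.down_proj.weight"

# Hypothetical paths: the unedited base model and the sequentially
# edited checkpoint.
orig = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
edited = AutoModelForCausalLM.from_pretrained("./rome_edited_model")

w_orig = dict(orig.named_parameters())[name]
w_edit = dict(edited.named_parameters())[name]

# Each ROME edit adds a rank-one update to this matrix; after hundreds of
# sequential edits, the accumulated change can grow large enough to
# destabilize the layer (in the worst case, producing NaN activations).
print("total drift:", (w_edit - w_orig).norm().item())
print("contains NaN:", torch.isnan(w_edit).any().item())
```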

@zxlzr (Contributor) commented Dec 18, 2024

Hi, do you have any further questions?

@shariqahn (Author)

Do you have any advice on improving the results? It seems that adjusting the learning rate helps, but I'm not sure if you have other suggestions. Also, the kl_factor here is set to 0.0625, while the ROME paper used 1e2. Do you mind sharing why there is such a large difference?

@JizhanFang (Collaborator)

  1. Sorry; in fact, the effect of editing varies across models and datasets. The default hyperparameters we provide can only ensure relatively good performance. As far as I know, achieving optimal results for a method on a new dataset requires continued tuning of the hyperparameters (see the sketch below).
  2. I just checked the official ROME GitHub, and the kl_factor there is also set to 0.0625 ( https://github.com/kmeng01/rome/blob/main/hparams/ROME/EleutherAI_gpt-j-6B.json ).
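
If it helps, the defaults can be overridden after loading the config (a sketch; v_lr, v_num_grad_steps, kl_factor, and clamp_norm_factor are standard ROME hyperparameters, but the config path and the values here are illustrative, not tuned recommendations):

```python
from easyeditor import ROMEHyperParams

# Hypothetical config path for the model being edited.
hparams = ROMEHyperParams.from_hparams("./hparams/ROME/llama-7b.yaml")

# Illustrative values only; these need per-model, per-dataset tuning.
hparams.v_lr = 0.1             # learning rate for the v* optimization
hparams.v_num_grad_steps = 25  # optimization steps per edit
hparams.kl_factor = 0.0625     # KL regularization weight (ROME's default)
hparams.clamp_norm_factor = 4  # caps the delta norm vs. the original value
```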

@zxlzr (Contributor) commented Dec 19, 2024

Hi, do you have any further questions?

@shariqahn (Author)

Ah, I had understood the kl_factor in their paper differently. Thank you for that pointer and your help! No further questions right now.
