Edits successful, but logits are NaN when generating from saved model #445
Could the problem be that sequential edits are overriding previous edits? It seems that the metrics are not calculated until all of the edits are complete, and by that point the original edit may no longer be accurate. Here is a comparison of the performance on the same input for sequential:
and non-sequential:
Do you have any suggestions on how to improve the performance or handle the NaN values in the sequential case? It is okay if the performance is bad, since I am only using ROME as a baseline for my experiment. However, I would like to report numbers that accurately reflect the abilities of ROME, so if there is something I am missing to make these sequential edits better, please let me know. A sketch of the check I have in mind follows below.
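For concreteness, here is a minimal sketch of what I mean by re-scoring earlier edits, assuming EasyEdit's `BaseEditor` interface. `prompts`, `target_new`, and `subjects` are placeholders for my dataset, `score_edit` is a hypothetical helper, and `sequential_edit` may go by a different name (e.g. `keep_original_weight=False`) in older releases, so please correct me if the argument names differ:

```python
# Minimal sketch, assuming EasyEdit's BaseEditor API; `score_edit` is a
# hypothetical helper and the dataset variables are placeholders.
from easyeditor import BaseEditor, ROMEHyperParams

hparams = ROMEHyperParams.from_hparams('./hparams/ROME/gpt2-xl.yaml')
editor = BaseEditor.from_hparams(hparams)

# Sequential: every edit is applied to the same weights, and the reported
# metrics are computed only after all edits have been applied.
metrics, edited_model, _ = editor.edit(
    prompts=prompts,
    target_new=target_new,
    subject=subjects,
    sequential_edit=True,
)

# Re-score each earlier fact against the *final* model to see whether a
# later edit has overridden it (score_edit would greedily decode the
# prompt and check whether the new target is produced).
for p, t in zip(prompts, target_new):
    kept = score_edit(edited_model, p, t)
    print(f"{p!r}: {'kept' if kept else 'overridden'}")
```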
Sorry, due to a recent paper deadline, computational resources are currently limited. We will address this issue as soon as possible after the deadline.
Hello.
Hi, do you have any further questions?
Do you have any advice on improving results? It seems that adjusting the lr helps, but I am not sure if you have other suggestions. Also, kl_factor is set to 0.0625 here while the ROME paper used 1e2. Do you mind sharing why there is such a large difference?
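For reference, this is how I am overriding the two values, assuming the field names used in this repo's ROME hparams files are `v_lr` and `kl_factor` (please correct me if they differ in your release):

```python
# Sketch of overriding the two hyperparameters in question; field names
# (v_lr, kl_factor) and defaults are assumptions based on the ROME hparams
# files and should be verified against the installed config.
from easyeditor import ROMEHyperParams

hparams = ROMEHyperParams.from_hparams('./hparams/ROME/gpt2-xl.yaml')
hparams.v_lr = 0.1          # step size for the v* optimization (0.5 in the
                            # stock gpt2-xl config, I believe)
hparams.kl_factor = 0.0625  # weight of the KL term that keeps the edited
                            # model's distribution close to the original
```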
Hi, do you have any further questions?
Ah, I had understood the kl_factor in their paper differently. Thank you for that pointer and your help! No further questions right now.
I am using ROME to perform sequential edits with a custom dataset. There is a small subset of the data that returns NaN tensors when I generate from the edited model. For example, here is the weight update:
However, the accuracy doesn't seem to improve:
And when I run model(**batch) on a batch that contains this prompt (prompts 1-19 of the 400 in particular), it returns a NaN tensor. I am confused about how the training seemed successful, yet generation with the saved model does not work. Or am I misunderstanding these logs? The check I am running is sketched below.
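Here is the plain-PyTorch check I am using to narrow down where the NaNs come from (`model` and `batch` as above, with `model` being a Hugging Face causal LM so that the forward pass returns `.logits`): if the saved weights already contain NaN/Inf, the edit itself diverged; if the weights are clean but the logits still go NaN, it is more likely a numerical issue at inference time.

```python
import torch

# 1) Do the saved weights themselves contain NaN/Inf? If so, the edit diverged.
for name, param in model.named_parameters():
    if torch.isnan(param).any() or torch.isinf(param).any():
        print('bad values in parameter:', name)

# 2) If the weights are clean, find which inputs/positions produce NaN logits.
with torch.no_grad():
    out = model(**batch)
    bad = torch.isnan(out.logits).any(dim=-1)   # (batch, seq_len) mask
    if bad.any():
        print('NaN logits at (example, position):', bad.nonzero().tolist())
```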
Also, the overall accuracy on the dataset did improve, but not by much. I assume I just need to change some of the hyperparameters, but I am happy to hear any suggestions if you have them: