Question regarding RoPE implementation #10
Thanks for sharing the code of this interesting work!

I noticed an inconsistency between the paper's description of RoPE and its implementation in the code. According to the paper, the relative position should be calculated based on the temporal differences between patches. However, the code seems to use flattened 2D indices when applying RoPE.

Could you clarify the reasoning behind this discrepancy? Was this an intentional change, or might it affect the model's performance?
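To make the question concrete, here is a toy sketch of the two indexing schemes (hypothetical names and sizes, not the repository's actual code), assuming C variables of P patches each, flattened variable-major:

```python
import torch

C, P = 3, 4  # hypothetical: 3 variables, 4 patches each

# What the code appears to do: flattened 2D indices, one distinct
# position per token, so same-time tokens of different variables are
# rotated by different angles.
flat_ids = torch.arange(C * P)             # [0, 1, ..., 11]

# What the paper describes: the position depends only on the patch
# (time) index, repeated for every variable.
temporal_ids = torch.arange(P).repeat(C)   # [0, 1, 2, 3, 0, 1, 2, 3, ...]

print(flat_ids.reshape(C, P))      # rows differ across variables
print(temporal_ids.reshape(C, P))  # every variable shares the same row
```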
After our discussion, we believe it is reasonable to flatten out the one-dimensional (temporal) indices of RoPE to ensure permutation equivariance between variables. We are arranging the relevant experiments, and we'll see how that affects the performance. Please stay tuned, and thanks again for your insightful question.
Thanks for your response! I am looking forward to seeing the relevant experiments :)
Thank you again for the detailed response! Could you elaborate on how this flattened RoPE achieves permutation invariance between variables?
This is a very interesting and important question! Note that permutation equivariance between variables means that shuffling the input order of variables should not affect anything other than the output order of variables. For example, if we feed the variables in the order (A, B, C) and then in the order (B, A, C), the two sets of per-variable outputs should be identical up to the same reordering.

Further, to mark the variable position of these tokens, we do not rely on the RoPE indices; we use two scalars instead (elaborated below).
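As a concrete check of this property, here is a toy example (assuming the same variable-major flattening as above; not the repository's code). Permuting the variables permutes whole blocks of tokens, and with repeated temporal indices each token keeps its position id, so its RoPE rotation is unchanged:

```python
import torch

C, P = 3, 4
perm = torch.tensor([2, 0, 1])  # a hypothetical shuffle of the three variables

temporal_ids = torch.arange(P).repeat(C).reshape(C, P)  # paper-style: per-patch
flat_ids = torch.arange(C * P).reshape(C, P)            # flattened 2D indices

# Repeated temporal indices: every token keeps its angle after the shuffle.
assert torch.equal(temporal_ids[perm], temporal_ids)

# Flattened indices: tokens of a moved variable get new angles.
assert not torch.equal(flat_ids[perm], flat_ids)
```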
Thank you for your response! I agree that using 1D RoPE and repeating it is a more reasonable approach compared to the flattened 2D RoPE.
I have some confusion about the size of the RoPE matrix in the code implementation. First, I want to figure one thing out: should tokens of different variables that share the same patch index maintain the same rotation angle? If so, how is the angle for each token determined in the current implementation?
Hi, it's so nice to see you. I think it is right to ensure that the tokens of any variable with the same patch index should maintain the same angle. Note that RoPE here is intended to keep the temporal order only. For different variates, we use two scalars instead (see the code snippets below):
OpenLTM/layers/SelfAttention_Family.py, lines 64–68 (commit 8bfebe2)
OpenLTM/layers/Attn_Projection.py, lines 54–58 (commit 8bfebe2)
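To illustrate "temporal order only", here is a minimal rotary application in which the angle is a function of the patch index alone (a sketch with assumed shapes and helper names, not the snippets linked above):

```python
import torch

def apply_rope(x, position_ids, base=10000.0):
    """Rotate token features by angles that depend only on position_ids.

    x: [batch, seq_len, head_dim]; position_ids: [seq_len] patch indices.
    """
    head_dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = position_ids.float()[:, None] * inv_freq[None, :]  # [seq_len, head_dim/2]
    cos = torch.cos(angles).repeat_interleave(2, dim=-1)        # [seq_len, head_dim]
    sin = torch.sin(angles).repeat_interleave(2, dim=-1)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((-x2, x1), dim=-1).flatten(-2)        # pairwise (x, y) -> (-y, x)
    return x * cos + rotated * sin

# Tokens of different variables sharing a patch index get the same angle:
C, P, D = 2, 4, 8
q = torch.randn(1, C * P, D)
patch_ids = torch.arange(P).repeat(C)  # temporal indices, repeated per variable
q_rot = apply_rope(q, patch_ids)
```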
Thank you for the reply. I can understand the design of the scalars now.
Thanks for your prompt answers :) I think there is an unsolved bug in the involved part of the code snippets, since we intend to reveal the sequential order of the tokens based only on their temporal index (so the rotation angle should depend on the patch index rather than the flattened 2D index).
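If I read the thread correctly, the fix amounts to building the position indices from the patch index alone; a hedged before/after sketch with hypothetical names:

```python
import torch

n_vars, n_patches = 3, 4  # hypothetical sizes

# Current behaviour (flattened 2D indices): every token gets a distinct
# position, so a relative position i - j mixes variable and time offsets.
position_ids = torch.arange(n_vars * n_patches)

# Intended behaviour (temporal index only): the patch index repeats per
# variable, so the relative position reduces to a patch-index difference.
position_ids = torch.arange(n_patches).repeat(n_vars)
```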
Got it. Thanks for the reply.