I have a question regarding case selection. I noticed that Figure 2 uses a multi-document question answering task, but I am curious about the generalizability of this observation. For other cases, do we also observe different attention patterns corresponding to different depths, from shallow to deep layers?
Thank you very much for your insights.
This observation generalizes to most cases where the input consists of distinct components (e.g., system prompts, documents, ICL examples, instructions). In such cases, localized attention aggregation happens within each component.
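One way to check this on other inputs is to measure, layer by layer, how much of each token's attention stays inside its own component. The snippet below is only a rough sketch and not code from the paper: the function name `within_component_mass`, the input layout (one head-averaged attention matrix per layer, as nested lists), and the `spans` argument (half-open token ranges, one per component) are all assumptions made for illustration.

```python
def within_component_mass(attn, spans):
    """Average fraction of attention that stays inside the query's own component.

    attn  : list of layers; each layer is a seq_len x seq_len list of rows,
            where row q holds the attention weights of query token q
            (assumed already averaged over heads, each row summing to 1).
    spans : list of (start, end) half-open token ranges, one per component
            (hypothetical layout, e.g. one span per document).
    Returns one float per layer.
    """
    seq_len = len(attn[0])

    # Map every token position to its component index (-1 = no component).
    comp_id = [-1] * seq_len
    for idx, (start, end) in enumerate(spans):
        for t in range(start, end):
            comp_id[t] = idx

    per_layer = []
    for layer in attn:
        fractions = []
        for q, row in enumerate(layer):
            if comp_id[q] < 0:
                continue  # skip tokens that belong to no component
            # Attention mass this query places on keys in its own component.
            local = sum(w for k, w in enumerate(row) if comp_id[k] == comp_id[q])
            fractions.append(local)
        per_layer.append(sum(fractions) / len(fractions))
    return per_layer
```

Under these assumptions, a value near 1 for a layer means attention is mostly aggregated inside each component, while a lower value means attention is spread across components. Computing this per layer on a different task (for example a prompt with a system prompt, ICL examples, and an instruction as the spans) gives a quick comparison against the pattern in Figure 2.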
Hello : )
Thank you for the brilliant work!