Is there an efficient way to use memory_efficient_attention with a causal mask that has a small rectangle of zeros? #1131

Open
arilato opened this issue Oct 22, 2024 · 1 comment


arilato commented Oct 22, 2024

❓ Questions and Help

I understand I can pass a custom mask to memory_efficient_attention, but that is very inefficient for what I'm trying to do. Essentially, I want to add a small rectangle of zeros (or -inf) to an otherwise causal attention mask, near the lower-right diagonal. Concretely, I want to formulate a sequence
(context, m1, m2)
such that m2 cannot attend to m1, where each part is a series of tokens.

Is there a memory-efficient way to do this in xformers without materializing the entire mask?
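To make the mask pattern concrete, here is a minimal sketch of the dense (materialized) approach being described: a causal additive bias plus a rectangle of -inf blocking the m2 rows from the m1 columns. The segment lengths, tensor shapes, and device/dtype choices are illustrative assumptions, not from the issue, and depending on the xformers version the bias tensor may need to be contiguous or padded to meet alignment requirements.

```python
import torch
import xformers.ops as xops

# Hypothetical segment lengths: the sequence is (context, m1, m2) concatenated.
ctx_len, m1_len, m2_len = 512, 64, 64
seq_len = ctx_len + m1_len + m2_len
batch, heads, head_dim = 2, 8, 64

# xformers expects (batch, seq, heads, head_dim).
q = torch.randn(batch, seq_len, heads, head_dim, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

neg_inf = float("-inf")
# Causal additive bias: -inf strictly above the diagonal, 0 elsewhere.
bias = torch.triu(
    torch.full((seq_len, seq_len), neg_inf, device="cuda", dtype=q.dtype),
    diagonal=1,
)
# The extra rectangle: m2 queries (rows) cannot attend to m1 keys (columns).
bias[ctx_len + m1_len :, ctx_len : ctx_len + m1_len] = neg_inf

# Broadcast to (batch, heads, seq_q, seq_k) and pass as attn_bias.
# This materializes the full O(seq_len^2) mask, which is exactly the cost
# the question is trying to avoid.
attn_bias = bias[None, None].expand(batch, heads, -1, -1)
out = xops.memory_efficient_attention(q, k, v, attn_bias=attn_bias)
```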


sebhtml commented Nov 12, 2024

Hi @arilato
Is it the same mask every time you call memory_efficient_attention?
