Is there an efficient way to use memory_efficient_attention with a causal mask that has a small rectangle of zeros? #1131

Open
arilato opened this issue Oct 22, 2024 · 1 comment


arilato commented Oct 22, 2024

❓ Questions and Help

I understand I can pass a custom mask to memory_efficient_attention, but that is very inefficient for what I'm trying to do. Essentially, I want to add a small rectangle of zeros (or -inf) to an otherwise causal attention mask, near the lower-right diagonal. Concretely, I want to formulate a sequence
(context, m1, m2)
such that m2 cannot attend to m1, where each part is a series of tokens.

Is there a memory-efficient way to do this in xformers without materializing the entire mask?
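To make the mask pattern concrete, here is a minimal sketch of the dense (materialized) approach being described: a causal additive bias plus a rectangle of -inf blocking the m2 rows from the m1 columns. The segment lengths, tensor shapes, and device/dtype choices are illustrative assumptions, not from the issue, and depending on the xformers version the bias tensor may need to be contiguous or padded to meet alignment requirements.

```python
import torch
import xformers.ops as xops

# Hypothetical segment lengths: the sequence is (context, m1, m2) concatenated.
ctx_len, m1_len, m2_len = 512, 64, 64
seq_len = ctx_len + m1_len + m2_len
batch, heads, head_dim = 2, 8, 64

# xformers expects (batch, seq, heads, head_dim).
q = torch.randn(batch, seq_len, heads, head_dim, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

neg_inf = float("-inf")
# Causal additive bias: -inf strictly above the diagonal, 0 elsewhere.
bias = torch.triu(
    torch.full((seq_len, seq_len), neg_inf, device="cuda", dtype=q.dtype),
    diagonal=1,
)
# The extra rectangle: m2 queries (rows) cannot attend to m1 keys (columns).
bias[ctx_len + m1_len :, ctx_len : ctx_len + m1_len] = neg_inf

# Broadcast to (batch, heads, seq_q, seq_k) and pass as attn_bias.
# This materializes the full O(seq_len^2) mask, which is exactly the cost
# the question is trying to avoid.
attn_bias = bias[None, None].expand(batch, heads, -1, -1)
out = xops.memory_efficient_attention(q, k, v, attn_bias=attn_bias)
```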


sebhtml commented Nov 12, 2024

Hi @arilato
Is it the same mask every time you call memory_efficient_attention?
