Possible bug in the padding mask handling #14

bjourne · 2020-06-28T11:12:46Z

I've stared at these lines in your excellent tutorial for a while now:

 enc_padding_mask = tf.keras.layers.Lambda(
     create_padding_mask, output_shape=(1, 1, None),
     name='enc_padding_mask')(inputs)
 # mask the future tokens for decoder inputs at the 1st attention block
 look_ahead_mask = tf.keras.layers.Lambda(
     create_look_ahead_mask,
     output_shape=(1, None, None),
     name='look_ahead_mask')(dec_inputs)
 # mask the encoder outputs for the 2nd attention block
 dec_padding_mask = tf.keras.layers.Lambda(
     create_padding_mask, output_shape=(1, 1, None),
     name='dec_padding_mask')(inputs)

Both enc_padding_mask and dec_padding_mask are created by applying create_padding_mask to inputs, so they will always be equal. Is this intentional? It seems odd to create two separate padding masks that are identical.

bryanlimy · 2020-06-28T14:26:56Z

Yes, I believe so. Both masks hide the padding tokens in the input sentence; see http://nlp.seas.harvard.edu/2018/04/03/attention.html#batches-and-masking
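
To illustrate why one padding mask built from the input sentence can serve both attention blocks, here is a minimal NumPy sketch (not the tutorial's TensorFlow code). It assumes that token id 0 is padding, that the mask has shape (batch, 1, 1, seq_len), and that masked positions are suppressed by adding a large negative number to the attention logits before the softmax. The helper masked_softmax and the toy shapes are hypothetical.

```python
import numpy as np

def create_padding_mask(token_ids):
    """Return 1.0 where token_ids == 0 (assumed padding id).

    Output shape is (batch, 1, 1, seq_len) so it broadcasts over
    the head axis and the query axis of the attention logits.
    """
    mask = (token_ids == 0).astype(np.float32)
    return mask[:, np.newaxis, np.newaxis, :]

def masked_softmax(logits, mask):
    """Softmax over the last axis after pushing masked keys to about -1e9."""
    logits = logits + mask * -1e9
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

inputs = np.array([[5, 7, 0, 0]])     # one sentence; the last two tokens are padding
mask = create_padding_mask(inputs)    # shape (1, 1, 1, 4)

rng = np.random.default_rng(0)
# Encoder self-attention: 4 input queries attend over the 4 input positions.
enc_weights = masked_softmax(rng.normal(size=(1, 2, 4, 4)), mask)
# Decoder 2nd attention block: 3 target queries attend over the same 4 input positions.
dec_weights = masked_softmax(rng.normal(size=(1, 2, 3, 4)), mask)
```

In both calls the keys are the input positions, so the same mask applies. Only the number of queries differs, and the size-1 query axis of the mask broadcasts over it.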

bjourne · 2020-06-28T15:34:24Z

Oh, I see. But wouldn't it then be more efficient to use only one mask?
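
The redundancy can be checked with a minimal NumPy sketch. It assumes create_padding_mask flags token id 0 as padding and returns shape (batch, 1, 1, seq_len); the tutorial's TensorFlow version is not reproduced here, and the sample token ids are made up.

```python
import numpy as np

def create_padding_mask(token_ids):
    """Return 1.0 where token_ids == 0 (assumed padding id), shape (batch, 1, 1, seq_len)."""
    mask = (token_ids == 0).astype(np.float32)
    return mask[:, np.newaxis, np.newaxis, :]

# Hypothetical batch of two padded input sentences.
inputs = np.array([[12, 4, 9, 0, 0],
                   [3, 0, 0, 0, 0]])

# Same function, same argument, as in the quoted snippet.
enc_padding_mask = create_padding_mask(inputs)
dec_padding_mask = create_padding_mask(inputs)

# True: the function is deterministic, so the two masks cannot differ.
masks_identical = np.array_equal(enc_padding_mask, dec_padding_mask)

# A single mask could therefore be passed to both the encoder and the decoder.
padding_mask = enc_padding_mask
```

Whether sharing one mask saves measurable time is a separate question, since the mask is cheap to compute. Keeping two named masks may simply make the model graph easier to read.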
