As far as I've read, the attention implementation at both the word and sentence level is WRONG:
```python
## The word RNN model for generating a sentence vector
class WordRNN(nn.Module):
    def __init__(self, vocab_size, embedsize, batch_size, hid_size):
        super(WordRNN, self).__init__()
        self.batch_size = batch_size
        self.embedsize = embedsize
        self.hid_size = hid_size
        ## Word Encoder
        self.embed = nn.Embedding(vocab_size, embedsize)
        self.wordRNN = nn.GRU(embedsize, hid_size, bidirectional=True)
        ## Word Attention
        self.wordattn = nn.Linear(2*hid_size, 2*hid_size)
        self.attn_combine = nn.Linear(2*hid_size, 2*hid_size, bias=False)

    def forward(self, inp, hid_state):
        emb_out = self.embed(inp)
        out_state, hid_state = self.wordRNN(emb_out, hid_state)
        word_annotation = self.wordattn(out_state)
        attn = F.softmax(self.attn_combine(word_annotation), dim=1)
        sent = attention_mul(out_state, attn)
        return sent, hid_state
```
Look at the line `attn = F.softmax(self.attn_combine(word_annotation), dim=1)` near the end of `forward`.
Because of how PyTorch works, if you don't pass `batch_first=True` to the GRU, the output `out_state` has shape `(n_steps, batch_size, out_dims)`.
As the paper states, the softmax should be applied across time steps (so that the attention weights over all time steps of a sequence sum to 1), whereas THIS IMPLEMENTATION APPLIES THE SOFTMAX ACROSS THE BATCH (`dim=1`), which is incorrect! It should be changed to `dim=0`.
The same problem exists in the sentence-level attention.
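To illustrate (this is my own quick check, not code from the repo): with the default `batch_first=False` layout, dim 0 is the time axis, so only a softmax over `dim=0` makes each sequence's weights sum to 1 across time steps:

```python
import torch
import torch.nn.functional as F

n_steps, batch_size, dims = 7, 4, 10
scores = torch.randn(n_steps, batch_size, dims)  # same layout as out_state

# softmax over dim=1 normalizes across the batch -- not what the paper describes
wrong = F.softmax(scores, dim=1)
print(wrong.sum(dim=1)[0, 0])   # 1.0, but summed over the batch axis

# softmax over dim=0 normalizes across time steps, as the paper requires
right = F.softmax(scores, dim=0)
print(right.sum(dim=0)[0, 0])   # 1.0, summed over the n_steps axis
```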
This may be one reason for the non-convergent, fluctuating test accuracy.
I am reading through the code and working on a corrected version of this implementation; I will get back later.
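In the meantime, here is a minimal sketch of what the corrected word-level module could look like (it reuses the repo's `attention_mul` helper, which is not shown here, and keeps the default `(n_steps, batch, 2*hid_size)` layout; the only functional change is `dim=0` in the softmax). The sentence-level module needs the same one-line change:

```python
import torch.nn as nn
import torch.nn.functional as F

class WordRNN(nn.Module):
    def __init__(self, vocab_size, embedsize, batch_size, hid_size):
        super(WordRNN, self).__init__()
        self.batch_size = batch_size
        self.embedsize = embedsize
        self.hid_size = hid_size
        ## Word Encoder
        self.embed = nn.Embedding(vocab_size, embedsize)
        self.wordRNN = nn.GRU(embedsize, hid_size, bidirectional=True)
        ## Word Attention
        self.wordattn = nn.Linear(2*hid_size, 2*hid_size)
        self.attn_combine = nn.Linear(2*hid_size, 2*hid_size, bias=False)

    def forward(self, inp, hid_state):
        emb_out = self.embed(inp)                                 # (n_steps, batch, embedsize)
        out_state, hid_state = self.wordRNN(emb_out, hid_state)   # (n_steps, batch, 2*hid_size)
        word_annotation = self.wordattn(out_state)
        # Normalize the attention weights over the time-step axis (dim=0),
        # not over the batch axis (dim=1).
        attn = F.softmax(self.attn_combine(word_annotation), dim=0)
        sent = attention_mul(out_state, attn)                     # repo's helper: weighted sum over steps
        return sent, hid_state
```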