As far as I've read, the attention implementation at both the word and sentence level is WRONG:
```python
## The word RNN model for generating a sentence vector
class WordRNN(nn.Module):
    def __init__(self, vocab_size, embedsize, batch_size, hid_size):
        super(WordRNN, self).__init__()
        self.batch_size = batch_size
        self.embedsize = embedsize
        self.hid_size = hid_size
        ## Word Encoder
        self.embed = nn.Embedding(vocab_size, embedsize)
        self.wordRNN = nn.GRU(embedsize, hid_size, bidirectional=True)
        ## Word Attention
        self.wordattn = nn.Linear(2*hid_size, 2*hid_size)
        self.attn_combine = nn.Linear(2*hid_size, 2*hid_size, bias=False)

    def forward(self, inp, hid_state):
        emb_out = self.embed(inp)
        out_state, hid_state = self.wordRNN(emb_out, hid_state)
        word_annotation = self.wordattn(out_state)
        attn = F.softmax(self.attn_combine(word_annotation), dim=1)
        sent = attention_mul(out_state, attn)
        return sent, hid_state
```
Look at the line `attn = F.softmax(self.attn_combine(word_annotation), dim=1)` near the end of `forward`.
Because of how PyTorch works, if you don't pass `batch_first=True` to the GRU, the output `out_state` has shape `(n_steps, batch_size, out_dims)`.
As the paper states, the softmax should be applied across time steps (so that the attention weights over all time steps of a sequence sum to 1), whereas THIS IMPLEMENTATION APPLIES THE SOFTMAX ACROSS THE BATCH (`dim=1`), which is incorrect! It should be changed to `dim=0`.
The same problem exists in the sentence-level attention.
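To illustrate (this is my own quick check, not code from the repo): with the default `batch_first=False` layout, dim 0 is the time axis, so only a softmax over `dim=0` makes each sequence's weights sum to 1 across time steps:

```python
import torch
import torch.nn.functional as F

n_steps, batch_size, dims = 7, 4, 10
scores = torch.randn(n_steps, batch_size, dims)  # same layout as out_state

# softmax over dim=1 normalizes across the batch -- not what the paper describes
wrong = F.softmax(scores, dim=1)
print(wrong.sum(dim=1)[0, 0])   # 1.0, but summed over the batch axis

# softmax over dim=0 normalizes across time steps, as the paper requires
right = F.softmax(scores, dim=0)
print(right.sum(dim=0)[0, 0])   # 1.0, summed over the n_steps axis
```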
This may be one reason for the non-convergent, fluctuating test accuracy.
I am reading through the code and working on a corrected version of this implementation; I will get back later.
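In the meantime, here is a minimal sketch of what the corrected word-level module could look like (it reuses the repo's `attention_mul` helper, which is not shown here, and keeps the default `(n_steps, batch, 2*hid_size)` layout; the only functional change is `dim=0` in the softmax). The sentence-level module needs the same one-line change:

```python
import torch.nn as nn
import torch.nn.functional as F

class WordRNN(nn.Module):
    def __init__(self, vocab_size, embedsize, batch_size, hid_size):
        super(WordRNN, self).__init__()
        self.batch_size = batch_size
        self.embedsize = embedsize
        self.hid_size = hid_size
        ## Word Encoder
        self.embed = nn.Embedding(vocab_size, embedsize)
        self.wordRNN = nn.GRU(embedsize, hid_size, bidirectional=True)
        ## Word Attention
        self.wordattn = nn.Linear(2*hid_size, 2*hid_size)
        self.attn_combine = nn.Linear(2*hid_size, 2*hid_size, bias=False)

    def forward(self, inp, hid_state):
        emb_out = self.embed(inp)                                 # (n_steps, batch, embedsize)
        out_state, hid_state = self.wordRNN(emb_out, hid_state)   # (n_steps, batch, 2*hid_size)
        word_annotation = self.wordattn(out_state)
        # Normalize the attention weights over the time-step axis (dim=0),
        # not over the batch axis (dim=1).
        attn = F.softmax(self.attn_combine(word_annotation), dim=0)
        sent = attention_mul(out_state, attn)                     # repo's helper: weighted sum over steps
        return sent, hid_state
```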