Hi, I hope you are doing well. While going through your pointer-generator implementation, I noticed that the `p_gen` calculation differs from the formula given in the paper.
Could you clarify why it was implemented this way, and whether there is any advantage in doing so?
```python
y_t_1_embd = self.embedding(y_t_1)
x = self.x_context(torch.cat((c_t_1, y_t_1_embd), 1))
lstm_out, s_t = self.lstm(x.unsqueeze(1), s_t_1)

h_decoder, c_decoder = s_t
s_t_hat = torch.cat((h_decoder.view(-1, config.hidden_dim),
                     c_decoder.view(-1, config.hidden_dim)), 1)  # B x 2*hidden_dim
c_t, attn_dist, coverage_next = self.attention_network(s_t_hat, encoder_outputs, encoder_feature,
                                                       enc_padding_mask, coverage)

if self.training or step > 0:
    coverage = coverage_next

p_gen = None
if config.pointer_gen:
    p_gen_input = torch.cat((c_t, s_t_hat, x), 1)  # B x (2*2*hidden_dim + emb_dim)
    p_gen = self.p_gen_linear(p_gen_input)
    p_gen = F.sigmoid(p_gen)
```
From what I know, `p_gen` takes the context vector `c_t`, the decoder state `s_t_hat`, and the input `y_t_1` separately, but here the concatenated input `x` is passed instead.
For reference, equation (8) from the original paper (See et al., 2017):

`p_gen = σ(w_h*ᵀ h*_t + w_sᵀ s_t + w_xᵀ x_t + b_ptr)`

There, the decoder input `x_t` is passed into the sigmoid directly, rather than being concatenated with the context vector first.
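For comparison, this is what I would expect a paper-faithful version to look like. It is only a sketch reusing the variable names from the snippet above; judging by the shape comment, `x` is projected back to `emb_dim`, so swapping in the raw embedding should keep the input size of `self.p_gen_linear` unchanged:

```python
# Hypothetical equation-(8) variant: feed the raw previous-token embedding
# y_t_1_embd (the paper's x_t) instead of the x_context mixture x.
p_gen_input = torch.cat((c_t, s_t_hat, y_t_1_embd), 1)  # B x (2*2*hidden_dim + emb_dim)
p_gen = torch.sigmoid(self.p_gen_linear(p_gen_input))
```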
In this line, however,

```python
x = self.x_context(torch.cat((c_t_1, y_t_1_embd), 1))
```

the previous context vector `c_t_1` is concatenated with the input embedding, and the resulting `x` is what later feeds the sigmoid:

```python
p_gen_input = torch.cat((c_t, s_t_hat, x), 1)  # B x (2*2*hidden_dim + emb_dim)
p_gen = self.p_gen_linear(p_gen_input)
p_gen = F.sigmoid(p_gen)
```
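To make the difference concrete, here is a self-contained shape walk-through; the sizes and the `x_context` projection are my assumptions for illustration, not your actual config:

```python
import torch
import torch.nn as nn

# Assumed sizes, for illustration only.
B, hidden_dim, emb_dim = 8, 256, 128

y_t_1_embd = torch.randn(B, emb_dim)      # previous-token embedding (the paper's x_t)
c_t_1 = torch.randn(B, 2 * hidden_dim)    # previous step's attention context
c_t = torch.randn(B, 2 * hidden_dim)      # current attention context (the paper's h*_t)
s_t_hat = torch.randn(B, 2 * hidden_dim)  # concatenated decoder LSTM state

# Your input: c_t_1 is mixed into x before p_gen ever sees it.
x_context = nn.Linear(2 * hidden_dim + emb_dim, emb_dim)
x = x_context(torch.cat((c_t_1, y_t_1_embd), 1))  # B x emb_dim

p_gen_linear = nn.Linear(4 * hidden_dim + emb_dim, 1)
p_gen_repo = torch.sigmoid(p_gen_linear(torch.cat((c_t, s_t_hat, x), 1)))            # your variant
p_gen_paper = torch.sigmoid(p_gen_linear(torch.cat((c_t, s_t_hat, y_t_1_embd), 1)))  # eq. (8) variant
print(p_gen_repo.shape, p_gen_paper.shape)  # torch.Size([8, 1]) twice
```

Both variants produce the same-shaped gate; the only difference is whether the previous context vector `c_t_1` is mixed into the input term of the sigmoid.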
Thank you!