Convolutional self-attention #1

Open
Yanruoqin opened this issue Oct 27, 2020 · 3 comments

@Yanruoqin

Dear mlpotter, your code is great! I noticed that you only process the initial input with causal convolutions, while K and Q are still computed by `torch.nn.TransformerEncoderLayer`. So this attention is the same as in the canonical Transformer architecture.
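
A minimal sketch of the pattern described above (module and parameter names are illustrative, not the repository's actual code): the causal convolution touches only the input embedding, and the stock `nn.TransformerEncoderLayer` then derives Q, K, and V from its own linear projections, so the attention computation itself is unchanged.

```python
import torch
import torch.nn as nn

class CausalConvThenEncoder(nn.Module):
    """Causal conv on the input, followed by a standard encoder layer (illustrative)."""
    def __init__(self, d_model=64, kernel_size=3, nhead=4):
        super().__init__()
        # left-pad so the convolution is causal (no leakage from future steps)
        self.pad = nn.ConstantPad1d((kernel_size - 1, 0), 0.0)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead)

    def forward(self, x):              # x: (seq_len, batch, d_model)
        z = x.permute(1, 2, 0)         # (batch, d_model, seq_len) for Conv1d
        z = self.conv(self.pad(z))
        z = z.permute(2, 0, 1)         # back to (seq_len, batch, d_model)
        return self.encoder(z)         # Q, K, V are still linear projections of z
```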

@ddz16

ddz16 commented Apr 9, 2021

You are right, mlpotter's convolution method is wrong.

@Ralph-Liuyuhang

I agree with you.

@hriamli

hriamli commented Feb 6, 2023

> Dear mlpotter, your code is great! I noticed that you only process the initial input with causal convolutions, while K and Q are still computed by `torch.nn.TransformerEncoderLayer`. So this attention is the same as in the canonical Transformer architecture.

What is the appropriate way to compute Q and K?
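
One possible direction, sketched here as an assumption about what convolutional self-attention should look like (the class and argument names below are hypothetical, not taken from this repository): write a custom attention module in which Q and K are produced by causal 1-D convolutions over the sequence instead of the usual linear projections, while V stays a pointwise projection.

```python
import math
import torch
import torch.nn as nn

class ConvSelfAttention(nn.Module):
    """Sketch: Q and K from causal 1-D convolutions, V from a linear projection."""
    def __init__(self, d_model=64, kernel_size=3, nhead=4):
        super().__init__()
        self.nhead = nhead
        self.d_head = d_model // nhead
        pad = (kernel_size - 1, 0)                       # causal: pad on the left only
        self.q_conv = nn.Sequential(nn.ConstantPad1d(pad, 0.0),
                                    nn.Conv1d(d_model, d_model, kernel_size))
        self.k_conv = nn.Sequential(nn.ConstantPad1d(pad, 0.0),
                                    nn.Conv1d(d_model, d_model, kernel_size))
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, attn_mask=None):                # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        q = self.q_conv(x.transpose(1, 2)).transpose(1, 2)   # conv over time, back to (b, t, d)
        k = self.k_conv(x.transpose(1, 2)).transpose(1, 2)
        v = self.v_proj(x)

        def split(z):                                    # (b, t, d) -> (b, nhead, t, d_head)
            return z.view(b, t, self.nhead, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if attn_mask is not None:                        # e.g. -inf above the diagonal for causal masking
            scores = scores + attn_mask
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)
```

With `kernel_size=1` this reduces to canonical dot-product self-attention, so the kernel width is the only new knob in this sketch.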
