Convolutional self-attention #1

Open
Yanruoqin opened this issue Oct 27, 2020 · 3 comments

@Yanruoqin

Dear mlpotter, your code is great! I noticed that you only process the initial input with causal convolutions, while K and Q are still computed by `torch.nn.TransformerEncoderLayer`. So this attention is the same as in the canonical Transformer architecture.
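
A minimal sketch of the pattern described above (module and parameter names are illustrative, not the repository's actual code): the causal convolution touches only the input embedding, and the stock `nn.TransformerEncoderLayer` then derives Q, K, and V from its own linear projections, so the attention computation itself is unchanged.

```python
import torch
import torch.nn as nn

class CausalConvThenEncoder(nn.Module):
    """Causal conv on the input, followed by a standard encoder layer (illustrative)."""
    def __init__(self, d_model=64, kernel_size=3, nhead=4):
        super().__init__()
        # left-pad so the convolution is causal (no leakage from future steps)
        self.pad = nn.ConstantPad1d((kernel_size - 1, 0), 0.0)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead)

    def forward(self, x):              # x: (seq_len, batch, d_model)
        z = x.permute(1, 2, 0)         # (batch, d_model, seq_len) for Conv1d
        z = self.conv(self.pad(z))
        z = z.permute(2, 0, 1)         # back to (seq_len, batch, d_model)
        return self.encoder(z)         # Q, K, V are still linear projections of z
```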

@ddz16

ddz16 commented Apr 9, 2021

You are right, mlpotter's convolution method is wrong.

@Ralph-Liuyuhang

I agree with you.

@hriamli

hriamli commented Feb 6, 2023

> Dear mlpotter, your code is great! I noticed that you only process the initial input with causal convolutions, while K and Q are still computed by `torch.nn.TransformerEncoderLayer`. So this attention is the same as in the canonical Transformer architecture.

What is the appropriate way to compute Q and K?
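
One possible direction, sketched here as an assumption about what convolutional self-attention should look like (the class and argument names below are hypothetical, not taken from this repository): write a custom attention module in which Q and K are produced by causal 1-D convolutions over the sequence instead of the usual linear projections, while V stays a pointwise projection.

```python
import math
import torch
import torch.nn as nn

class ConvSelfAttention(nn.Module):
    """Sketch: Q and K from causal 1-D convolutions, V from a linear projection."""
    def __init__(self, d_model=64, kernel_size=3, nhead=4):
        super().__init__()
        self.nhead = nhead
        self.d_head = d_model // nhead
        pad = (kernel_size - 1, 0)                       # causal: pad on the left only
        self.q_conv = nn.Sequential(nn.ConstantPad1d(pad, 0.0),
                                    nn.Conv1d(d_model, d_model, kernel_size))
        self.k_conv = nn.Sequential(nn.ConstantPad1d(pad, 0.0),
                                    nn.Conv1d(d_model, d_model, kernel_size))
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, attn_mask=None):                # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        q = self.q_conv(x.transpose(1, 2)).transpose(1, 2)   # conv over time, back to (b, t, d)
        k = self.k_conv(x.transpose(1, 2)).transpose(1, 2)
        v = self.v_proj(x)

        def split(z):                                    # (b, t, d) -> (b, nhead, t, d_head)
            return z.view(b, t, self.nhead, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if attn_mask is not None:                        # e.g. -inf above the diagonal for causal masking
            scores = scores + attn_mask
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)
```

With `kernel_size=1` this reduces to canonical dot-product self-attention, so the kernel width is the only new knob in this sketch.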
