Code implementing the top-k feature #2
Comments
Yes
Hello, I have a few questions about the code for the top-k feature:
1. First, I don't understand what many of the parameters in the code are used for.
1) For example, self.onnx_trace, entmax, bmm_fp16_support, and cur_san_active among the parameters.
2. Line 260, attn_weights = self.apply_sparse_mask(attn_weights, tgt_len, src_len, bsz): looking at the definition of apply_sparse_mask, it only returns attn_weights without doing anything else. What is this step for?
3. The entmax used in the code is the tf (TensorFlow) one; can the PyTorch version from the original paper be used as a drop-in replacement for the entmax in the code?
1. You can ignore them. "onnx_trace" comes from fairseq; bmm_fp16_support detects whether multi-head attention can use fp16; cur_san_active decides whether to sparsify the encoder self-attention, the decoder self-attention, and the decoder cross-attention.
2. Line 260 has nothing to do with our implementation.
3. I don't follow you. The entmax in sparse-activated multihead attention is the PyTorch version.
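For readers of this thread, here is a minimal sketch of the top-k sparsification idea the issue title refers to: keep only the k largest attention scores per query and push the rest to -inf before softmax. This is not the repository's exact code; the function name topk_sparsify and the tensor shapes are illustrative assumptions.

```python
# Minimal illustrative sketch of top-k attention sparsification.
# Not the repository's exact code: function name and shapes are assumptions.
import torch


def topk_sparsify(attn_weights: torch.Tensor, k: int) -> torch.Tensor:
    """attn_weights: (bsz * num_heads, tgt_len, src_len) raw attention scores."""
    src_len = attn_weights.size(-1)
    if k >= src_len:
        return attn_weights  # nothing to mask out
    # k-th largest score for each query position
    kth_value = attn_weights.topk(k, dim=-1).values[..., -1:]
    # scores below the k-th largest become -inf, so softmax assigns them weight 0
    return attn_weights.masked_fill(attn_weights < kth_value, float("-inf"))


if __name__ == "__main__":
    scores = torch.randn(2, 4, 10)  # (bsz * heads, tgt_len, src_len)
    probs = torch.softmax(topk_sparsify(scores, k=3), dim=-1)
    print((probs > 0).sum(dim=-1))  # 3 non-zero weights per query position
```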
Thank you very much for your reply.
Hello, how should one choose between entmax15 and top-k? In your sparse_activated_multihead_attention.py code, entmax and top-k are mutually exclusive; based on your testing experience, what situation is each suited to?
Top-k is our proposal. Entmax is also excellent.
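For later readers, a hedged sketch of how the two mutually exclusive options might be wired up; this is not the repository's implementation. The function name normalize_attention is made up, and it assumes the third-party entmax package (pip install entmax) for entmax15. Top-k fixes how many positions can receive weight; entmax15 adapts how many weights become exactly zero.

```python
# Illustrative sketch only, not the repository's implementation.
# Assumes the third-party `entmax` package (pip install entmax).
import torch
from entmax import entmax15


def normalize_attention(scores: torch.Tensor, mode: str, k: int = 8) -> torch.Tensor:
    """scores: (..., src_len) raw attention logits."""
    if mode == "topk":
        # fixed sparsity: only the k largest scores per query keep any weight
        kth = scores.topk(min(k, scores.size(-1)), dim=-1).values[..., -1:]
        return torch.softmax(scores.masked_fill(scores < kth, float("-inf")), dim=-1)
    if mode == "entmax15":
        # adaptive sparsity: entmax15 decides how many weights are exactly zero
        return entmax15(scores, dim=-1)
    raise ValueError(f"unknown mode: {mode}")
```

Since the two options are mutually exclusive in sparse_activated_multihead_attention.py, a single configuration flag would pick the mode once at construction time rather than per call.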
Thank you very much for your patient answer; it helped me a lot.
Hello, I would like to ask: is all of the code that implements the top-k feature concentrated in the SparseActivatedMultiheadAttention class in sparse_activated_multihead_attention.py?