about MaxPool #10
Comments
Thanks @foralliance for the question. Your question is equivalent to Case 3 in Section 3.2 of the paper; please refer to it.
@rajatsaini0294 Another question: if an ordinary convolution whose output dimension is also G is used to replace the PW, this would not only ensure that each point and each channel has its own independent weight, but also ensure that there is interaction between the channels in each group. Have you tried such a design?
You mean, without partitioning the input into G groups, use a convolution to generate G output channels and produce G attention maps from that?
Sorry for not expressing it clearly. My idea is that everything stays exactly the same as in Figure 2; the only difference is to use an ordinary convolution whose output dimension is also G in place of the original PW. This replacement still ensures interaction between the channels within each group, i.e., it captures the cross-channel information you mention in Case 3 of Section 3.2. In addition, it brings an extra effect: each point and each channel gets its own independent weight, rather than all channels in a group sharing one weight.
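If it helps, here is a minimal sketch of the difference, assuming a PyTorch implementation and 1x1 convolutions; the variable names and the number of channels per subspace are only illustrative:

```python
import torch.nn as nn

G = 4  # hypothetical number of channels per subspace

# Original design: a pointwise conv collapses the G channels of a subspace
# into a single attention map, which is later expanded to all G channels.
pw_original = nn.Conv2d(G, 1, kernel_size=1)

# Proposed variant: an ordinary 1x1 conv keeps G output channels, so every
# channel of the subspace gets its own attention map (no Expand step needed),
# while the channels still interact through the convolution.
pw_variant = nn.Conv2d(G, G, kernel_size=1)
```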
I understand your point. We have not tried this design because it would increase the number of parameters. You can certainly try it and let us know how it works. :-)
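For a rough sense of that increase, assuming the replacement is also a 1x1 convolution and ignoring biases (illustrative numbers only):

```python
G = 4                    # hypothetical channels per subspace
params_pw = G * 1        # original PW: G channels -> 1 channel
params_variant = G * G   # ordinary 1x1 conv: G channels -> G channels
print(params_pw, params_variant)  # 4 vs. 16, i.e. a factor-of-G increase per subspace
```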
@Nandan91 @rajatsaini0294 Hi,
For each subspace, the input is HxWxG. After DW + MaxPool + PW, the intermediate attention map is HxWx1; after Softmax + Expand, the final attention map is HxWxG (see the sketch after this comment).
Because the output dimension of the PW operation is 1, the final attention map amounts to one weight shared by all channels. Why use this PW? Why is the design such that all channels share one weight?
If the PW operation were removed, i.e., the output of the MaxPool operation were treated as the final attention map, each point and each channel would have its own independent weight. Why not design it this way?
Many thanks!
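For reference, a minimal sketch of the per-subspace pipeline as described above, assuming PyTorch; the kernel sizes, padding, and how the attention map is combined with the input are assumptions, not necessarily the exact settings of the paper or this repo:

```python
import torch.nn as nn
import torch.nn.functional as F

class SubspaceAttention(nn.Module):
    """One subspace: input (N, G, H, W), output (N, G, H, W)."""
    def __init__(self, g):
        super().__init__()
        # DW: depthwise conv over the G channels of the subspace
        self.dw = nn.Conv2d(g, g, kernel_size=3, padding=1, groups=g)
        # MaxPool with stride 1 so the spatial size is preserved
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        # PW: pointwise conv collapsing the G channels into a single map
        self.pw = nn.Conv2d(g, 1, kernel_size=1)

    def forward(self, x):
        n, g, h, w = x.shape
        a = self.pw(self.pool(self.dw(x)))            # (N, 1, H, W)
        a = F.softmax(a.view(n, 1, -1), dim=-1)       # softmax over spatial positions
        a = a.view(n, 1, h, w).expand(-1, g, -1, -1)  # Expand: one map shared by all G channels
        return x * a                                  # the exact combination with the input is an assumption
```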