I have been trying to understand why there is no activation function applied to the 1x1 conv that sits between the residual connections. From what I understand, a linear layer with no activation function does not really add to the expressive power of the model. The skip connections eventually have a ReLU applied, so that part makes sense to me. However, the linear output of the residual connection has no activation applied as far as I can tell; it is just added to the residual bus and fed into the next layer. What is the point of having the 1x1 convolution in this case? Why not skip the 1x1 convolution entirely and add the filter * gate output directly to the block's input to form the dense output?
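For reference, here is a minimal sketch of how I understand one residual block (PyTorch-style; the module and parameter names are mine, not from this repo). The `residual_1x1` layer is the one I am asking about:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One WaveNet-style residual block, as I understand it."""
    def __init__(self, channels, skip_channels, dilation):
        super().__init__()
        # Left-pad so the dilated convolutions stay causal and length-preserving
        self.pad = nn.ConstantPad1d((dilation, 0), 0.0)
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
        # The 1x1 conv in question: purely linear, no activation after it
        self.residual_1x1 = nn.Conv1d(channels, channels, kernel_size=1)
        # A separate 1x1 conv feeds the skip bus (ReLU is applied later, after the skips are summed)
        self.skip_1x1 = nn.Conv1d(channels, skip_channels, kernel_size=1)

    def forward(self, x):
        # Gated activation unit: z = tanh(W_f * x) * sigmoid(W_g * x)
        z = torch.tanh(self.filter_conv(self.pad(x))) * torch.sigmoid(self.gate_conv(self.pad(x)))
        skip = self.skip_1x1(z)               # -> skip bus
        residual = self.residual_1x1(z) + x   # linear 1x1, then added straight onto the residual bus
        return residual, skip
```

In these terms, my question is: why not drop `residual_1x1` and compute `residual = z + x` directly, given that `residual_1x1` is linear and no nonlinearity follows it?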