I have been trying to understand why there is no activation function applied to the 1x1 conv that sits between the residual connections. From what I understand, a linear layer with no activation function does not really add to the expressive power of the model. The skip connections eventually have a ReLU applied, so that part makes sense to me. However, the linear output of the residual connection has no activation applied as far as I can tell; it is just added to the residual bus and fed into the next layer. What is the point of having the 1x1 convolution in this case? Why not skip the 1x1 convolution entirely and add the filter * gate output directly to the block's input to form the dense output?
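For reference, here is a minimal sketch of how I understand one residual block (PyTorch-style; the module and parameter names are mine, not from this repo). The `residual_1x1` layer is the one I am asking about:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One WaveNet-style residual block, as I understand it."""
    def __init__(self, channels, skip_channels, dilation):
        super().__init__()
        # Left-pad so the dilated convolutions stay causal and length-preserving
        self.pad = nn.ConstantPad1d((dilation, 0), 0.0)
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
        # The 1x1 conv in question: purely linear, no activation after it
        self.residual_1x1 = nn.Conv1d(channels, channels, kernel_size=1)
        # A separate 1x1 conv feeds the skip bus (ReLU is applied later, after the skips are summed)
        self.skip_1x1 = nn.Conv1d(channels, skip_channels, kernel_size=1)

    def forward(self, x):
        # Gated activation unit: z = tanh(W_f * x) * sigmoid(W_g * x)
        z = torch.tanh(self.filter_conv(self.pad(x))) * torch.sigmoid(self.gate_conv(self.pad(x)))
        skip = self.skip_1x1(z)               # -> skip bus
        residual = self.residual_1x1(z) + x   # linear 1x1, then added straight onto the residual bus
        return residual, skip
```

In these terms, my question is: why not drop `residual_1x1` and compute `residual = z + x` directly, given that `residual_1x1` is linear and no nonlinearity follows it?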