Hi, I am trying to understand the calculation of utility in cbp_conv.py:

loss-of-plasticity/lop/algos/cbp_conv.py, lines 89–90 in 63c35f3

Based on the call to mean(dim=(0,2,3)), it appears that the utility is computed per channel in the conv2d output rather than per neuron - is this correct? However, equation 1 in the paper (https://www.nature.com/articles/s41586-024-07711-7#Sec6) states that the utility is computed per neuron in a layer.

Could you please elaborate on the utility computation and clarify whether my understanding is correct? Also, could you comment on why the utility is computed per channel rather than per neuron?

Thanks

The Conv2d weight tensor has shape (out_channels, in_channels, kernel_size, kernel_size), so the weight mean above is computed over in_channels and the two kernel dimensions and should have a final shape of (out_channels,).

Assuming self.features has shape (batch_size, in_channels, height, width), the mean(dim=(0,2,3)) above averages across batch_size, height, and width and should have a final shape of (in_channels,).

It appears to me that there is a mismatch between the dimensions being multiplied to compute the utility, i.e. out_channels vs. in_channels. It does not raise a runtime error only because these two dimensions happen to be the same (due to the pooling operation).
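The shape bookkeeping discussed above can be sketched as follows. This is a minimal illustration, not the repository's code: NumPy stands in for PyTorch (np.mean(axis=...) plays the role of tensor.mean(dim=...)), and all variable names and sizes are illustrative.

```python
import numpy as np

# in_ch == out_ch here, as happens after the pooling operation mentioned above
batch, in_ch, out_ch, k, h, w = 8, 32, 32, 3, 16, 16

# Conv2d weight layout in PyTorch: (out_channels, in_channels, kernel_size, kernel_size)
weight = np.random.randn(out_ch, in_ch, k, k)
# Mean over in_channels and both kernel dims -> one value per OUTPUT channel
weight_util = np.abs(weight).mean(axis=(1, 2, 3))
assert weight_util.shape == (out_ch,)

# Layer input features: (batch_size, in_channels, height, width)
features = np.random.randn(batch, in_ch, h, w)
# mean(dim=(0, 2, 3)) averages over batch and spatial dims -> one value per INPUT channel
feature_util = np.abs(features).mean(axis=(0, 2, 3))
assert feature_util.shape == (in_ch,)

# The elementwise product mixes an (out_channels,)-shaped vector with an
# (in_channels,)-shaped one; it only broadcasts because in_ch == out_ch here.
utility = weight_util * feature_util
assert utility.shape == (out_ch,)
```

If in_ch and out_ch differed, the final product would raise a broadcasting error, which is why the mismatch goes unnoticed in this architecture.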