Right now, when initializing from an ST checkpoint, we chop off any trailing "Dense" module.
Although these checkpoints require training anyway, this layer can provide a good initialization for the linear projection.
We can either merge the LinearLayer class into the Dense one (they are essentially the same, except for the activation function, which could be set to None with a small modification to the original class), or we can copy the weights into the LinearLayer.
We should also handle a possible mismatch between the Dense module's output dimension and the configured one, and either prevent the checkpoint from being loaded or show a warning.
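The weight-copy option with the dimension check could look roughly like the sketch below. This is a minimal illustration, not the actual implementation: `copy_dense_into_linear` is a hypothetical helper, and the `nn.Linear` argument stands in for the linear layer held by a sentence-transformers Dense module (whose activation function is simply dropped here).

```python
import warnings

import torch
from torch import nn


def copy_dense_into_linear(dense_linear: nn.Linear, expected_out_dim: int) -> nn.Linear:
    """Initialize a projection layer from a checkpoint's Dense weights.

    Hypothetical helper: `dense_linear` stands in for the linear layer
    inside a sentence-transformers Dense module. If the checkpoint's
    output dimension does not match the configuration, we warn and fall
    back to a freshly initialized layer (raising instead would implement
    the "prevent it from being loaded" option).
    """
    out_dim, in_dim = dense_linear.weight.shape
    if out_dim != expected_out_dim:
        warnings.warn(
            f"Dense output dim {out_dim} != configured dim {expected_out_dim}; "
            "ignoring checkpoint weights."
        )
        return nn.Linear(in_dim, expected_out_dim)

    # Dimensions match: copy the checkpoint weights into a plain linear
    # projection (no activation function).
    projection = nn.Linear(in_dim, out_dim)
    with torch.no_grad():
        projection.weight.copy_(dense_linear.weight)
        projection.bias.copy_(dense_linear.bias)
    return projection
```

Merging LinearLayer into Dense instead would avoid the copy entirely, at the cost of touching the original class to make the activation optional.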