There was a discussion about whether we can replace `tf.compat.v1.train.MomentumOptimizer` with `tf.keras.optimizers.SGD`. To find the difference, I read the documentation and traced the source code.
The update rule of SGD is:
$$v_{t+1} \leftarrow \alpha v_t - \eta \nabla f(\theta_t)$$
$$\theta_{t+1} \leftarrow \theta_t + v_{t+1}$$
And that of MomentumOptimizer is:
$$v_{t+1} \leftarrow \alpha v_t - \nabla f(\theta_t)$$
$$\theta_{t+1} \leftarrow \theta_t + \eta v_{t+1}$$
The difference is that SGD applies the learning rate in the first step (the velocity update), while MomentumOptimizer applies it in the second step (the parameter update).
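For concreteness, here is a minimal NumPy sketch of the two rules as written above. The function and variable names are my own, and the sign convention follows the formulas here rather than the accumulator convention used internally by TensorFlow.

```python
import numpy as np

def sgd_step(theta, velocity, grad, lr, momentum):
    # tf.keras.optimizers.SGD: the learning rate scales the gradient
    # inside the velocity update (first step).
    velocity = momentum * velocity - lr * grad
    theta = theta + velocity
    return theta, velocity

def momentum_optimizer_step(theta, velocity, grad, lr, momentum):
    # tf.compat.v1.train.MomentumOptimizer: the learning rate scales the
    # accumulated velocity in the parameter update (second step).
    velocity = momentum * velocity - grad
    theta = theta + lr * velocity
    return theta, velocity
```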
If the learning rate η is constant, the two formulas are mathematically equivalent, since the SGD velocity is just the MomentumOptimizer velocity scaled by η at every step; the only discrepancies would be floating-point errors. However, if the learning rate changes over time, the results will differ. Hope this answers the question.
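One way to check this claim is to run the two step functions above on a toy quadratic, once with a constant learning rate and once with a decaying one. The objective and the decay schedule below are just illustrative choices; the constant case should agree up to floating-point error, while the decaying case drifts apart.

```python
def run(step_fn, lrs, momentum=0.9, theta0=5.0):
    theta, velocity = theta0, 0.0
    for lr in lrs:
        grad = 2.0 * theta  # gradient of f(theta) = theta**2
        theta, velocity = step_fn(theta, velocity, grad, lr, momentum)
    return theta

constant = np.full(50, 0.01)
decaying = 0.01 / (1.0 + 0.1 * np.arange(50))

for name, lrs in [("constant lr", constant), ("decaying lr", decaying)]:
    diff = abs(run(sgd_step, lrs) - run(momentum_optimizer_step, lrs))
    print(f"{name}: |theta_sgd - theta_momentum| = {diff:.2e}")
```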
BTW, I find that the update rule in the slides matches SGD rather than MomentumOptimizer.