
The difference between Keras SGD and tf.train.MomentumOptimizer #7

Open
djshen opened this issue Apr 6, 2020 · 0 comments
djshen commented Apr 6, 2020

There was a discussion about whether we can replace tf.compat.v1.train.MomentumOptimizer with tf.keras.optimizers.SGD. To find the difference, I read the documentation and traced the source code.

The update rule of SGD is:

v_{t+1} ← α v_t − η ∇f(θ_t)
θ_{t+1} ← θ_t + v_{t+1}

And that of MomentumOptimizer is:

v_{t+1} ← α v_t − ∇f(θ_t)
θ_{t+1} ← θ_t + η v_{t+1}

The difference is where the learning rate enters: SGD multiplies the gradient by η when updating the velocity, while MomentumOptimizer accumulates the raw gradient and multiplies by η only when updating the parameters.

If the learning rate η is constant, the two formulas are mathematically equivalent: the velocities differ only by a factor of η, so the parameter updates coincide up to floating-point rounding. However, if the learning rate changes during training, the results will differ, because MomentumOptimizer rescales the entire accumulated velocity by the new η, while SGD only scales the current gradient. Hope this answers the question.
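
A minimal sketch of both rules makes this concrete. This is not code from either library, just the two formulas above applied to the toy objective f(θ) = θ² (gradient 2θ); the helper `run` is hypothetical:

```python
def run(rule, lrs, momentum=0.9):
    """Minimize f(theta) = theta^2 with the given momentum update rule."""
    theta, v = 1.0, 0.0
    for lr in lrs:
        g = 2.0 * theta                  # gradient of f(theta) = theta^2
        if rule == "keras_sgd":
            v = momentum * v - lr * g    # lr applied in the velocity update
            theta = theta + v
        else:  # "momentum_optimizer"
            v = momentum * v - g         # raw gradient accumulated
            theta = theta + lr * v       # lr applied in the parameter update
    return theta

constant = [0.1] * 50
decaying = [0.1 / (1 + t) for t in range(50)]

# Constant lr: identical trajectories (up to float rounding).
print(run("keras_sgd", constant), run("momentum_optimizer", constant))

# Changing lr: trajectories diverge, because the second rule rescales the
# whole accumulated velocity by the new lr each step.
print(run("keras_sgd", decaying), run("momentum_optimizer", decaying))
```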

BTW, I find that the update rule in the slides matches SGD rather than MomentumOptimizer.
