2021-07-21-amir21a.md

---
title: "SGD Generalizes Better Than GD (And Regularization Doesn’t Help)"
abstract: "We give a new separation result between the generalization performance of stochastic gradient descent (SGD) and of full-batch gradient descent (GD) in the fundamental stochastic convex optimization model. While for SGD it is well-known that $O(1/\epsilon^2)$ iterations suffice for obtaining a solution with $\epsilon$ excess expected risk, we show that with the same number of steps GD may overfit and emit a solution with $\Omega(1)$ generalization error. Moreover, we show that in fact $\Omega(1/\epsilon^4)$ iterations are necessary for GD to match the generalization performance of SGD, which is also tight due to recent work by Bassily et al. (2020). We further discuss how regularizing the empirical risk minimized by GD essentially does not change the above result, and revisit the concepts of stability, implicit bias and the role of the learning algorithm in generalization."
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: amir21a
month: 0
tex_title: "SGD Generalizes Better Than GD (And Regularization Doesn’t Help)"
firstpage: 63
lastpage: 92
page: 63-92
order: 63
cycles: false
bibtex_author: Amir, Idan and Koren, Tomer and Livni, Roi
author:
- given: Idan
  family: Amir
- given: Tomer
  family: Koren
- given: Roi
  family: Livni
date: 2021-07-21
address:
container-title: Proceedings of Thirty Fourth Conference on Learning Theory
volume: '134'
genre: inproceedings
issued:
  date-parts:
  - 2021
  - 7
  - 21
pdf:
extras:
---