2019-06-25-jain19a.md

---
abstract: Stochastic gradient descent (SGD) is one of the most widely used algorithms for large-scale optimization problems. While the classical theoretical analysis of SGD for convex problems studies (suffix) \emph{averages} of iterates and obtains information-theoretically optimal bounds on suboptimality, the \emph{last point} of SGD is, by far, the most preferred choice in practice. The best known results for the last point of SGD (Shamir and Zhang, 2013), however, are suboptimal compared to information-theoretic lower bounds by a $\log T$ factor, where $T$ is the number of iterations. Harvey et al. (2018) show that this additional $\log T$ factor is in fact tight for the standard step size sequences of $\Theta\big(\frac{1}{\sqrt{t}}\big)$ and $\Theta\big(\frac{1}{t}\big)$ for the non-strongly convex and strongly convex settings, respectively. Similarly, even for subgradient descent (GD) applied to non-smooth convex functions, the best known step size sequences still lead to $O(\log T)$-suboptimal convergence rates (on the final iterate). The main contribution of this work is to design new step size sequences that enjoy information-theoretically optimal bounds on the suboptimality of the \emph{last point} of SGD as well as GD. We achieve this by designing a modification scheme that converts one sequence of step sizes into another, so that the last point of SGD/GD with the modified sequence has the same suboptimality guarantees as the average of SGD/GD with the original sequence. We also show that our result holds with high probability. We validate our results through simulations, which demonstrate that the new step size sequences indeed improve the final iterate significantly compared to the standard step size sequences.
section: contributed
title: Making the Last Iterate of SGD Information Theoretically Optimal
layout: inproceedings
series: Proceedings of Machine Learning Research
id: jain19a
month: 0
tex_title: Making the Last Iterate of SGD Information Theoretically Optimal
firstpage: 1752
lastpage: 1755
page: 1752-1755
order: 1752
cycles: false
bibtex_author: Jain, Prateek and Nagaraj, Dheeraj and Netrapalli, Praneeth
author:
- given: Prateek
  family: Jain
- given: Dheeraj
  family: Nagaraj
- given: Praneeth
  family: Netrapalli
date: 2019-06-25
address:
publisher: PMLR
container-title: Proceedings of the Thirty-Second Conference on Learning Theory
volume: '99'
genre: inproceedings
issued:
  date-parts:
  - 2019
  - 6
  - 25
pdf:
extras:
---
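
For intuition, here is a minimal, self-contained Python sketch of the kind of simulation the abstract describes: noisy subgradient descent on the non-smooth convex function $f(x) = |x|$, comparing the final iterate under the standard $\Theta(1/\sqrt{t})$ step sizes against a phase-wise schedule in the spirit of the paper's modification scheme (step sizes held constant within a phase and halved across geometrically shrinking final phases). The phase boundaries, constants, and test function below are illustrative assumptions, not the authors' exact construction; see the paper for the precise scheme and its guarantees.

```python
# Sketch (assumptions noted above): compare last-iterate suboptimality of SGD
# on f(x) = |x| under a standard schedule vs. an assumed phase-wise schedule.
import math
import random

def sgd_last_iterate(step_sizes, noise=0.5, x0=5.0, seed=0):
    """Noisy subgradient descent on f(x) = |x|; returns the final iterate."""
    rng = random.Random(seed)
    x = x0
    for eta in step_sizes:
        g = math.copysign(1.0, x) + rng.gauss(0.0, noise)  # stochastic subgradient of |x|
        x -= eta * g
    return x

def standard_schedule(T, c=1.0):
    """Classical eta_t = c / sqrt(t + 1)."""
    return [c / math.sqrt(t + 1) for t in range(T)]

def phased_schedule(T, c=1.0):
    """Assumed phase-wise variant: about log2(T) phases; phase k covers
    iterations [T - T/2^k, T - T/2^(k+1)) with constant step (c/sqrt(T))/2^k,
    so step sizes halve across geometrically shrinking final phases."""
    etas = [0.0] * T
    K = max(1, int(math.log2(T)))
    for k in range(K + 1):
        start = T - T // (2 ** k)
        end = T if k == K else T - T // (2 ** (k + 1))
        for t in range(start, end):
            etas[t] = (c / math.sqrt(T)) / (2 ** k)
    return etas

if __name__ == "__main__":
    T = 1 << 14  # a power of two keeps the phase arithmetic exact
    for name, sched in [("standard 1/sqrt(t)", standard_schedule(T)),
                        ("phase-wise halved", phased_schedule(T))]:
        x_T = sgd_last_iterate(sched)
        print(f"{name:>20}: f(x_T) = {abs(x_T):.4f}")
```

Under the standard schedule the final step size is on the order of $1/\sqrt{T}$, so the last iterate still fluctuates at that scale; the phase-wise schedule shrinks the step size much further over the final iterations, which is the informal reason the last iterate lands closer to the optimum in such a simulation.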