| Field | Value |
|---|---|
| abstract | Stochastic gradient descent (SGD) is one of the most widely used algorithms for large-scale optimization problems. While the classical theoretical analysis of SGD for convex problems studies (suffix) \emph{averages} of iterates and obtains information-theoretically optimal bounds on suboptimality, the \emph{last point} of SGD is, by far, the most preferred choice in practice. The best known results for the last point of SGD (Shamir and Zhang, 2013), however, are suboptimal compared to information-theoretic lower bounds by a $\log T$ factor, where $T$ is the number of iterations. |
| section | contributed |
| title | Making the Last Iterate of SGD Information Theoretically Optimal |
| layout | inproceedings |
| series | Proceedings of Machine Learning Research |
| id | jain19a |
| month | 0 |
| tex_title | Making the Last Iterate of SGD Information Theoretically Optimal |
| firstpage | 1752 |
| lastpage | 1755 |
| page | 1752-1755 |
| order | 1752 |
| cycles | false |
| bibtex_author | Jain, Prateek and Nagaraj, Dheeraj and Netrapalli, Praneeth |
| author | |
| date | 2019-06-25 |
| address | |
| publisher | PMLR |
| container-title | Proceedings of the Thirty-Second Conference on Learning Theory |
| volume | 99 |
| genre | inproceedings |
| issued | |
| extras | |
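To make the averaged-iterate vs. last-iterate distinction from the abstract concrete, here is a minimal Python sketch that runs SGD on a toy convex objective and returns both query points. The function name `sgd_last_vs_average`, the noisy quadratic objective, and the standard $c/\sqrt{t}$ step size are illustrative assumptions; in particular, this is not the step-size schedule proposed in the paper.

```python
import numpy as np

def sgd_last_vs_average(grad_fn, x0, T, c=1.0, suffix_frac=0.5, seed=0):
    """Run T steps of SGD with eta_t = c / sqrt(t) and return both the
    last iterate and the suffix average (the two query points contrasted
    in the abstract). grad_fn(x, rng) returns a stochastic gradient at x."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    suffix_start = int(T * (1 - suffix_frac))
    suffix_sum, suffix_count = np.zeros_like(x), 0
    for t in range(1, T + 1):
        eta = c / np.sqrt(t)          # standard step size, not the paper's schedule
        x = x - eta * grad_fn(x, rng)
        if t > suffix_start:
            suffix_sum += x
            suffix_count += 1
    return x, suffix_sum / suffix_count  # (last iterate, suffix average)

# Illustrative use: noisy gradients of the convex objective f(x) = ||x||^2 / 2
grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
last, avg = sgd_last_vs_average(grad, x0=np.ones(5), T=10_000)
print("f(last) =", 0.5 * last @ last, " f(avg) =", 0.5 * avg @ avg)
```

Under this standard schedule, the suffix average is the point backed by optimal suboptimality guarantees, while the best known guarantee for the last iterate (Shamir and Zhang, 2013) is worse by a $\log T$ factor; closing that gap for the last iterate is the subject of the paper.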