---
title: "When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?"
abstract: "We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies for smoothed approximations to the ReLU, such as Swish and the Huberized ReLU, proposed in previous applied work. We provide two sufficient conditions for convergence. The first is simply a bound on the loss at initialization. The second is a data separation condition used in prior analyses."
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: chatterji21a
month: 0
tex_title: "When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?"
firstpage: 927
lastpage: 1027
page: 927-1027
order: 927
cycles: false
bibtex_author: Chatterji, Niladri S. and Long, Philip M. and Bartlett, Peter
author:
- given: Niladri S.
  family: Chatterji
- given: Philip M.
  family: Long
- given: Peter
  family: Bartlett
date: 2021-07-21
address:
container-title: Proceedings of Thirty Fourth Conference on Learning Theory
volume: '134'
genre: inproceedings
issued:
  date-parts:
  - 2021
  - 7
  - 21
pdf:
extras:
---
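The abstract refers to smoothed ReLU activations such as Swish and the Huberized ReLU. As a rough illustration only (not code from the paper), here is a minimal NumPy sketch of these two activations; the transition-width parameter `h` and the piecewise form of the Huberized ReLU are assumed parameterizations, not necessarily the ones used in the analysis.

```python
import numpy as np

def swish(x):
    # Swish activation: x * sigmoid(x), a smooth approximation to the ReLU.
    return x / (1.0 + np.exp(-x))

def huberized_relu(x, h=1.0):
    # One common smoothing of the ReLU with transition width h (assumed form):
    # 0 for x <= 0, x^2 / (2h) for 0 <= x <= h, x - h/2 for x >= h.
    return np.where(x <= 0.0, 0.0,
                    np.where(x <= h, x ** 2 / (2.0 * h), x - h / 2.0))
```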