---
abstract: 'In this paper we study the problem of lower bounding the minimum eigenvalue of the neural tangent kernel (NTK) at initialization, an important quantity for the theoretical analysis of training in neural networks. We consider feedforward neural networks with smooth activation functions. Without any distributional assumptions on the input, we present a novel result: we show that for suitable initialization variance, $\widetilde{\Omega}(n)$ width, where $n$ is the number of training samples, suffices to ensure that the NTK at initialization is positive definite, improving prior results for smooth activations under our setting. Prior to our work, the sufficiency of linear width had only been shown for networks with ReLU activation functions, while sublinear width had been shown for networks with smooth activations only under additional conditions on the distribution of the data. The technical challenge in the analysis stems from the layerwise inhomogeneity of smooth activation functions, and we handle the challenge using a {\em generalized} Hermite series expansion of such activations.'
openreview: 98SHia4Hg1
title: 'Neural tangent kernel at initialization: linear width suffices'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: banerjee23a
month: 0
tex_title: 'Neural tangent kernel at initialization: linear width suffices'
firstpage: 110
lastpage: 118
page: 110-118
order: 110
cycles: false
bibtex_author: Banerjee, Arindam and Cisneros-Velarde, Pedro and Zhu, Libin and Belkin, Mikhail
author:
- given: Arindam
  family: Banerjee
- given: Pedro
  family: Cisneros-Velarde
- given: Libin
  family: Zhu
- given: Mikhail
  family: Belkin
date: 2023-07-02
address:
container-title: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
volume: '216'
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 7
  - 2
pdf:
extras:
---