The Benefits of Reusing Batch for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents
Comparison of one-pass SGD with multi-pass SGD for different target functions. Multi-pass SGD is able to learn a wider class of functions, including some with a high information (leap) exponent.
This repository contains the following code:

- `dmft.py`: our implementation of DMFT for committee machines;
- `simulations.py`: simulates the processes to be compared with DMFT;
- `result.ipynb`: shows the results of the simulations and DMFT.
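As a rough illustration of the setting (not the repository's actual code), the one-pass vs. multi-pass comparison can be sketched as a small committee machine trained by SGD on a single-index teacher. Everything below (sizes, learning rate, the `tanh` activation, the teacher direction) is an illustrative assumption, chosen only to show how reusing the batch differs structurally from online SGD:

```python
import numpy as np

def train_sgd(n_passes, n=200, d=20, p=2, lr=0.1, seed=0):
    """Sketch of (multi-pass) SGD for a two-layer committee machine
    f(x) = (1/p) * sum_j tanh(w_j . x), fitting a single-index teacher
    y = tanh(w_star . x).  n_passes=1 is one-pass SGD over the batch;
    n_passes > 1 reuses the same batch.  All hyperparameters here are
    illustrative choices, not those used in the paper."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_star = np.zeros(d)
    w_star[0] = 1.0                     # hypothetical teacher direction
    y = np.tanh(X @ w_star)
    W = rng.standard_normal((p, d)) / np.sqrt(d)   # student first layer

    def batch_loss(W):
        # second-layer weights fixed to 1/p for simplicity
        pred = np.tanh(X @ W.T).mean(axis=1)
        return 0.5 * np.mean((pred - y) ** 2)

    loss_init = batch_loss(W)
    for _ in range(n_passes):
        for i in rng.permutation(n):    # reshuffle the batch each pass
            x, t = X[i], y[i]
            z = np.tanh(W @ x)
            err = z.mean() - t
            # gradient of 0.5*(f(x) - y)^2 w.r.t. each row of W
            grad = err * (1.0 / p) * (1.0 - z ** 2)[:, None] * x[None, :]
            W -= lr * grad
    return loss_init, batch_loss(W)

loss_before, loss_after = train_sgd(n_passes=50)
```

The outer loop over `n_passes` is the only difference between the two regimes: setting it to 1 recovers one-pass SGD on the batch, while larger values reuse the same samples repeatedly.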
Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborova, and Florent Krzakala. *The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents*, 2024. http://arxiv.org/abs/2402.03220