# The Benefits of Reusing Batch for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

Comparison of one-pass SGD with multi-pass SGD for different targets. Multi-pass SGD can learn a wider class of functions, including some with a high information (leap) exponent.
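The distinction between the two regimes is whether each gradient step draws a fresh batch (one-pass) or repeatedly reuses the same batch (multi-pass). The toy sketch below illustrates this on a single-index target with information exponent 3; it is not the repository's DMFT code, and the target, Hermite activation, step size, and spherical normalization are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 32       # input dimension
n = 512      # batch size
steps = 200  # number of gradient steps
lr = 0.05    # learning rate

# Illustrative single-index target: y = He_3(w_star . x), where
# He_3(z) = z^3 - 3z is the third Hermite polynomial (information exponent 3).
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

def he3(z):
    return z**3 - 3.0 * z

def target(X):
    return he3(X @ w_star)

def sgd(reuse_batch):
    """Plain SGD on squared loss for a single spherical neuron.

    reuse_batch=True  -> multi-pass: the same batch at every step;
    reuse_batch=False -> one-pass: a fresh batch at every step.
    Returns the final overlap |w . w_star| with the target direction.
    """
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    X = rng.standard_normal((n, d))
    for _ in range(steps):
        if not reuse_batch:
            X = rng.standard_normal((n, d))
        z = X @ w
        residual = he3(z) - target(X)
        grad_pre = residual * (3.0 * z**2 - 3.0)  # dHe_3/dz = 3z^2 - 3
        w -= lr * (X.T @ grad_pre) / n
        w /= max(np.linalg.norm(w), 1e-8)         # keep w on the sphere
    return abs(w @ w_star)

overlap_one = sgd(reuse_batch=False)
overlap_multi = sgd(reuse_batch=True)
print(f"one-pass overlap:   {overlap_one:.3f}")
print(f"multi-pass overlap: {overlap_multi:.3f}")
```

The overlap `|w . w_star|` measures how well the learned direction aligns with the target; for high-information-exponent targets the paper's claim is that batch reuse accelerates escape from the uninformative initialization.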

## Structure

This repository contains the following code:

- `dmft.py`: our implementation of DMFT for committee machines;
- `simulations.py`: simulates the processes to be compared with DMFT;
- `result.ipynb`: displays the results of the simulations and DMFT.

## Reference

Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborova, and Florent Krzakala. *The Benefits of Reusing Batch for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents*, 2024. http://arxiv.org/abs/2402.03220