Gérard Ben Arous (New York University)

A dynamical spectral transition for SGD for Gaussian mixtures

Abstract

This is joint work with Jiaoyang Huang, Reza Gheissari and Aukosh Jagannath.
I will briefly review the recent notions of summary statistics and effective dynamics for high-dimensional optimization, and then show how they apply to the central case of classification of Gaussian mixtures. We will see these low-dimensional effective dynamics emerge through a dynamical spectral transition of BBP type.

Elizabeth Collins-Woodfin (McGill University)

High-dimensional dynamics of SGD for Gaussian mixture models

Abstract

We study the dynamics of streaming SGD in the context of high-dimensional k-component Gaussian mixture models. Using techniques from high-dimensional probability, matrix theory, and stochastic calculus, we show that, when the data dimension d grows proportionally to the number of samples n, SGD converges to a deterministic equivalent characterized by a system of ordinary differential equations. A key contribution of our technique is that it works for non-isotropic data. As a simple example, I will discuss logistic regression on the 2-component model under various data covariance structures to illustrate the SGD dynamics for non-isotropic GMMs. I will also discuss an extension of our methods to models with a growing number of components (k of order log(d)). This is based on work in progress with Inbar Seroussi.
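For concreteness, here is a minimal numerical sketch of this kind of setup (my own illustration, not the speaker's code): one-pass SGD for logistic regression on a symmetric two-component Gaussian mixture with a non-isotropic, power-law covariance, with the number of samples proportional to the dimension. The signal strength, step-size scaling, and covariance spectrum are assumptions chosen purely for illustration; the overlap of the iterate with the class mean is the tracked summary statistic.

```python
# Illustrative sketch only: streaming SGD for logistic regression on a
# 2-component Gaussian mixture with non-isotropic covariance (assumed parameters).
import numpy as np

def run_streaming_sgd(d, steps_per_dim=8, lr=1.0, seed=0):
    """One-pass SGD for logistic regression on a symmetric 2-component GMM."""
    rng = np.random.default_rng(seed)
    mu = np.ones(d) / np.sqrt(d)                         # class mean direction, ||mu|| = 1
    spectrum = np.arange(1, d + 1, dtype=float) ** -0.5  # power-law (non-isotropic) covariance
    w = np.zeros(d)
    overlaps = []
    for t in range(steps_per_dim * d):                   # number of samples proportional to d
        y = rng.choice([-1.0, 1.0])                      # balanced class labels
        x = y * mu + np.sqrt(spectrum) * rng.standard_normal(d)
        # single-sample (streaming) gradient of the logistic loss log(1 + exp(-y w.x))
        grad = -y * x / (1.0 + np.exp(y * np.dot(w, x)))
        w -= (lr / d) * grad                             # step size scaled with dimension
        if t % d == 0:
            overlaps.append(np.dot(w, mu))               # summary statistic: overlap with mu
    return np.array(overlaps)

# Two independent runs at large d: the overlap trajectories nearly coincide,
# reflecting convergence to a deterministic equivalent as d (and n ~ d) grows.
for seed in (0, 1):
    print(np.round(run_streaming_sgd(d=2000, seed=seed), 3))
```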

Zhou Fan (Yale University)

Dynamical mean-field analysis of adaptive Langevin diffusions

Abstract

In many applications of statistical estimation via sampling, one may wish to sample from a high-dimensional target distribution that is adaptively evolving to the samples already seen. We study an example of such dynamics, given by a Langevin diffusion for posterior sampling in a Bayesian linear regression model with i.i.d. regression design, whose prior continuously adapts to the Langevin trajectory via a maximum marginal-likelihood scheme. Using techniques of dynamical mean-field theory (DMFT), we provide a precise characterization of a high-dimensional asymptotic limit for the joint evolution of the prior parameter and law of the Langevin sample. We then carry out an analysis of the equations that describe this DMFT limit, under conditions of approximate time-translation-invariance which include, in particular, settings where the posterior law satisfies a log-Sobolev inequality. In such settings, we show that this adaptive Langevin trajectory converges on a dimension-independent time horizon to an equilibrium state that is characterized by a system of scalar fixed-point equations, and the associated prior parameter converges to a critical point of a replica-symmetric limit for the model free energy. We explore the nature of the free energy landscape and its critical points in a few simple examples, where such critical points may or may not be unique.

This is joint work with Justin Ko, Bruno Loureiro, Yue M. Lu, and Yandi Shen.
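As an illustration of the kind of adaptive Langevin scheme described above, here is a minimal discretized sketch (my assumptions throughout, not necessarily the exact scheme analyzed in the talk): unadjusted Langevin dynamics targeting the posterior of a Bayesian linear regression model with i.i.d. Gaussian design, where the prior is Gaussian with a scalar variance parameter adapted along the trajectory by a stochastic-approximation step that ascends the marginal likelihood.

```python
# Illustrative sketch (assumed parameters and scheme, not the paper's exact one):
# Langevin sampling of a Bayesian linear regression posterior with an adapting
# Gaussian prior N(0, tau * I), tau tuned toward the maximum marginal likelihood.
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 200
sigma2 = 0.5                                   # noise variance, assumed known
X = rng.standard_normal((n, d)) / np.sqrt(d)   # i.i.d. Gaussian design
theta_star = rng.standard_normal(d)            # ground-truth coefficients ~ N(0, 1)
y = X @ theta_star + np.sqrt(sigma2) * rng.standard_normal(n)

theta = np.zeros(d)       # Langevin state
tau = 0.5                 # adapted prior variance
dt, eta = 0.01, 0.01      # Langevin step size and adaptation rate (illustrative)

for _ in range(10000):
    # Langevin step on the log posterior: Gaussian likelihood + Gaussian prior N(0, tau I)
    grad_log_post = X.T @ (y - X @ theta) / sigma2 - theta / tau
    theta += dt * grad_log_post + np.sqrt(2 * dt) * rng.standard_normal(d)
    # adapt tau with the gradient of the log joint density in tau; averaged along
    # the trajectory this is a stochastic ascent step on the marginal likelihood
    grad_tau = theta @ theta / (2 * tau**2) - d / (2 * tau)
    tau = max(tau + eta * dt * grad_tau, 0.05)  # keep the prior variance positive

print("adapted prior variance tau:", round(float(tau), 3))
print("sample second moment |theta|^2 / d:", round(float(theta @ theta) / d, 3))
```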

Damien Ferbach (Université de Montréal)

Dimension-adapted Momentum Outscales SGD

Abstract

We investigate scaling laws for small-batch stochastic momentum algorithms on the power-law random features model, parameterized by data complexity, target complexity, and model size. Our analysis reveals four distinct loss-curve shapes, determined by varying data and target complexities, when training with a stochastic momentum algorithm. While traditional stochastic gradient descent with momentum (SGD-M) yields scaling-law exponents identical to those of SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling its momentum hyperparameters with model size and data complexity. DANA achieves this outscaling phenomenon, which also improves compute-optimal scaling behavior, across a broad range of data and target complexities where traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions, and large-scale text experiments with LSTMs show that DANA's improved loss exponents over SGD hold in a practical setting.
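The exact DANA schedules are specified in the paper; the sketch below is only a hypothetical illustration of the general mechanism, namely tying momentum hyperparameters to the model size on a power-law quadratic proxy for the random features model. The exponents alpha and beta, the dimension-adapted momentum 1 - d^(-1/2), and the step size are all assumptions made for illustration, not the paper's prescriptions.

```python
# Illustrative sketch only: small-batch SGD with momentum on a power-law quadratic,
# comparing a fixed momentum parameter with a (hypothetical) dimension-adapted one.
import numpy as np

def sgd_momentum(d, alpha=1.2, beta=0.4, batch=1, steps=10000, lr=0.05,
                 dim_adapted=False, seed=0):
    rng = np.random.default_rng(seed)
    lam = np.arange(1, d + 1, dtype=float) ** -alpha     # power-law data spectrum
    target = np.arange(1, d + 1, dtype=float) ** -beta   # power-law target coefficients
    w = np.zeros(d)
    v = np.zeros(d)
    # fixed momentum for plain SGD-M vs. a hypothetical dimension-adapted momentum
    mom = 1.0 - d ** -0.5 if dim_adapted else 0.9
    for _ in range(steps):
        # small-batch stochastic gradient of the quadratic population risk
        # 0.5 * sum(lam * (w - target)^2), estimated from Gaussian features
        g = rng.standard_normal((batch, d)) * np.sqrt(lam)
        grad = g.T @ (g @ (w - target)) / batch
        v = mom * v - lr * (1.0 - mom) * grad
        w = w + v
    return 0.5 * np.sum(lam * (w - target) ** 2)          # population risk

for d in (256, 1024, 4096):
    print(d, "fixed momentum:", f"{sgd_momentum(d):.3e}",
          "| dimension-adapted:", f"{sgd_momentum(d, dim_adapted=True):.3e}")
```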

Florent Krzakala (EPFL)

Some Recent Progress in Asymptotics for High-Dimensional Neural Networks

Abstract