Résumés approximation

Tatiana Bubba (University of Bath)

Self-Supervised Deep Equilibrium models: learning from limited data

Abstract

Deep learning has emerged as a powerful tool for solving inverse problems in imaging, including computed tomography (CT). However, most approaches require paired training data with ground truth images, which can be difficult to obtain, e.g., in medical applications. In this talk, I present a self-supervised Deep Equilibrium (DEQ) framework for sparse-angle CT reconstruction that trains directly on undersampled measurements. Under suitable assumptions, it is possible to establish theoretical guarantees (i.e., self-supervised updates match those of fully-supervised training), for the case of non-unitary operators like that of CT. Numerical experiments on sparse-angle CT data demonstrate that this approach outperforms existing self-supervised methods while matching the performance of supervised approaches, achieving state-of-the-art results with as few as 16 projection angles.

Hung-Hsu (Edward) Chou (University of Pittsburg)

More is Less: Understanding Compressibility of Neural Networks via Implicit Regularization and Neural Collapse

Abstract

Despite their recent successes in various tasks, most modern machine learning algorithms lack theoretical guarantees, which are crucial to further development towards delicate tasks. One mysterious phenomenon is that, among infinitely many possible ways to fit data, the algorithms always find the « good » ones, even when the definition of « good » is not specified by the designers. In this talk I will cover the empirical and theoretical study of the connection between the good solutions in neural networks and the sparse solutions in compressed sensing with four questions in mind: What happens? When does it happen? Why does it happen? How can we improve it? The key concepts are implicit regularization, Bregman divergence, neural tangent kernel, and neural collapse.

Albert Cohen (Sorbonne University)

Stable nonlinear inversion and application and application to interface reconstruction from cell-averages

Abstract

In this lecture, we first discuss measures of optimal dimensionality reduction that may be thoughts as benchmark for efficient numerical methods in forward simulation or inverse problem. While the concept of n-width, introduced in 1936 by Kolmogorov, is well taylored to linear methods, the study of analogous concepts for nonlinear approximation is still the object of ongoing research. We then present a general framework for solving inverse problems using nonlinear approximation spaces. The main principles build up on the so called problems using nonlinear approximation spaces. The main principles build up on the so called Parametrized Background Data Weak method (PBDW), which can be thought as a linear counterpart.

As an application we study the reconstruction of sharp interfaces from coarse resolution cell averages for which linear methods are known to be uneffective. We discuss the convergence rates of these nonlinear reconstruction strategies and their optimality.

Nick Dexter (Florida State University)

Active and Compressive Approaches to High-Dimensional Function Approximation in Computational Science

Abstract

Accurately approximating high-dimensional functions from limited data is a central challenge in modern computational science, uncertainty quantification, and scientific machine learning. Motivated by applications ranging from parameterized partial differential equations to operator learning, we introduce a framework for analyzing approximations obtained with deep learning, compressed sensing, orthogonal polynomials, and finite elements to approximate smooth functions valued in infinite-dimensional Hilbert and Banach spaces. Our theoretical analysis provides explicit guarantees on approximation error and sample complexity, significantly mitigating the curse of dimensionality. To further enhance efficiency, we incorporate recent advances in active learning through generalized Christoffel functions, leading to near-optimal sampling strategies across a variety of data types, including transform-based, vector-valued, and multimodal measurements. Numerical experiments validate our approach across challenging benchmark problems, demonstrating significant improvements in sample efficiency for gradient-augmented learning, PDE solvers via Physics-Informed Neural Networks, operator learning, and scientific imaging tasks such as MRI reconstruction with generative models.

Diane Guignard (University of Ottawa)

Approximating partial differential equations with missing input data

Abstract

Mark Iwen (Michigan State University)

Sparse Spectral Methods for Solving High-Dimensional and Multiscale Elliptic PDEs

Abstract

Anastasis Kratsios (McMaster University)

Understanding Neural Networks Reasoning via Neural Network Representations of Algorithms

Abstract

One of the main open questions in contemporary AI is understanding what type of reasoning neural networks can perform if trained correctly. We partially answer that question by understanding any reasoning task as an emulation of a circuit whose gates determine the type of logic being implemented, e.g., Boolean circuits for predicate logical reasoning or tropical circuits for pure dynamic programming-based problem solving, hybrids thereof, etc…
We construct a meta-procedure for converting roughly any circuit into a feedforward neural network (NN), utilizing ReLU activations and skip connections, by replacing its gates from a list of standard ReLU MLP emulators. We show that, on any digital computer, any circuit can be emulated in this way, implying that there is no reasoning task which an NN cannot perform on a digital computer. Moreover, the number of neurons of the resulting NNs scale proportionally to the emulated circuit’s complexity. Several new applications are obtained, from guarantees of randomized circuit emulation to computation of a Turing machine at any pre-determined state by NNs. Importantly, we show that any universal approximation guarantee can be directly deduced from our result, implying a new connection between AI reasoning, computability theory, and approximation theory, which can provide insights into the internal reasoning processes of AI systems.
Joint work: Samuel Lanthaler

Sophie Langer (University of Twente)

Statistical Breakthroughs and Novel Perspectives in Deep Learning Theory

Abstract

Since several years, deep learning has emerged as a transformative field, with its theory involving several disciplines such as approximation theory, statistics and optimization. In this talk we delve into key theoretical breakthroughs, with a particular focus on statistical results. We critically question prevailing frameworks and identify their key limitations. Central to the discussion is a novel statistical framework for image analysis that reinterprets images not as high-dimensional entities, but as structured objects shaped by geometric deformations such as translations, rotations, and scalings. Within this framework, classification is reframed as the task of learning uninformative deformations, leading to convergence rates with more favorable trade-offs between input dimension and sample size. This geometric-statistical perspective not only provides new guarantees for approximation and convergence in deep learning-based image classification but also prompts a rethinking of theoretical approaches for broader prediction problems. In the final part of the talk, we examine the expressive power of ReLU networks in comparison to networks with Heaviside activation functions. While ReLU-based models have become standard in deep learning theory, Heaviside networks offer a compelling alternative that aligns more closely with biologically inspired architectures. We conclude by outlining future research directions and reflecting on the role of theory in the field.

This talk is based on joint work with Juntong Chen, Insung Kong and Johannes Schmidt-Hieber

Jonas Latz (University of Manchester)

Deep Gaussian processes: Bayesian learning for large-scale data

Abstract

Gaussian processes (GPs) have gained popularity as flexible machine learning models for regression and function approximation with an in-built method for uncertainty quantification. GPs suffer when the amount of training data is large or when the underlying function contains multiscale features that are difficult to represent by an isotropic kernel. The training of GPs with large scale data is often performed through inducing point approximations (also known as sparse GP regression), where the size of the covariance matrices in GP regression is reduced considerably through a greedy search on the data set. Deep Gaussian processes have recently been introduced as hierarchical models that resolve multi-scale features by composing multiple Gaussian processes. Whilst GPs can be trained through a simple analytical formula, deep GPs require a sampling or, more usual, a variational approximation. Variational approximations lead to large-scale stochastic, non-convex optimisation problems and the resulting approximation tends to represent uncertainty incorrectly.

In this work, we combine variational learning with MCMC to develop a particle-based expectation-maximisation method to simultaneously find inducing points within the large-scale data (variationally) and accurately train the Gaussian processes (sampling-based). The result is a highly efficient and accurate methodology for deep GP training on large scale data. We test the method on standard benchmark problems.

Deanna Needell (University of California, Los Angeles)

Randomized Kaczmarz methods with beyond-Krylov convergence

Abstract

Randomized Kaczmarz methods form a family of linear system solvers that converge by repeatedly projecting their iterates onto randomly sampled equations. While effective in some contexts, such as highly over-determined least squares, Kaczmarz methods are traditionally deemed secondary to Krylov subspace methods, since this latter family of solvers can exploit outliers in the input’s singular value distribution to attain fast convergence on many ill-conditioned systems. In this talk, we introduce an accelerated randomized block Kaczmarz algorithm that exploits outlying singular values in the input to attain a fast Krylov-style convergence. Moreover, we show that our approach captures large outlying singular values provably faster than popular Krylov methods. We also develop an optimized variant for positive semidefinite systems, demonstrating empirically that it is competitive in arithmetic operations with both CG and GMRES on a collection of benchmark problems. To attain these results, we introduce several novel algorithmic improvements to the Kaczmarz framework, including adaptive momentum acceleration, Tikhonov-regularized projections, and a memoization scheme for reusing information from previously sampled equation blocks. This work is joint with Michał Dereziński, Elizaveta Rebrova, and Jiaming Yang.

Clarice Poon (University of Warwick)

Learning from samples: Inverse problems over measures

Abstract

Estimating parameters from samples of an optimal probability distribution is crucial for applications ranging from socio-economic modeling to biological system analysis. In these cases, the probability distribution arises as the solution to an optimization problem that models how agents interact in static settings or how the system evolves over time in dynamic scenarios. In this work, we investigate the stability of a broad class of convex methods for estimating key parameters—typically a cost function for static problems and a potential function for dynamic problems. Our approach is based on unbalanced optimal transport (UOT) with entropic regularization. Unlike classical optimal transport, UOT allows for probability measures with varying total mass, making it well-suited for handling noisy or incomplete data and for modeling dynamical systems. I will show how this problem can be tackled using a convex loss function and discuss the associated recovery properties, including sample complexity bounds, as well as regularization properties. This is joint work with Francisco Andrade and Gabriel Peyré.

Andrew Stuart (California Institute of Technology)

Allowing Image And Text Data To Communicate

Abstract

A fundamental problem in artificial intelligence is the question of how to simultaneously deploy data from different sources such as audio, image, text and video; such data is known as multimodal. In this talk I will focus on the canonical problem of aligning image and text data, and describe some of the mathematical ideas underlying the challenge of allowing them to communicate. I will describe the encoding of text and image in Euclidean spaces and describe contrastive learning methods to identify and learn embeddings which align these two modalities; I will also describe the attention mechanism, a form of nonlinear
correlation in vector-valued sequences. Attention turns out to be useful beyond this specific context, and I will show how it may be used to design and learn maps between Banach spaces or between spaces of probability measures.

Giang Tran (University of Waterloo)

Fast Multipole Attention for Transformer Neural Networks

Abstract

based models to long sequences. To address this, we present Fast Multipole Attention (FMA), a new attention mechanism that uses a divide-and-conquer strategy to reduce the time and memory complexity of attention for sequences of length n from O(n^2) to O(n \log n) or O(n), while retaining a global receptive field. The hierarchical approach groups queries, keys, and values into O( \log n) levels of resolution, where groups at greater distances are increasingly larger in size and the weights to compute group quantities are learned. As such, the interaction between tokens far from each other is considered in lower resolution in an efficient hierarchical manner. This multi-level divide-and-conquer strategy is inspired by fast summation methods from n-body physics and the Fast Multipole Method. We perform evaluation on autoregressive and bidirectional language modeling tasks as well as on image classification. We find empirically that the Fast Multipole Transformer performs much better than other efficient transformers and state-of-the-art vision transformers in terms of memory size and accuracy. The FMA mechanism has the potential to empower large language models with much greater sequence lengths, taking the full context into account in an efficient, naturally hierarchical manner during training and when generating long sequences.

Rebecca Willett (University of Chicago)

Hierarchical Implicit Neural Emulators

Abstract

Neural emulators of physics simulations solvers offer a powerful tool for modeling complex dynamical systems, but often struggle with error accumulation over long time horizons and maintaining stability and physical consistency. In this talk, I will describe a multiscale implicit neural emulator that enhances long-term prediction accuracy by conditioning on a hierarchy of lower-dimensional representations of future states. Drawing inspiration from the stability properties of numerical implicit time-stepping methods, our approach leverages predictions several steps ahead in time at increasing compression rates for next-timestep refinements. By actively adjusting the temporal downsampling ratios, our design enables the model to capture dynamics across multiple granularities and enforce long-range temporal coherence. Experiments on turbulent fluid dynamics show that our method achieves high short-term accuracy and produces long-term stable forecasts, significantly outperforming autoregressive baselines while adding minimal computational overhead. This is joint work with Ruoxi Jiang, Xiao Zhang, Karan Jakhar, Peter Y. Lu, Pedram Hassanzadeh, and Michael Maire.

Jakob Zech (Heidelberg University)

Statistical Learning Theory for Neural Operators

Abstract