Maria Sofia Bucarelli (Sapienza University of Rome)

Full‑Rank Embeddings in Random Perceptrons

Abstract
Deep neural networks exhibit improved training and generalization performance as their width grows well beyond the size of the training set—contradicting classical overfitting intuition. To explain this “benign overparameterization,” we analyze the representational capacity of a random one‑hidden‑layer perceptron with Gaussian weights (no bias). We ask: when does a hidden layer of dimension n map k input vectors, each pair separated by at least angle θ, to a full‑rank activation matrix—ensuring that a simple linear classifier can perfectly fit those inputs? Framing this as a question about hyperplane arrangements on the unit sphere, we derive nontrivial lower bounds on the probability that a random projection avoids the arrangement’s zero‑measure regions. Our results show that once the hidden dimension exceeds a threshold growing subquadratically in k (and depending only weakly on θ and input dimension), the hidden representations are linearly independent with high probability.
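As an illustrative companion to this abstract, the sketch below draws Gaussian weights with no bias and checks when the activation matrix of k inputs reaches full rank as the width n grows; the ReLU nonlinearity and the use of random unit vectors as a stand-in for the θ-separation are assumptions made here for concreteness, not details taken from the work.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 20, 50                                   # input dimension, number of inputs

    # Random unit inputs: with high probability their pairwise angles are
    # bounded away from zero, emulating the theta-separation assumption.
    X = rng.standard_normal((k, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)

    for n in (25, 50, 75, 100):                     # hidden widths to test
        W = rng.standard_normal((d, n))             # Gaussian weights, no bias
        A = np.maximum(X @ W, 0.0)                  # hidden activation matrix, k x n
        print(f"n={n:4d}  rank(A)={np.linalg.matrix_rank(A)}  (full rank is {min(k, n)})")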

Giuseppe Alessio D’Inverno (International School for Advanced Studies, Trieste, Italy)

Sparse polynomial approximation of diffusion on community-structured graphs

Abstract

Diffusion kernels over graphs are widely used across applications due to their ability to accurately model the flow of information through nodes and edges. However, there is a notable gap in the literature regarding the development of surrogate models for diffusion processes on graphs. In this work, we fill this gap by proposing sparse polynomial-based surrogate models for parametric diffusion equations on graphs with community structure. In addition, we provide convergence guarantees for both least squares and compressed sensing-based approximations by showing the holomorphic regularity of parametric solutions to these diffusion equations. Our theoretical findings are accompanied by a series of numerical experiments conducted on both synthetic and real-world graphs that demonstrate the applicability of our methodology.
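As a hedged sketch of this kind of setting (the concrete parametrization, community sizes, and quantity of interest below are illustrative assumptions, not taken from the work), one can build a two-community graph, solve a parametric heat equation on it, and fit a polynomial surrogate in the diffusivity parameter by least squares:

    import numpy as np
    import networkx as nx
    from scipy.linalg import expm

    rng = np.random.default_rng(1)

    # Two-community stochastic block model: dense within, sparse across.
    G = nx.stochastic_block_model([30, 30], [[0.3, 0.02], [0.02, 0.3]], seed=1)
    L = nx.laplacian_matrix(G).toarray().astype(float)
    u0 = np.zeros(60)
    u0[0] = 1.0                                     # unit mass on one node

    # Parametric diffusion u(t; a) = exp(-t a L) u0 with scalar diffusivity a.
    def solve(a, t=1.0):
        return expm(-t * a * L) @ u0

    # Least-squares polynomial surrogate (in a) for a scalar quantity of
    # interest: the mass remaining inside the first community at time t.
    a_train = rng.uniform(0.1, 2.0, 40)
    q_train = np.array([solve(a)[:30].sum() for a in a_train])
    coef = np.polynomial.chebyshev.chebfit(a_train, q_train, deg=8)

    a_test = 1.234
    print(np.polynomial.chebyshev.chebval(a_test, coef), solve(a_test)[:30].sum())

The holomorphic dependence of the solution on the parameter is what makes such polynomial surrogates converge quickly; the compressed sensing variants discussed in the abstract would replace the dense least-squares fit with a sparse recovery step.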

Daniel Fassler (Concordia University)

Finite-Data Error Bounds for Approximating the Koopman Operator

Abstract

First introduced in the 1930s, the Koopman operator framework presents an equivalent linear formulation of nonlinear systems by lifting the dynamics to an infinite-dimensional Banach space of scalar functions called observables. In the past decade, the method of extended dynamic mode decomposition has emerged as a powerful computational technique for approximating the action of the Koopman operator on a finite span of functions, a dictionary, directly from data gathered from the system. The goal of this work is to further elucidate the relationship between the dictionary and the gathered data in order to provide explicit error bounds for the performance of the method. In particular, we provide both analytical and empirical evidence that least squares regression techniques lead to a Monte Carlo rate of convergence in the amount of data. We improve upon this in two ways. First, we show that growing the dictionary with the data can lead to exponentially vanishing error bounds, and second, we show that sparse regression techniques such as LASSO and basis pursuit can lead to improved accuracy and computational savings in the high-dimensional setting, where the dynamics tend to be sparser.
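For readers unfamiliar with the method, here is a minimal least-squares EDMD sketch; the toy dynamics and monomial dictionary are choices made for illustration, not taken from the work.

    import numpy as np

    rng = np.random.default_rng(2)

    # Toy nonlinear dynamics whose lift by the dictionary below is exactly linear.
    def F(x):
        return np.array([0.9 * x[0], 0.8 * x[1] + 0.1 * x[0] ** 2])

    # Dictionary psi(x) = [1, x1, x2, x1^2]; real problems need richer dictionaries.
    def psi(x):
        return np.array([1.0, x[0], x[1], x[0] ** 2])

    X = rng.uniform(-1, 1, (500, 2))                # states sampled from the system
    PsiX = np.array([psi(x) for x in X])
    PsiY = np.array([psi(F(x)) for x in X])

    # Least-squares EDMD: the regression error decays at a Monte Carlo rate
    # in the number of samples, as discussed in the abstract.
    K, *_ = np.linalg.lstsq(PsiX, PsiY, rcond=None)
    print(np.round(K.T, 3))                         # Koopman matrix on span(psi)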

Alessandro Felisi (University of Genoa)

Compressed sensing for Photoacoustic Tomography on the sphere

Abstract

Photoacoustic Tomography (PAT) is an emerging medical imaging technology whose primary aim is to map the high-contrast optical properties of biological tissues by leveraging high-resolution ultrasound measurements. Mathematically, this can be framed as an inverse source problem for the wave equation over a specific domain. In this work, we show that, by assuming signal sparsity, it is possible to establish rigorous stable recovery guarantees when the domain is spherical and the data collection consists of spatial averages restricted to a limited portion of the boundary. The result is a consequence of a general compressed sensing framework for inverse problems developed with co-authors, combined with stability estimates for the inverse wave problem.

Sina Mohammadtaheri (Concordia University)

Deep greedy unfolding: Sorting out the argsort in greedy sparse recovery algorithms

Abstract

Unrolled neural networks have gained attention for signal processing by mimicking iterative algorithms and offering recovery guarantees. However, there has been limited progress in unrolling greedy and thresholding-based sparse recovery algorithms like Orthogonal Matching Pursuit (OMP) and Iterative Hard Thresholding (IHT), due to the non-differentiable argsort operator, which hinders gradient-based training. To overcome this, we use a continuous relaxation called “softsort” to approximate argsort. We introduce differentiable versions, Soft-OMP and Soft-IHT, and show that they reliably approximate the original algorithms with minimal error, given an appropriate softsort temperature and sufficiently separated element gaps. When integrated into neural networks with trainable weights, these unrolled algorithms effectively capture hidden data structures, linking our method to weighted sparse recovery.
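For concreteness, a minimal NumPy implementation of the softsort relaxation (following Prillo and Eisenschlos, 2020) looks as follows; how Soft-OMP and Soft-IHT embed it in place of the hard support selection is described in the work itself.

    import numpy as np

    def softsort(s, tau=0.1):
        # Row-stochastic relaxation of the sorting permutation: row i is a
        # softmax concentrated on the index of the i-th largest entry of s,
        # approaching the hard permutation matrix as tau -> 0.
        s = np.asarray(s, dtype=float)
        s_sorted = np.sort(s)[::-1]                       # descending order
        logits = -np.abs(s_sorted[:, None] - s[None, :]) / tau
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        return P / P.sum(axis=1, keepdims=True)

    s = np.array([0.3, 2.0, -1.0, 0.9])
    P = softsort(s, tau=0.05)
    print(np.round(P, 2))                # near-permutation matrix
    print(P @ s, np.sort(s)[::-1])       # soft sort vs. exact sort

The quality of the approximation is governed by the temperature tau relative to the gaps between entries of s, matching the temperature and element-gap condition stated above.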

Rahul Padmanabhan (Concordia University)

Deep Learning Approximation of Matrix Functions: From Feedforward Neural Networks to Transformers

Abstract

Deep Neural Networks (DNNs) have been at the forefront of Artificial Intelligence (AI) over the last decade. Transformers, a type of DNN, have revolutionized Natural Language Processing (NLP) through models like ChatGPT, Llama and more recently, Deepseek. While transformers are used mostly in NLP tasks, their potential for advanced numerical computations remains largely unexplored. This presents opportunities in areas like surrogate modeling and raises fundamental questions about AI’s mathematical capabilities. We investigate the use of transformers for approximating matrix functions, which are mappings that extend scalar functions to matrices. These functions are ubiquitous in scientific applications, from continuous-time Markov chains (matrix exponential) to stability analysis of dynamical systems (matrix sign function). Our work makes two main contributions. First, we prove theoretical bounds on the depth and width requirements for ReLU DNNs to approximate the matrix exponential. Second, we use transformers with encoded matrix data to approximate general matrix functions and compare their performance to feedforward DNNs. Through extensive numerical experiments, we demonstrate that the choice of matrix encoding scheme significantly impacts transformer performance. Our results show strong accuracy in approximating the matrix sign function, suggesting transformers’ potential for advanced mathematical computations.
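As background, the matrix sign function has a simple classical reference implementation; the sketch below (a standard Newton iteration, assumed here only as a baseline, not the paper's method) is the kind of ground truth such networks would be trained to reproduce.

    import numpy as np

    rng = np.random.default_rng(3)

    # Matrix sign via the Newton iteration X <- (X + X^{-1}) / 2, which
    # converges when A has no eigenvalues on the imaginary axis.
    def matrix_sign(A, iters=30):
        X = A.copy()
        for _ in range(iters):
            X = 0.5 * (X + np.linalg.inv(X))
        return X

    A = rng.standard_normal((5, 5))
    A = A + A.T                          # symmetric: real, nonzero eigenvalues a.s.
    S = matrix_sign(A)

    # Check against the eigendecomposition definition sign(A) = Q sign(D) Q^T.
    w, Q = np.linalg.eigh(A)
    print(np.allclose(S, Q @ np.diag(np.sign(w)) @ Q.T, atol=1e-8))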

Kateryna Pozharska (Institute of Mathematics of the National Academy of Sciences of Ukraine / TU Chemnitz, Germany)

Recent developments in function recovery: theoretical guarantees, comparison and optimality

Abstract

The poster presents our contribution to the problem of optimal function recovery, together with an overview of existing results. Namely, we provide theoretical guarantees, compare algorithms that use different types of information (function values vs. linear measurements), and discuss the question of their optimality.

Siddharth Rout (University of British Columbia)

Uncertainty Prediction and Generation of High-Dimensional Non-Gaussian States using Stochastic Interpolation for Dynamical Systems

Abstract

Learning dynamical systems is critical in numerous domains but is often constrained by incomplete or noisy data. This study introduces the use of stochastic interpolation (SI) for probabilistic forecasting, estimating future states as probability distributions rather than as deterministic single-point predictions. By modeling uncertainty explicitly, SI provides a robust framework for uncertainty-aware forecasting, addressing challenges inherent to real-world dynamical systems. Unlike traditional deterministic methods, SI leverages an autoregressive flow map to learn the transformation between a distribution of initial states and a distribution of final states. This approach captures the inherent variability and unpredictability of dynamical systems, making it particularly well-suited for complex and open systems. SI is also used to generate samples for ensemble predictions from perturbed high-dimensional non-Gaussian states. Illustrations on predator-prey dynamics, real chaotic temperature dynamics, and spatiotemporal weather trends support the strength of the proposed approach.
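A minimal sketch of the training objective behind interpolant-based generation, assuming the simplest linear interpolant x_t = (1 - t) x_0 + t x_1 with the stochastic term omitted, and a toy one-dimensional bimodal target standing in for a non-Gaussian state (all modeling choices here are illustrative, not the method of the study):

    import torch

    torch.manual_seed(0)

    # Small MLP for the velocity field v(x, t).
    net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                              torch.nn.Linear(64, 64), torch.nn.Tanh(),
                              torch.nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    def sample_pair(n):
        x0 = torch.randn(n, 1)                                               # initial states
        x1 = 0.3 * torch.randn(n, 1) + 2.0 * torch.sign(torch.randn(n, 1))   # bimodal final states
        return x0, x1

    for step in range(2000):
        x0, x1 = sample_pair(256)
        t = torch.rand(256, 1)
        xt = (1 - t) * x0 + t * x1                       # linear interpolant
        loss = ((net(torch.cat([xt, t], 1)) - (x1 - x0)) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    # Generate an ensemble: push initial samples through dx/dt = v(x, t).
    x = torch.randn(1000, 1)
    with torch.no_grad():
        for i in range(100):
            t = torch.full((1000, 1), i / 100)
            x = x + 0.01 * net(torch.cat([x, t], 1))
    print(x.mean().item(), x.std().item())               # ensemble statistics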

Xuemeng Wang (Simon Fraser University)

Christoffel Adaptive Sampling in Sparse Random Feature Models for Function Approximation

Abstract

Random feature models have become a powerful tool for approximating high-dimensional functions and solving partial differential equations (PDEs) efficiently. The sparse random feature expansion (SRFE) enhances traditional random feature methods by incorporating sparsity and compressive sensing principles, making it particularly effective in data-scarce settings.
In this work, we integrate active learning with sparse random feature approximations to improve sampling efficiency. Specifically, we incorporate the Christoffel function to guide an adaptive sampling process, dynamically selecting informative sample points based on their contribution to the function space. This approach optimizes the distribution of sample points by leveraging the Christoffel function associated with a chosen function basis.
We conduct numerical experiments comparing adaptive and non-adaptive sampling strategies within the sparse random feature framework and examine their implications for function approximation. Furthermore, we implement different sparse recovery solvers, including Orthogonal Matching Pursuit (OMP) and Hard Thresholding Pursuit (HTP), to reconstruct the target function. Our results demonstrate the advantages of adaptive sampling in maintaining high accuracy while reducing sample complexity, highlighting its potential for scientific computing.
This is joint work with Ben Adcock and Khiem Can.
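A minimal sketch of Christoffel-guided sampling in one dimension, using a Legendre basis and a dense least-squares fit where the work uses random features and sparse solvers such as OMP or HTP (all specifics below are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(4)

    # Candidate grid and a Legendre basis, normalized to be orthonormal
    # with respect to the uniform probability measure on [-1, 1].
    grid = np.linspace(-1, 1, 2000)
    deg = 30
    Phi = np.polynomial.legendre.legvander(grid, deg)
    Phi *= np.sqrt(2 * np.arange(deg + 1) + 1)

    # Reciprocal Christoffel function k(x) = sum_j |phi_j(x)|^2: sample
    # proportionally to k, then reweight the regression rows by 1/sqrt(k).
    k = (Phi ** 2).sum(axis=1)
    idx = rng.choice(len(grid), size=120, replace=False, p=k / k.sum())
    w = 1.0 / np.sqrt(k[idx])

    f = lambda x: np.exp(-2 * x) * np.sin(5 * x)    # target function
    A = w[:, None] * Phi[idx]                       # weighted measurement matrix
    b = w * f(grid[idx])
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.max(np.abs(Phi @ c - f(grid))))        # uniform error on the grid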

Andrew Warren (University of British Columbia) *TBC

Estimation of one-dimensional structures from noisy empirical observation

Abstract

Given a data distribution which is concentrated around a one-dimensional structure, can we infer that structure? We consider versions of this problem where the distribution resides in a metric space and the 1d structure is assumed to either be the range of an absolutely continuous curve, a connected set of finite 1d Hausdorff measure, or a general 1-rectifiable set. In each of these cases, we relate the inference task to solving a variational problem where there is a tradeoff between data fidelity and simplicity of the inferred structure; the variational problems we consider are closely related to the so-called “principal curve” problem of Hastie and Stuetzle as well as the “average-distance problem” of Buttazzo, Oudet, and Stepanov. For each of the variational problems under consideration, we establish existence of minimizers, stability with respect to the data distribution, and consistency of a discretization scheme which is amenable to Lloyd-type numerical methods. Lastly, we consider applications to estimation of stochastic processes from partial observation, as well as the lineage tracing problem from mathematical biology.
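To give a flavor of the Lloyd-type methods mentioned above, here is a hedged sketch: a simple penalized scheme for a polygonal curve in the plane, where the fidelity/simplicity tradeoff is captured by the penalty lam (all specifics are assumptions for illustration, not the schemes analyzed in the work).

    import numpy as np

    rng = np.random.default_rng(5)

    # Observations concentrated around a one-dimensional structure (an arc).
    t = rng.uniform(0, np.pi, 800)
    data = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.standard_normal((800, 2))

    # Lloyd-type scheme for a polygonal curve with m vertices: assign each
    # point to its nearest vertex, then move every vertex to a compromise
    # between its assigned points (fidelity) and its neighbors (simplicity).
    m, lam = 20, 0.5
    V = np.c_[np.linspace(-1, 1, m), np.zeros(m)]   # initial segment
    for _ in range(50):
        labels = np.argmin(((data[:, None] - V[None]) ** 2).sum(-1), axis=1)
        for j in range(m):
            pts = data[labels == j]
            nbrs = V[max(j - 1, 0)] + V[min(j + 1, m - 1)]
            V[j] = (pts.sum(axis=0) + lam * nbrs) / (len(pts) + 2 * lam)
    print(V)                                        # vertices tracing the curve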