Laurent Charlin (HEC Montréal)

Machine learning for MIP solvers: past work and future ideas

Résumé

A popular approach for integrating machine learning into MIP solvers is to train fast predictive models that emulate costly heuristics through imitation learning. I will discuss examples from our work on variable and cutting plane selection. Such techniques aim to obtain the best of both worlds: powerful approximate estimation techniques (e.g., graph neural networks) combined with exact combinatorial solvers. While this paradigm has demonstrated value, I will also argue that it is limited and, perhaps, even unlikely to lead to very significant gains. With the advent of larger-scale pre-trained models (in particular, large language models, LLMs), I will explore an alternative paradigm and discuss two ideas for using LLMs to benefit combinatorial solvers, drawing upon examples from other fields. While this last part is speculative, I hope it will stimulate discussion with the participants.
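To make the imitation-learning recipe mentioned above concrete, here is a minimal Python sketch: an expensive "expert" heuristic scores branching candidates, its choices are logged, and a fast classifier is trained to mimic them. The expert rule, features, and model below are illustrative placeholders, not the GNN-based setup from the talk.

```python
# Minimal sketch of imitation learning for variable selection: log the
# decisions of a costly scoring heuristic, then train a cheap model to
# reproduce them. All data and the scoring rule are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def expensive_expert_score(features):
    """Stand-in for a costly heuristic (e.g., strong-branching-like scores)."""
    return features @ np.array([2.0, -1.0, 0.5])  # hypothetical scoring rule

# 1) Offline: collect (candidate features, was it the expert's pick?) pairs.
X, y = [], []
for _ in range(200):                       # 200 simulated branching decisions
    candidates = rng.normal(size=(10, 3))  # 10 candidates, 3 features each
    best = int(np.argmax(expensive_expert_score(candidates)))
    for i, feat in enumerate(candidates):
        X.append(feat)
        y.append(1 if i == best else 0)

# 2) Train a fast predictive model on the expert's decisions.
model = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))

# 3) Online: rank new candidates with the cheap model instead of the expert.
new_candidates = rng.normal(size=(10, 3))
scores = model.predict_proba(new_candidates)[:, 1]
print("imitated choice:", int(np.argmax(scores)))
```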

Bistra Dilkina (University of Southern California)

ML-guided solvers for MIP

Résumé

We will demonstrate recent advances in successfully integrating machine learning and combinatorial optimization to improve the speed and solution quality of solvers. Many real-world applications that require combinatorial optimization involve repeatedly solving similar instances of the same problem over different data (objective and constraint coefficients). Hence, they provide the opportunity to learn to search more effectively by leveraging historical instances. We design approaches to effectively augment established state-of-the-art Mixed Integer Linear Programming (MILP) solvers with ML-guided components to significantly improve performance. In particular, many important (heuristic) tasks in MILP solving involve choosing subsets of variables, and we demonstrate that contrastive loss is particularly well-suited for training in this setting by learning from both positive and negative examples of candidate sets. We show the successful application of contrastive loss in the context of Large Neighborhood Search for MILP, as well as backdoor selection for MILP, resulting in significant speed-ups over multiple domains. We introduce d-MIPLIB (Distributional MIPLIB) as a unified benchmark set for the research community in ML-guided MILP solving, in order to facilitate benchmarking across a variety of problem domains and drive progress in this field. We demonstrate the potential of learning across multiple domains when historical examples are limited, and the impressive power of multi-task learning across different ML-guided solving techniques for any given problem domain, setting a key stepping stone toward a foundation model for ML-guided MILP solving.
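As a concrete illustration of the positive/negative candidate-set setting, here is a small numpy sketch of an InfoNCE-style contrastive loss that scores one "good" subset against several "bad" ones. The embeddings and similarity function are assumptions made for illustration, not the architecture used in the cited work.

```python
# Minimal contrastive (InfoNCE-style) loss for scoring candidate variable
# subsets: the positive set should score higher than the negative sets.
import numpy as np

def contrastive_loss(instance_emb, positive_set_emb, negative_set_embs, temperature=0.1):
    """-log softmax score of the positive set versus the negative sets."""
    sets = np.vstack([positive_set_emb[None, :], negative_set_embs])
    logits = sets @ instance_emb / temperature        # similarity scores
    logits -= logits.max()                            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                              # positive set is row 0

rng = np.random.default_rng(0)
instance = rng.normal(size=16)                        # embedding of the MILP instance
good_subset = instance + 0.1 * rng.normal(size=16)    # a high-quality candidate set
bad_subsets = rng.normal(size=(5, 16))                # low-quality candidate sets
print(contrastive_loss(instance, good_subset, bad_subsets))
```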

Emma Frejinger (Université de Montréal)

A model-free approach for solving choice-based competitive facility location problems using simulation and submodularity

Résumé

Decision makers are often faced with problems that are subject to uncertainty. Consider the problem of planning transport services for an upcoming season, determining optimal locations of new infrastructure, or establishing production plans and pricing strategies for a product. In this context, demand uncertainty is challenging to deal with, notably because it is decision-dependent. Focusing on the competitive facility location problem, we describe a methodology to deal with decision-dependent demand uncertainty without making strong distributional assumptions. We develop a partial Benders reformulation in which the contribution to the objective of the least influential preference profiles is aggregated and bounded by submodular cuts. This set of profiles is selected by a knee detection method that seeks to identify the best tradeoff between the fraction of the demand that is retained in the master problem and the size of the model.
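To make the knee-detection idea concrete, here is a small Python sketch using one common generic heuristic (the point farthest from the chord joining the curve's endpoints) on an illustrative tradeoff curve between the fraction of demand retained in the master problem and the model size. This is a stand-in shown only for intuition, not necessarily the specific method used in the work.

```python
# Illustrative knee detection on a tradeoff curve:
# x = fraction of demand retained in the master problem, y = model size.
import numpy as np

def knee_index(x, y):
    """Index of the point farthest from the line joining the endpoints."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    p0, p1 = np.array([x[0], y[0]]), np.array([x[-1], y[-1]])
    chord = (p1 - p0) / np.linalg.norm(p1 - p0)
    pts = np.stack([x, y], axis=1) - p0
    # distance from each point to the chord (2-D cross-product magnitude)
    dist = np.abs(pts[:, 0] * chord[1] - pts[:, 1] * chord[0])
    return int(np.argmax(dist))

demand_fraction = np.linspace(0.5, 1.0, 11)                       # synthetic data
model_size = np.array([100, 110, 125, 150, 190, 260, 380, 560, 800, 1100, 1500])
k = knee_index(demand_fraction, model_size)
print("knee at demand fraction", demand_fraction[k])
```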

Tias Guns (KU Leuven)

An AI approach to underspecified combinatorial optimisation problems

Résumé

Combinatorial optimisation is used to solve essential problems in industry and society. It excels on well-defined problems, where highly efficient constraint solvers and algorithms can be used to find optimal or near-optimal solutions.
But what if the problem is less well-defined? From an AI perspective, we hope to build intelligent systems that can handle such situations by leveraging relevant data or by directly interacting with a user.
I will provide examples and recent developments in neuro-symbolic AI, where vision and constraint solving are combined; in decision-focussed learning, where regression/forecasting models are built by backpropagating over (non-differentiable) solvers; and in Large Language Model-based modelling support tools.
The research is part of my ambitious 5-year ERC Consolidator grant ‘Conversational Human-Aware Technology for Optimisation’ (CHAT-Opt).
https://people.cs.kuleuven.be/~tias.guns/

Andrea Lodi (Cornell Tech)

The differentiable Feasibility Pump

Résumé

Although nearly 20 years have passed since its conception, the feasibility pump algorithm remains a widely used heuristic to find feasible primal solutions to mixed-integer linear problems. Many extensions of the initial algorithm have been proposed. Yet, its core algorithm remains centered around two key steps: solving the linear relaxation of the original problem to obtain a solution that respects the constraints, and rounding it to obtain an integer solution. This paper shows that the traditional feasibility pump and many of its follow-ups can be seen as gradient-descent algorithms with specific parameters. A central aspect of this reinterpretation is observing that the traditional algorithm differentiates the solution of the linear relaxation with respect to its cost. This reinterpretation opens many opportunities for improving the performance of the original algorithm. We study how to modify the gradient-update step as well as how to extend its loss function. We perform extensive experiments on MIPLIB instances and show that these modifications can substantially reduce the number of iterations needed to find a solution. (Joint work with M. Cacciola, A. Forel, and A. Frangioni.)
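For readers unfamiliar with the two alternating steps the abstract refers to, here is a minimal Python sketch of the classical feasibility pump for a pure binary problem, using scipy's LP solver. The problem data and iteration limit are illustrative, cycle-breaking perturbations are omitted, and this is the traditional heuristic rather than the differentiable variant presented in the talk.

```python
# Classical feasibility pump sketch for min c^T x, A_ub x <= b_ub, x in {0,1}^n:
# alternate between rounding and an LP projection toward the rounded point.
import numpy as np
from scipy.optimize import linprog

def feasibility_pump(c, A_ub, b_ub, max_iter=50):
    n = len(c)
    bounds = [(0.0, 1.0)] * n
    x_lp = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).x  # LP relaxation optimum
    for _ in range(max_iter):
        x_int = np.round(x_lp)                        # rounding step
        if np.all(A_ub @ x_int <= b_ub + 1e-9):
            return x_int                              # integer and feasible
        # Projection step: LP minimizing the L1 distance to x_int. For binaries,
        # the distance is linear in x with coefficients +1 (rounded to 0) / -1 (rounded to 1).
        dist_obj = np.where(x_int < 0.5, 1.0, -1.0)
        x_lp = linprog(dist_obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds).x
    return None  # no feasible integer point found within the iteration limit

# Tiny illustrative instance.
c = np.array([-1.0, -1.0, -1.0])
A_ub = np.array([[2.0, 2.0, 2.0]])
b_ub = np.array([3.0])
print(feasibility_pump(c, A_ub, b_ub))
```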

Haihao Lu (MIT)

GPU-accelerated linear programming and beyond

Résumé

In this talk, I will discuss the recent trend of research on new first-order methods for scaling up and speeding up linear programming (LP) and convex quadratic programming (QP) with GPUs. The state-of-the-art solvers for LP and QP are mature and reliable at delivering accurate solutions. However, these methods are not suitable for modern computational resources, particularly GPUs. The computational bottleneck of these methods is matrix factorization, which usually requires significant memory and makes it highly challenging to take advantage of the massive parallelism of GPUs. In contrast, first-order methods (FOMs) only require matrix-vector multiplications, which work well on GPUs and have already accelerated machine learning training over the last 15 years. This ongoing line of research aims to scale up and speed up LP and QP by using FOMs and GPUs. I’ll discuss how we can achieve this by explaining: (i) the behaviors of FOMs for LP; (ii) computational results on GPUs; (iii) theoretical results, including complexity theory and condition number theory, and how theory can lead to better computation and a better understanding of the algorithm’s performance. If time permits, I’ll also talk about how to extend it to QP.
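To illustrate why first-order methods suit GPUs, here is a bare-bones PDHG iteration for an LP in equality standard form, written in numpy: each step is only matrix-vector products and a projection. Production solvers in this line of work add preconditioning, adaptive step sizes, and restarts; the step-size rule and example below are simplifying assumptions.

```python
# Bare-bones PDHG sketch for  min c^T x  s.t.  A x = b,  x >= 0.
import numpy as np

def pdhg_lp(c, A, b, iters=5000):
    m, n = A.shape
    eta = 0.9 / np.linalg.norm(A, 2)       # common step size for primal and dual
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x_new = np.maximum(x - eta * (c - A.T @ y), 0.0)   # primal step + projection
        y = y + eta * (b - A @ (2 * x_new - x))            # dual step with extrapolation
        x = x_new
    return x

# Tiny example: min -x1 - 2*x2  s.t.  x1 + x2 + s = 1,  x, s >= 0 (s is a slack).
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([-1.0, -2.0, 0.0])
print(pdhg_lp(c, A, b))   # expect roughly (0, 1, 0)
```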

Joe Paat (University of British Columbia)

Counting columns in a Delta-modular matrix

Résumé

Totally unimodular (TU) matrices are commonly used to model problems such as maximum matchings in a bipartite graph and maximum flows in a network. One generalization of a TU matrix is a ‘Delta-modular’ matrix, which has subdeterminants bounded by some number Delta. A variety of new problems can be modeled in this Delta-modular IP framework. In this talk, we dive into the structure of Delta-modular matrices; in particular, we count the number of distinct columns that a rank-r Delta-modular matrix can have. We demonstrate how this ‘column number’ can be used as a proxy for the dimension of a Delta-modular IP, and we discuss implications. The bulk of this talk is based on joint work with Zach Walsh and Luze Xu.
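For concreteness, one common way to state the definitions is sketched below; the notation is assumed here (variants in the literature count columns up to sign or exclude the zero column), and TU matrices correspond to the case Delta = 1.

```latex
\[
  A \in \mathbb{Z}^{r \times n} \text{ of rank } r \text{ is } \Delta\text{-modular if }
  \max_{\substack{B \subseteq \{1,\dots,n\} \\ |B| = r}} \bigl|\det A_B\bigr| \le \Delta ,
\]
\[
  c(\Delta, r) \;:=\; \max \bigl\{\, n : \text{there is a } \Delta\text{-modular } A \in \mathbb{Z}^{r \times n}
  \text{ with } n \text{ distinct, nonzero columns} \,\bigr\}.
\]
```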

Dolores Romero Morales (Copenhagen Business School)

On collective local interpretable model-agnostic explanations

Résumé

tbc

Louis-Martin Rousseau (Polytechnique Montréal)

Optimize then predict: an imitation-based learning framework

Résumé

Predictive models are playing an increasingly pivotal role in optimizing decision-making. This talk introduces an approach wherein a series of (possibly) stochastic sequential decision problems are initially solved offline, and subsequently, a predictive model is constructed based on the offline solutions to facilitate real-time decision-making. The applicability of this methodology is demonstrated through two use cases: patient radiation therapy, where managers need to preserve capacity for emergency cases, and a novel dynamic employee call-timing problem for the scheduling of casual personnel for on-call work shifts.
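A skeleton of the "optimize then predict" recipe is sketched below in Python: many instances are solved offline with a (possibly expensive) optimizer, and a predictive model is then fit on the resulting decisions for real-time use. The offline solver, features, and regressor are placeholders, not the radiotherapy or call-timing implementations from the talk.

```python
# Offline-solve -> train -> online-predict pipeline, with synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

def solve_offline(context):
    """Stand-in for an expensive (possibly stochastic) optimization run."""
    return max(0.0, 2.0 * context[0] - context[1])   # hypothetical optimal decision

# 1) Offline phase: build a dataset of (context, optimal decision) pairs.
contexts = rng.uniform(size=(500, 2))
decisions = np.array([solve_offline(c) for c in contexts])

# 2) Train a predictive model that imitates the offline solutions.
policy = RandomForestRegressor(n_estimators=100, random_state=0)
policy.fit(contexts, decisions)

# 3) Online phase: decisions come from the fast model instead of the solver.
print(policy.predict(rng.uniform(size=(1, 2))))
```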

Thiago Serra (University of Iowa)

Optimization over trained (and sparse) neural networks: a surrogate within a surrogate

Résumé

We can approximate a constraint or an objective function that is uncertain or nonlinear with a neural network that we embed in the optimization model. This approach, which is known as constraint learning, faces the challenge that optimization models with neural network surrogates are harder to solve. Such difficulties have motivated studies on model reformulation; specialized methods for generating activation bounds and inferences, cutting planes, and heuristic solutions; and, more recently, structured pruning of the embedded networks. In this work, we focus instead on applying unstructured pruning to constraint learning models: we study the removal of connections instead of neurons to produce sparser embedded models, since sparse models are easier to solve. With our approach, we obtained faster adversarial perturbations for dense neural networks by using their sparse surrogates, especially (and surprisingly) if we do not spend time finetuning the sparse network to make up for the loss in accuracy. In other words, we show that a model with bad classification performance can still be a good optimization surrogate.
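The contrast between removing connections and removing neurons can be shown on a single weight matrix; the numpy sketch below uses arbitrary layer sizes and a 50% sparsity level for illustration only.

```python
# Unstructured pruning (zero individual connections) vs. structured pruning
# (drop whole neurons) on one dense layer.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))              # dense layer: 16 inputs -> 8 neurons

# Unstructured: drop the smallest-magnitude individual weights (connections).
threshold = np.quantile(np.abs(W), 0.5)   # keep the largest 50% of weights
W_unstructured = np.where(np.abs(W) >= threshold, W, 0.0)

# Structured (for contrast): drop entire neurons with the smallest weight norm.
neuron_norms = np.linalg.norm(W, axis=1)
keep = neuron_norms >= np.quantile(neuron_norms, 0.5)
W_structured = W[keep]                    # half the rows (neurons) removed

print("unstructured sparsity:", float((W_unstructured == 0).mean()))
print("structured shape:", W_structured.shape)
```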

Bartolomeo Stellato (Princeton University)

Exact verification of first-order methods via mixed-integer linear programming

Résumé

We present exact mixed-integer linear programming formulations for verifying the performance of first-order methods for parametric quadratic optimization. We formulate the verification problem as a mixed-integer linear program whose objective is to maximize the infinity norm of the fixed-point residual after a given number of iterations. Our approach captures a wide range of gradient, projection, and proximal iterations through affine or piecewise affine constraints. We derive tight polyhedral convex hull formulations of the constraints representing the algorithm iterations. To improve scalability, we develop a custom bound tightening technique combining interval propagation, operator theory, and optimization-based bound tightening. Numerical examples, including linear and quadratic programs from network optimization, sparse coding, and optimal control, show that our method provides reductions of several orders of magnitude in the worst-case fixed-point residuals, closely matching the true worst-case performance.
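To make the piecewise-affine constraints concrete, the sketch below shows the standard big-M mixed-integer encoding of one projection step y = max(x, 0) with assumed bounds L <= x <= U (L < 0 < U), followed by a schematic statement of the verification problem. The notation is an assumption for illustration; the talk's formulation uses tighter convex-hull descriptions.

```latex
\[
  y \ge x, \qquad y \ge 0, \qquad
  y \le x - L(1-\delta), \qquad y \le U\delta, \qquad \delta \in \{0,1\},
\]
\[
  \max_{\theta \in \Theta}\; \bigl\| z^{K}(\theta) - z^{K-1}(\theta) \bigr\|_{\infty}
  \quad \text{s.t.} \quad z^{k+1}(\theta) = T\bigl(z^{k}(\theta); \theta\bigr),
  \;\; k = 0, \dots, K-1 .
\]
```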

Kevin Tierney (Bielefeld University)

Neural deconstruction search for vehicle routing problems

Résumé

Yassine Yaakoubi (Concordia University)

tbc

Résumé

tbc

Neil Yorke-Smith (Delft University of Technology, The Netherlands)

Towards robust decision-focussed learning

Résumé

Optimisation models used to make discrete decisions often contain uncertain parameters that are context-dependent and estimated through prediction. To account for the quality of the decision made based on the prediction, decision-focussed learning (end-to-end predict-then-optimise) aims at training the predictive model to minimise regret, i.e., the loss incurred by making a suboptimal decision.
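Written out for concreteness, with notation assumed here: if x*(c) denotes an optimal decision for cost vector c over feasible set X and ĉ is the predicted cost vector, the regret referred to above is the objective loss from deciding with ĉ instead of c.

```latex
\[
  x^*(c) \in \arg\min_{x \in X} f(x; c), \qquad
  \mathrm{regret}(\hat{c}, c) \;=\; f\bigl(x^*(\hat{c}); c\bigr) - f\bigl(x^*(c); c\bigr) \;\ge\; 0 .
\]
```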

First, we recognise that empirical regret can be an ineffective surrogate, because empirical optimal decisions can vary substantially from expected optimal decisions. We analyse the effect of aleatoric and epistemic uncertainty on the surrogate. We then propose three novel loss functions that approximate expected regret more robustly. Experimental results show that training two state-of-the-art DFL approaches using robust regret losses improves test-sample empirical regret in general, while keeping computational time equivalent relative to the number of training epochs.

Second, we examine the common practice in DFL of predicting a single value for each uncertain parameter — implicitly assuming that there exists a single-scenario problem-approximation that is sufficient to obtain an optimal decision. We derive requirements for problem-approximations and predictive models that are sufficient to obtain optimal decisions when this assumption does not hold. We further present effective approaches for DFL that adhere to these requirements, with very limited compromise on the complexity of the learning task. We conclude by showing experiments on problems with continuous and discrete variables, as well as uncertainty in the objective function and in the constraints.

This is joint work with N. Schutte and K. Postek.