Statlab 2024 Abstracts

Debbie Dupuis

Mixed-frequency Extreme Value Regression: Estimating the Effect of Mesoscale Convective Systems on Extreme Rainfall Intensity

Abstract

Understanding and modeling the determinants of extreme hourly rainfall intensity is of utmost importance for the management of flash-flood risk. Increasing evidence shows that mesoscale convective systems (MCS) are the principal driver of extreme rainfall intensity in the United States. We use extreme value statistics to investigate the relationship between MCS activity and extreme hourly rainfall intensity in Greater St. Louis, an area particularly vulnerable to flash floods. Using a block maxima approach with monthly blocks, we find that the impact of MCS activity on monthly maxima is not homogeneous within the month/block. To appropriately capture this relationship, we develop a mixed-frequency extreme value regression framework accommodating a covariate sampled at a frequency higher than that of the extreme observation. This is joint work with Luca Trapin (University of Bologna).

Janie Coulombe

Multiply robust estimation of the average treatment effect under confounding and covariate-driven observation times

Abstract

Randomized controlled trials are widely regarded as the gold standard for drawing causal inferences about the effect of a treatment on an outcome. When it is not possible to conduct such an experiment, researchers often resort to observational data, which are not collected for research purposes and have features that can affect causal inference. In this talk, I focus on the challenges of confounding and covariate-driven monitoring times. These features can introduce spurious associations between the treatment and the outcome of interest, thereby distorting standard causal estimators if not properly accounted for. I consider a setting with longitudinal data from electronic health records. Using semiparametric theory, I propose a novel efficient estimator of the causal effect of treatment that accounts for informative monitoring times and confounding. In addition to being less variable than the only alternative estimator proposed for similar settings, the novel estimator is multiply robust to misspecification of various nuisance models involved in its formulation. These properties are demonstrated theoretically and in extensive simulation studies. The proposed estimator is further applied to data from the Add Health study in the US to study the causal effect of psychotherapy on alcohol consumption. This is joint work with Professor Shu Yang at North Carolina State University.

Arthur Charpentier (UQAM)

Algorithmic fairness with optimal transport: quantifying counterfactual fairness and mitigating group fairness

Abstract

In this talk, we present two complementary approaches to fairness in algorithmic decision-making, addressing individual and group fairness. First, we use Wasserstein barycenters to achieve strong Demographic Parity with one or multiple sensitive features. Our method provides a closed-form solution for the optimal, sequentially fair predictor, enabling possible interpretation of correlations between sensitive attributes. Then, we introduce a novel method that links two existing counterfactual approaches: causal graph-based adaptations (Plečko and Meinshausen, 2020) and optimal transport (De Lara et al., 2024). By extending “Knothe’s rearrangement” (Bonnotte, 2013) and “triangular transport” (Zech and Marzouk, 2022) to probabilistic graphical models, we propose a new group framework, termed sequential transport, which we apply to the problem of individual fairness. Theoretical foundations are established, followed by numerical demonstrations on synthetic and real datasets.
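
As a rough illustration of the barycenter idea for a single sensitive attribute, the sketch below uses the well-known one-dimensional form: each score is replaced by the weighted average of the group quantile functions evaluated at the score's rank within its own group. This is not the speaker's implementation. The function names (`empirical_quantile`, `fair_scores`, `group_mean`), the simulated scores, and the two-group setup are all assumptions made for the example, and the sequential treatment of several sensitive features is not covered.

```python
import bisect
import math
import random
from statistics import mean


def empirical_quantile(sorted_vals, u):
    """Empirical quantile: the ceil(u * n)-th smallest value."""
    n = len(sorted_vals)
    k = math.ceil(u * n - 1e-9)  # small tolerance against float round-off
    return sorted_vals[min(max(k, 1), n) - 1]


def fair_scores(scores, groups):
    """Push each group's scores onto the barycenter of the group-wise distributions."""
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    for vals in by_group.values():
        vals.sort()
    weights = {g: len(v) / len(scores) for g, v in by_group.items()}
    fair = []
    for s, g in zip(scores, groups):
        own = by_group[g]
        # Rank of the score within its own group (empirical CDF value).
        u = bisect.bisect_right(own, s) / len(own)
        # Weighted average of all group quantile functions at that rank.
        fair.append(sum(weights[h] * empirical_quantile(by_group[h], u) for h in by_group))
    return fair


def group_mean(values, groups, g):
    """Average of the values belonging to group g."""
    return mean(v for v, h in zip(values, groups) if h == g)


# Simulated (hypothetical) scores: group 1 is shifted upward by one unit.
random.seed(0)
groups = [random.randint(0, 1) for _ in range(4000)]
scores = [random.gauss(1.0 if g == 1 else 0.0, 1.0) for g in groups]
fair = fair_scores(scores, groups)
gap_before = abs(group_mean(scores, groups, 1) - group_mean(scores, groups, 0))
gap_after = abs(group_mean(fair, groups, 1) - group_mean(fair, groups, 0))
print(f"mean gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

After the mapping, both groups share approximately the same score distribution, so the gap between group means shrinks toward zero while the ordering of individuals within each group is preserved.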

Shirin Golchi

An adaptive enrichment design using Bayesian model averaging for selection and threshold-identification of tailoring variables
Authors: Lara Maleyeff, Shirin Golchi, Erica Moodie

Abstract

Precision medicine is transforming healthcare by offering tailored treatments that enhance patient outcomes and reduce costs. As our understanding of complex diseases improves, clinical trials increasingly aim to detect subgroups of patients with enhanced treatment effects. Biomarker-driven adaptive enrichment designs, which initially enroll a broad population and later restrict to treatment-sensitive patients, are gaining popularity. However, current practice often assumes either pre-trial knowledge of biomarkers or a simple, linear relationship between continuous markers and treatment effectiveness. Motivated by a trial studying rheumatoid arthritis treatment, we propose a Bayesian adaptive enrichment design to identify predictive variables from a larger set of candidate biomarkers. Our approach uses a flexible modeling framework where the effects of continuous biomarkers are represented using free knot B-splines. We then estimate key parameters by marginalizing over all possible variable combinations using Bayesian model averaging. At interim analyses, we assess whether a biomarker-defined subgroup has enhanced or reduced treatment effects, allowing for early termination for efficacy or futility and restricting future enrollment to treatment-sensitive patients. We consider both pre-categorized and continuous biomarkers, the latter potentially having complex, nonlinear relationships to the outcome and treatment effect. Through simulations, we derive the operating characteristics of our design and compare its performance to existing methods.

Félix Camirand

Dynamic block-wise selection of smoothing parameters in the nonparametric estimation of a nonstationary density from streaming data

Abstract

We study the problem of selecting smoothing parameters for the nonparametric estimation of a density from a data stream, such as data from sensor networks. Such data are collected continuously and at high resolution over long periods, often in nonstationary environments, which calls for low-storage, near-real-time processing methods. Nonparametric estimators suited to nonstationary data frequently require repeated access to past data, which makes them impractical in a streaming context. The literature also offers iterative estimators, updated with each new observation; however, computing them often requires selecting smoothing parameters at every new observation, which limits their applicability. When the optimal values of these parameters evolve slowly over time, the same parameters can be used for blocks of consecutive observations. In this work, drawing on asymptotic theory, we develop a dynamic cross-validation strategy for block-wise selection of the smoothing parameters, which we apply to two density estimators.
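
To make the block-wise idea concrete, here is a minimal sketch of a one-pass kernel density estimate on a fixed grid whose bandwidth is refreshed only once per block. It is not the method of the talk. The function `stream_density`, the exponential forgetting step, the grid, and the drifting simulated stream are assumptions for illustration, and a Silverman-type rule of thumb stands in for the cross-validation strategy described above.

```python
import math
import random
from statistics import stdev


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def stream_density(stream, grid, block_size=200, forget=0.01, h0=0.5):
    """One-pass kernel density estimate on a fixed grid with block-wise bandwidth."""
    density = [0.0] * len(grid)
    h = h0          # bandwidth used until the first block is complete
    block = []      # only the current block is stored, never the full history
    for t, x in enumerate(stream, start=1):
        # Running average during warm-up, exponential forgetting afterwards.
        step = max(forget, 1.0 / t)
        for i, g in enumerate(grid):
            density[i] = (1.0 - step) * density[i] + step * gaussian_kernel((g - x) / h) / h
        block.append(x)
        if len(block) == block_size:
            # Assumed stand-in rule (not cross-validation): rule-of-thumb
            # bandwidth from the block just finished, reused for the next block.
            h = 1.06 * stdev(block) * block_size ** (-0.2)
            block = []
    return density


# Hypothetical nonstationary stream: the mean drifts from 0 to 3.
random.seed(1)
n = 3000
stream = [random.gauss(3.0 * t / n, 0.5) for t in range(1, n + 1)]
step_size = 0.1
grid = [-4.0 + step_size * i for i in range(111)]
density = stream_density(stream, grid)
total_mass = sum(density) * step_size
recent_mass = sum(d for g, d in zip(grid, density) if g > 1.5) * step_size
print(f"total mass: {total_mass:.3f}, mass above 1.5: {recent_mass:.3f}")
```

Because of the forgetting step, the final estimate concentrates around the most recent mean of the stream, and the bandwidth is recomputed only 15 times over 3000 observations instead of at every update.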