TANGENTIAL WASSERSTEIN PROJECTIONS

Abstract

We develop a notion of projections between sets of probability measures using the geometric properties of the 2-Wasserstein space. In contrast to existing methods, it is designed for multivariate probability measures that need not be regular, is computationally efficient to implement via a linear regression, and provides a unique solution in general. The idea is to work on tangent cones of the Wasserstein space using generalized geodesics. Its structure and computational properties make the method applicable in a variety of settings where probability measures need not be regular, from causal inference to the analysis of object data. An application to estimating causal effects yields a generalization of the synthetic controls method for systems with general heterogeneity described via multivariate probability measures, something that has been out of reach of existing approaches.

1. INTRODUCTION

The concept of projections, that is, approximating a target quantity of interest by an optimally weighted combination of other quantities, is of fundamental relevance in learning theory and statistics. Projections are generally defined between random variables in appropriately defined linear spaces (e.g. van der Vaart, 2000, chapter 11). In modern statistics and machine learning applications, the objects of interest are often probability measures themselves. Examples range from object and functional data (e.g. Marron & Alonso, 2014) to causal inference with individual heterogeneity (e.g. Athey & Imbens, 2015). A notion of projection between sets of probability measures should be applicable between any set of general probability measures, replicate geometric properties of the target measure, and possess good computational and statistical properties.

We introduce such a notion of projection between sets of general probability measures supported on Euclidean spaces. It provides a unique solution to the projection problem under mild conditions. To achieve this, we work in the 2-Wasserstein space, that is, the set of all probability measures with finite second moments equipped with the 2-Wasserstein distance. Importantly, we focus on the multivariate setting, i.e. we consider the Wasserstein space over some Euclidean space $\mathbb{R}^d$, denoted by $\mathcal{W}_2$, where the dimension $d$ can be high. The multivariate setting poses challenges from a mathematical, computational, and statistical perspective. In particular, $\mathcal{W}_2$ is a positively curved metric space for $d > 1$ (e.g. Ambrosio et al., 2008, Kloeckner, 2010). Moreover, the 2-Wasserstein distance between two probability measures is defined as the value function of the Monge-Kantorovich optimal transportation problem (Villani, 2003, chapter 2), which does not have a closed-form solution in multivariate settings. This is coupled with a well-known statistical curse of dimensionality for general measures (Ajtai et al., 1984, Dudley, 1969, Fournier & Guillin, 2015, Talagrand, 1992; 1994, Weed & Bach, 2019).
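For reference, the 2-Wasserstein distance mentioned above can be written in its standard Kantorovich form. The symbols $\mu$, $\nu$, $\gamma$, and $\Gamma(\mu,\nu)$ (the set of couplings, i.e. joint probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$) are notation assumed here for illustration and need not match the notation used later in the paper.

```latex
W_2(\mu,\nu)
  = \left( \inf_{\gamma \in \Gamma(\mu,\nu)}
      \int_{\mathbb{R}^d \times \mathbb{R}^d} \lVert x - y \rVert^2 \, d\gamma(x,y)
    \right)^{1/2}
```

A minimizer $\gamma$ is an optimal transport plan. When $\mu$ is not absolutely continuous, such a plan need not be induced by a transport map, which is why non-regular measures require extra care.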

1.1. EXISTING APPROACHES

These challenges have impeded the development of a method of projections between general and potentially high-dimensional probability measures. The focus so far has been on the univariate and low-dimensional setting. In particular, Chen et al. (2021), Ghodrati & Panaretos (2022), and Pegoraro & Beraha (2021) introduced frameworks for distribution-on-distribution regressions in the univariate setting for object data. Bigot et al. (2014) and Cazelles et al. (2017) developed principal component analyses on the space of univariate probability measures using geodesics on the Wasserstein space. The works most closely related to ours are Bonneel et al. (2016), Mérigot et al. (2020), and Werenski et al. (2022). The first develops a regression approach in barycentric coordinates with applications in computer graphics as well as color and shape transport problems. Their method is defined directly on $\mathcal{W}_2$ and requires solving a computationally costly bilevel optimization problem, which does not necessarily yield global solutions. The second introduces a linearization of the 2-Wasserstein space by lifting it to an $L^2$-space anchored at a measure that is absolutely continuous with respect to Lebesgue measure. This approach relies on the existence of optimal transport maps between this absolutely continuous "anchor" distribution and other distributions and hence only defines tangent spaces at absolutely continuous measures. The third works on a tangential structure based on "Karcher means" (Karcher, 2014, Zemel & Panaretos, 2019), which is more restrictive still. This implies that their method requires all involved measures to be absolutely continuous with densities that are bounded away from zero, with the target measure lying in the convex hull of the control measures.

1.2. OUR CONTRIBUTION

In contrast to the existing approaches, our method is applicable to general probability measures, allows the target measure to lie outside the generalized geodesic convex hull of the control measures, can be implemented by a standard constrained linear regression, and provides a global, and in many cases unique, solution. The proposed method transforms the projection problem on the positively curved Wasserstein space into a linear optimization problem in the geometric tangent cone, which can be implemented via a linear regression. This problem takes the form of a deformable template (Boissard et al., 2015, Yuille, 1991), which connects our approach to this literature. Our method can be implemented in three steps: (i) obtain the general tangent cone structure at the target measure, (ii) construct a tangent space from the tangent cone via barycentric projections if it does not exist, and (iii) perform a linear regression to carry out the projection in the tangent space. This implementation of the projection approach via linear regression is computationally efficient, in particular compared to the existing methods in Bonneel et al. (2016) and Werenski et al. (2022). The challenging part of the implementation is lifting the problem to the tangential structure: this requires computing the corresponding optimal transport plans between the target and each measure used in the projection. Many methods have been developed for this, see for instance Benamou & Brenier (2000), Jacobs & Léger (2020), Makkuva et al. (2020), Peyré & Cuturi (2019), Ruthotto et al. (2020) and references therein. Other alternatives compute approximations of the optimal transport plans via regularized optimal transport problems (Peyré & Cuturi, 2019), such as entropy-regularized optimal transport (Galichon & Salanié, 2010, Cuturi, 2013). The proposed projection approach is compatible with any such method, so its complexity scales with that of estimating optimal transport plans.

We establish statistical consistency of the method when the measures are estimated by their empirical counterparts. To demonstrate the efficiency and utility of the proposed method, we apply it in different settings and compare it to existing benchmarks such as Werenski et al. (2022). Furthermore, we extend the classical synthetic control estimator (Abadie & Gardeazabal, 2003, Abadie et al., 2010) to settings with observed individual heterogeneity in multivariate outcomes. The synthetic controls estimator is a projection approach: one predicts an aggregate outcome of a treated unit by an optimal convex combination of control units and uses the weights of this optimal combination to construct the counterfactual state of the treated unit had it not received treatment. The novelty of our application is that it lets us perform the synthetic control method on the joint distribution of several outcomes, which complements the recently introduced method in Gunsilius (2022) designed for univariate outcomes. The possibility to project entire probability measures allows us to disentangle treatment heterogeneity at the treatment unit level. The possibility of working with general probability measures is key in this setting, as many outcomes of interest are not regular. We illustrate this by applying our method to estimate the effects of a Medicaid expansion policy in Montana, where the outcome is a non-regular probability measure in $d = 28$ dimensions.
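The three-step implementation outlined in this section can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes that every measure is a uniform empirical measure on a small sample, that optimal transport plans are computed exactly with a generic linear program (any of the solvers cited above could be substituted), and that the "constrained linear regression" restricts the weights to the unit simplex. The function names `ot_plan`, `barycentric_projection`, and `projection_weights` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog, minimize


def ot_plan(X, Y):
    """Step (i): optimal transport plan between the uniform empirical measures
    on the rows of X and Y for the squared Euclidean cost (assumed setup)."""
    n, m = len(X), len(Y)
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # shape (n, m)
    # Marginal constraints on the row-major flattened plan.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))
    col_sums = np.kron(np.ones((1, n)), np.eye(m))
    A_eq = np.vstack([row_sums, col_sums])
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n, m)


def barycentric_projection(plan, Y):
    """Step (ii): conditional mean of the control sample given each target
    point, which turns a (possibly mass-splitting) plan into a map."""
    return (plan @ Y) / plan.sum(axis=1, keepdims=True)


def projection_weights(X0, controls):
    """Step (iii): least squares over the unit simplex (an assumption of this
    sketch), regressing the target points on the barycentric projections."""
    B = np.stack([barycentric_projection(ot_plan(X0, Y), Y) for Y in controls])
    J = len(controls)

    def objective(lam):
        resid = np.tensordot(lam, B, axes=1) - X0  # shape (n, d)
        return np.mean(np.sum(resid ** 2, axis=1))

    res = minimize(
        objective,
        x0=np.full(J, 1.0 / J),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda lam: lam.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return res.x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(12, 2))
    # Hypothetical controls: translated copies of the target sample.
    shifts = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]), np.array([0.0, 2.0])]
    print(projection_weights(target, [target + s for s in shifts]))
```

The linear program has one variable per pair of sample points, so this sketch is only practical for small samples. The regression step itself stays a low-dimensional problem in the number of control measures, which is consistent with the remark that the overall cost is driven by estimating the transport plans.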




