OUTLIER-ROBUST OPTIMAL TRANSPORT

Abstract

Optimal transport (OT) provides a way of measuring distances between distributions that depends on the geometry of the sample space. In light of recent advances in solving the OT problem, OT distances are widely used as loss functions in minimum distance estimation. Despite its prevalence and advantages, however, OT is extremely sensitive to outliers: a single adversarially-picked outlier can increase the OT distance arbitrarily. To address this issue, in this work we propose an outlier-robust OT formulation. Our formulation is convex but, at first glance, challenging to scale. We proceed by deriving an equivalent formulation based on cost truncation that is easy to incorporate into modern stochastic algorithms for regularized OT. We demonstrate our model on mean estimation under the Huber contamination model in simulations, as well as on outlier detection with real data.

1. INTRODUCTION

Optimal transport is a fundamental problem in applied mathematics. In its original form (Monge, 1781), the problem entails finding the minimum-cost way to transport mass from a prescribed probability distribution µ on X to another prescribed distribution ν on X. Kantorovich (1942) relaxed Monge's formulation of the optimal transport problem to obtain the Kantorovich formulation:

OT(µ, ν) = min_{Π ∈ F(µ,ν)} E_{(X₁,X₂)∼Π}[c(X₁, X₂)],   (1.1)

where F(µ, ν) is the set of couplings between µ and ν (probability distributions on X × X whose marginals are µ and ν) and c is a cost function, typically assumed to satisfy c(x, y) ≥ 0 and c(x, x) = 0. Compared to other notions of distance between probability distributions, optimal transport uniquely depends on the geometry of the sample space. Recent advancements in optimization for optimal transport (Cuturi, 2013; Solomon et al., 2015; Genevay et al., 2016; Seguy et al., 2018) have made OT practical at scale. Many applications use OT as a loss in an optimization problem of the form

θ̂ ∈ arg min_{θ ∈ Θ} OT(µ_n, ν_θ),   (1.2)

where {ν_θ}_{θ∈Θ} is a collection of parametric models and µ_n is the empirical distribution of the samples. Such estimators are called minimum Kantorovich estimators (MKE) (Bassetti et al., 2006). They are popular alternatives to likelihood-based estimators, especially in generative modeling. For example, when OT(·, ·) is the Wasserstein-1 distance and ν_θ is a generator parameterized by a neural network with weights θ, equation 1.2 corresponds to the Wasserstein GAN (Arjovsky et al., 2017).

One drawback of optimal transport is its sensitivity to outliers. Because all the mass in µ must be transported to ν, a small fraction of outliers can have an outsized impact on the optimal transport problem. For statistics and machine learning applications in which the data is corrupted or noisy, this is a major issue.
For example, the poor performance of Wasserstein GANs in the presence of outliers was noted in the recent works on outlier-robust generative learning with f -divergence GANs (Chao et al., 2018; Wu et al., 2020) . The problem of outlier-robustness in MKE has not been studied, with the exception of two concurrent works (Staerman et al., 2020; Balaji et al., 2020) .
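This sensitivity is easy to see numerically. The following minimal sketch (ours, not from the paper) uses SciPy's exact 1-D Wasserstein-1 distance to show how a single adversarial point inflates the OT distance without bound:

```python
import numpy as np
from scipy.stats import wasserstein_distance  # exact Wasserstein-1 in 1-D

rng = np.random.default_rng(0)
clean = rng.normal(size=1000)   # samples from mu
target = rng.normal(size=1000)  # samples from nu

w_clean = wasserstein_distance(clean, target)

# Replace a single sample with an adversarial outlier at 1e6:
# its mass (1/1000) must still travel ~1e6, contributing ~1000 to W1.
corrupted = np.append(clean[:-1], 1e6)
w_corrupted = wasserstein_distance(corrupted, target)
```

Here `w_clean` is small (well below 1 for two samples from the same Gaussian), while `w_corrupted` is on the order of 10³; moving the outlier further out scales the distance proportionally.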


In this paper, we propose a modification of OT to address its sensitivity to outliers. Our formulation can be used as a loss in equation 1.2 so that it is robust to a small fraction of outliers in the data. To keep things simple, we consider the ε-contamination model (Huber & Ronchetti, 2009). Let ν_{θ₀} be a member of a parametric model {ν_θ : θ ∈ Θ} and let

µ = (1 − ε) ν_{θ₀} + ε ν,

where µ is the data-generating distribution, ε > 0 is the fraction of outliers, and ν is the distribution of the outliers. Although the fraction of outliers is capped at ε, the values of the outliers are arbitrary, so the outliers may have an arbitrarily large impact on the optimal transport problem. Our goal is to modify the optimal transport problem so that it is more robust to outliers. We have in mind the downstream application of learning θ₀ from (samples from) µ in the ε-contamination model. Our main contributions are as follows:

1. We propose a robust OT formulation that is suitable for statistical estimation in the ε-contamination model using MKE.
2. We show that our formulation is equivalent to the original OT problem with a clipped transport cost. This connection enables us to leverage the voluminous literature on computational optimal transport to develop efficient algorithms for MKE that are robust to outliers.
3. Our formulation enables a new application of optimal transport: outlier detection in data.
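For concreteness, data from the ε-contamination model above can be simulated as follows. This is a minimal sketch; the helper names `sample_huber`, `inlier_sampler`, and `outlier_sampler` are ours, not from the paper:

```python
import numpy as np

def sample_huber(n, eps, rng, inlier_sampler, outlier_sampler):
    """Draw n samples from the Huber mixture (1 - eps) * nu_theta0 + eps * nu."""
    is_outlier = rng.random(n) < eps                 # each point is an outlier w.p. eps
    x = inlier_sampler(n)                            # draw everything from nu_theta0 ...
    x[is_outlier] = outlier_sampler(int(is_outlier.sum()))  # ... then overwrite outliers
    return x, is_outlier

rng = np.random.default_rng(0)
x, mask = sample_huber(
    100_000, eps=0.1, rng=rng,
    inlier_sampler=lambda k: rng.normal(0.0, 1.0, k),      # nu_theta0 = N(0, 1)
    outlier_sampler=lambda k: rng.uniform(50.0, 100.0, k), # arbitrary outlier law nu
)
```

The outlier law is arbitrary here by design: the estimator never sees `mask`, only `x`, which is what makes the problem hard.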

2.1. ROBUST OT FOR MKE

To promote outlier-robustness in MKE, we need to allow the corresponding OT problem to ignore the outliers in the data distribution µ. The ε-contamination model imposes a cap on the fraction of outliers, so it is not hard to see that ‖µ − ν_{θ₀}‖_TV ≤ ε, where ‖·‖_TV is the total-variation norm defined as ‖µ‖_TV = ½ ∫ |µ(dx)|. This suggests we solve a TV-constrained/regularized version of equation 1.2. The constrained version

min_{θ ∈ Θ} min_{µ̃ : ‖µ̃ − µ_n‖_TV ≤ ε} OT(µ̃, ν_θ)

suffers from identification issues. In particular, it cannot distinguish between "clean" distributions within TV distance ε of ν_{θ₀}. This makes it unsuitable as a loss function for statistical estimation, because it cannot lead to a consistent estimator. However, its regularized counterpart

min_{θ ∈ Θ} min_{µ̃} OT(µ̃, ν_θ) + λ‖µ̃ − µ_n‖_TV,   (2.1)

where λ > 0 is a regularization parameter, does not suffer from this issue. In the rest of this paper, we work with the TV-regularized formulation equation 2.1.

The main idea of our formulation is to allow for modifications of µ, while penalizing their magnitude and ensuring that the modified µ is still a probability measure. Below we formulate this intuition in an optimization problem titled ROBOT (ROBust Optimal Transport):

Formulation 1:
ROBOT(µ, ν) = min_{µ̃ ∈ P(X)} OT(µ̃, ν) + λ‖µ̃ − µ‖_TV.   (2.2)
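To make the cost-truncation equivalence mentioned in the contributions concrete, the sketch below compares plain 1-D OT with squared cost (computed exactly by sorting) against entropically regularized OT on a clipped cost. This is our illustrative implementation, not the paper's code: the clipping threshold 2λ is an assumption here, and `sinkhorn` is a textbook Sinkhorn solver (Cuturi, 2013):

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iter=1000):
    """Entropy-regularized OT; returns the transport cost <P, C> of the Sinkhorn plan."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):       # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    return float((P * C).sum())

def robust_ot_1d(x, y, lam=1.0):
    """Robust OT via cost truncation: clip the ground cost at 2 * lam (our assumption)."""
    C = np.minimum((x[:, None] - y[None, :]) ** 2, 2.0 * lam)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    return sinkhorn(a, b, C)

def w2_squared_1d(x, y):
    """Exact 1-D OT with squared cost: match sorted samples."""
    return float(np.mean((np.sort(x) - np.sort(y)) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)
x_bad = x.copy()
x_bad[0] = 50.0                   # a single outlier in the data distribution

plain_clean, plain_bad = w2_squared_1d(x, y), w2_squared_1d(x_bad, y)
robust_clean, robust_bad = robust_ot_1d(x, y), robust_ot_1d(x_bad, y)
```

Plain OT jumps by an order of magnitude because of the single outlier, while the clipped-cost value barely moves: the outlier's mass 1/n can pay at most 2λ, capping its contribution at 2λ/n.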

