COMPUTING ALL OPTIMAL PARTIAL TRANSPORTS

Abstract

We consider the classical version of the optimal partial transport problem. Let $\mu$ (with a mass of $U$) and $\nu$ (with a mass of $S$) be two discrete mass distributions with $S \le U$, and let $n$ be the total number of points in the supports of $\mu$ and $\nu$. For a parameter $\alpha \in [0, S]$, consider the minimum-cost transport plan $\sigma_\alpha$ that transports a mass of $\alpha$ from $\nu$ to $\mu$. The OT-profile captures the behavior of the cost of $\sigma_\alpha$ as $\alpha$ varies from $0$ to $S$. There is only limited work on the OT-profile and its mathematical properties (see Figalli (2010)). In this paper, we present a novel framework to analyze the properties of the OT-profile and also present an algorithm to compute it. When $\mu$ and $\nu$ are discrete mass distributions, we show that the OT-profile is a piecewise-linear, non-decreasing, convex function. Let $K$ be the combinatorial complexity of this function, i.e., the number of line segments required to represent the OT-profile. Our exact algorithm computes the OT-profile in $\tilde{O}(n^2 K)$ time. Given $\delta > 0$, we also show that the algorithm by Lahn et al. (2019) can be used to $\delta$-approximate the OT-profile in $O(n^2/\delta + n/\delta^2)$ time. This approximation is a piecewise-linear function of combinatorial complexity $O(1/\delta)$. An OT-profile is arguably more valuable than the OT-cost itself and can be used within applications. Under a reasonable assumption on the outliers, we also show that the first derivative of the OT-profile sees a noticeable rise before any of the mass from outliers is transported. Using this property, we obtain improved prediction accuracy in an outlier detection experiment. We also use this property to predict labels and estimate the class priors in PU-learning experiments. Both experiments are conducted on real datasets.

1. INTRODUCTION

Given two discrete probability distributions $\mu$ (with a mass of $U = 1$) with the set $A$ as its support and $\nu$ (with a mass of $S = 1$) with the set $B$ as its support, where $|A| + |B| = n$, the optimal transport problem asks for the minimum-cost plan to transport mass from $\nu$ to $\mu$. When $U \neq S$, the problem is called unbalanced optimal transport. In the partial optimal transport problem, given a parameter $\alpha \in [0, S]$, one wishes to determine the $\alpha$-optimal partial transport cost, which is the minimum work required to transport a mass of $\alpha$ from $\nu$ to $\mu$. Owing to its strong statistical properties, the optimal transport cost (Villani (2003); Peyré & Cuturi (2019)) is considered to be an attractive dissimilarity metric between probability distributions, and has found numerous applications in areas involving GANs and image processing (Arjovsky et al. (2017); Liu et al. (2018); Balaji et al. (2020); Lin et al. (2021); Schmitz et al. (2018); Chen et al. (2019)), variational inference (Ambrogioni et al. (2018)), econometrics (Galichon (2016)), and other areas of natural science (Schiebinger et al. (2019); Sun et al. (2020)) and applied mathematics (Santambrogio (2015)). Similarly, unbalanced and partial optimal transport have been used for various problems that arise in machine learning, including GAN training, image processing, outlier detection and Positive Unlabelled (PU-) learning (Yang & Uhler (2018); Bonneel & Coeurjolly (2019); Chapel et al. (2020); Mukherjee et al. (2021)).

The exact optimal transport cost and plan, including in the unbalanced case, can be computed in $O(n^3 \log n)$ time. For a fixed value of $\alpha \in [0, S]$, one can easily reduce the problem of computing an $\alpha$-optimal partial transport cost to solving an unbalanced instance of the optimal transport problem in $O(n^3 \log n)$ time: create a catchment node $r$ in the support of $\mu$ with an additional mass of $(S - \alpha)$. Let $A' = A \cup \{r\}$ be the new support of $\mu$. Note that the total mass of $A'$ is $U + S - \alpha$. For every $b \in B$, add an edge $(r, b)$ with a cost of $0$. To find the $\alpha$-optimal partial transport, one can simply solve the unbalanced optimal transport between $A'$ and $B$ using an exact solver in $O(n^3 \log n)$ time. Several algorithms approximate the optimal transport cost in $O(n^2\,\mathrm{poly}\{1/\delta, \log n\})$ time. Cuturi (2013) introduced the Sinkhorn algorithm to solve the entropic regularized optimal transport problem and showed that it can be used to approximate the optimal transport cost within an additive error of $\delta$ in $\tilde{O}(n^2/\delta^2)$ time; see also Abid & Gower (2018); Altschuler et al. (2017); Dvurechensky et al. (2018); Lin et al. (2019); Guo et al. (2020); Xie et al. (2022). Since then, there has been significant research on the design of additive approximation algorithms, and several algorithms achieve an execution time of $\tilde{O}(n^2/\delta)$ (Lahn et al. (2019); Jambulapati et al. (2019); Quanrud (2019)). The state-of-the-art execution time for approximating the optimal transport is achieved by the combinatorial algorithm of Lahn et al. (2019) (LMR-algorithm). Their algorithm is based on adapting a classical graph-theoretic algorithm (Gabow & Tarjan (1989)) and runs in $O(n^2/\delta + n/\delta^2)$ time.

In this paper, we study the classical version of the optimal partial transport problem, which we introduce next. We are given $\mu$ and $\nu$ whose supports are the point sets $A$ and $B$, respectively. Let $G(A \cup B, A \times B)$ be a complete bipartite graph with $A \cup B$ as the vertex set and $A \times B$ as its set of edges. For any $a \in A$ (resp. $b \in B$), we associate a mass of $\mu_a$ (resp. $\nu_b$) such that $U = \sum_{a \in A} \mu_a$ and $S = \sum_{b \in B} \nu_b \le U$.¹ We refer to each point $a \in A$ (resp. $b \in B$) as a demand (resp. supply) point and assume $\mu_a$ (resp. $\nu_b$) to be a positive rational number. For any pair of points $a \in A$ and $b \in B$, we are given a non-negative cost $c(a, b) \in \mathbb{R}_{\ge 0}$ bounded by $1$. The cost of transporting a supply of mass $\beta$ from $b$ to $a$ is $\beta\, c(a, b)$. A transport plan is a function $\sigma : A \times B \to \mathbb{R}_{\ge 0}$ that assigns a non-negative value to each edge of $G$ indicating the quantity of supply transported along the edge. The transport plan $\sigma$ is such that the total supply transported into (resp. from) any demand (resp. supply) node $a \in A$ (resp. $b \in B$) is bounded by the demand $\mu_a$ (resp. supply $\nu_b$) at $a$ (resp. $b$), i.e., $\sum_{b \in B} \sigma(a, b) \le \mu_a$ (resp. $\sum_{a \in A} \sigma(a, b) \le \nu_b$). For any $\alpha \in [0, S]$, we say that a transport plan $\sigma$ is an $\alpha$-partial transport plan if it transports a mass of $\alpha$ from $\nu$ to $\mu$, i.e., $\sum_{(a,b) \in A \times B} \sigma(a, b) = \alpha$. The cost of the partial transport plan, denoted by $w(\sigma)$, is given by $w(\sigma) = \sum_{(a,b) \in A \times B} \sigma(a, b)\, c(a, b)$.

In the $\alpha$-optimal partial transport problem, we are interested in finding a minimum-cost $\alpha$-partial transport plan, which we denote by $\sigma^*_\alpha$. We define the OT-profile to be the function $\omega : [0, S] \to \mathbb{R}_{\ge 0}$ that maps each value $\alpha \in [0, S]$ to the cost $w(\sigma^*_\alpha)$. For discrete distributions, we show that this function is convex and piecewise-linear. Let $K$ denote the combinatorial complexity of $\omega$. Since the OT-profile is a piecewise-linear convex function, its first derivative is a non-decreasing step function. We denote this step function by $D\omega$, where $D\omega(\alpha)$ is the first derivative of the OT-profile at $\alpha$. Next, we define the notion of a $\delta$-approximate OT-profile. We say that a function $\overline{\omega} : [0, S] \to \mathbb{R}_{\ge 0}$ $\delta$-approximates the OT-profile if, for every $\alpha \in [0, S]$, $\omega(\alpha) \le \overline{\omega}(\alpha) \le \omega(\alpha) + S\delta$. Recall that, when $\mu$ and $\nu$ are probability distributions, $U = S = 1$ and we get $\omega(\alpha) \le \overline{\omega}(\alpha) \le \omega(\alpha) + \delta$. Therefore, $\overline{\omega}(\alpha)$ is an additive approximation of $\omega(\alpha)$, and the function $\overline{\omega}$ represents an additive approximation of all optimal partial transports.

From a theoretical standpoint, there is limited understanding of the properties of an OT-profile and its first derivative. In his seminal work, Figalli (2010) considered the OT-profile of the optimal partial transports for the case where the distributions are continuous and the ground distance $c(u, v)$ is $\|u - v\|^2$. To the best of our knowledge, there is no other work on the OT-profile.

Our Contributions: In this paper, we present a new exact algorithm and an approximation algorithm to compute the OT-profile. We also provide a novel framework to derive properties of the OT-profile and its first derivative. Using this framework, we show how an OT-profile can be used to identify points from the outlier class and the inlier class within the support of a distribution. All our results apply for any arbitrary cost function $c(\cdot, \cdot)$.

• First, we present a simple primal-dual based combinatorial algorithm to compute the exact optimal transport cost. Our algorithm is a generalization of the well-known Hungarian method for the

¹ We consider the general, potentially unbalanced case. For the case where $\mu$ and $\nu$ are probability distributions, $U = 1$ and $S = 1$.
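To make the definitions of a transport plan, its transported mass, and its cost $w(\sigma)$ concrete, the following is a minimal sketch in Python. The function names (`is_transport_plan`, `transported_mass`, `plan_cost`) and the toy instance are illustrative assumptions, not part of the paper. The sketch treats the demand and supply constraints as upper bounds, following the phrase "bounded by" in the definition, and uses exact rational arithmetic since the masses are assumed to be rational.

```python
from fractions import Fraction as F


def is_transport_plan(sigma, mu, nu):
    """Check feasibility of sigma, where sigma[i][j] = sigma(a_i, b_j).

    Assumption: constraints are upper bounds (row sums <= mu, column sums <= nu).
    """
    nonneg = all(x >= 0 for row in sigma for x in row)
    rows_ok = all(sum(row) <= mu_a for row, mu_a in zip(sigma, mu))
    cols_ok = all(sum(col) <= nu_b for col, nu_b in zip(zip(*sigma), nu))
    return nonneg and rows_ok and cols_ok


def transported_mass(sigma):
    """Total mass alpha moved by sigma: the sum of sigma(a, b) over all edges."""
    return sum(sum(row) for row in sigma)


def plan_cost(sigma, cost):
    """w(sigma): the sum of sigma(a, b) * c(a, b) over all edges."""
    return sum(s * c for srow, crow in zip(sigma, cost) for s, c in zip(srow, crow))


# Hypothetical toy instance: two demand points, two supply points, U = S = 1.
mu = [F(1, 2), F(1, 2)]
nu = [F(1, 2), F(1, 2)]
cost = [[F(1, 10), F(2, 10)],   # c(a_0, b_0), c(a_0, b_1)
        [F(3, 10), F(9, 10)]]   # c(a_1, b_0), c(a_1, b_1)
sigma = [[F(0), F(1, 4)],
         [F(1, 4), F(0)]]

print(is_transport_plan(sigma, mu, nu), transported_mass(sigma), plan_cost(sigma, cost))
```

Here `sigma` is a feasible $\alpha$-partial transport plan with $\alpha = 1/2$; it is not claimed to be the optimal one for that $\alpha$.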

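The catchment-node reduction for a fixed $\alpha$ described in the introduction can be sketched as a small instance transformation. The helper name `add_catchment_node` and the data layout (`cost[i][j]` = $c(a_i, b_j)$) are assumptions made for illustration; the sketch only builds the augmented instance on $A' = A \cup \{r\}$ and leaves the choice of exact solver open.

```python
from fractions import Fraction as F


def add_catchment_node(mu, cost, supply_mass, alpha):
    """Build the augmented demand side A' = A + [r] for a fixed alpha.

    The catchment node r receives mass S - alpha and is joined to every
    supply point b by a zero-cost edge, so the mass absorbed by r is free.
    """
    if not 0 <= alpha <= supply_mass:
        raise ValueError("alpha must lie in [0, S]")
    mu_aug = list(mu) + [supply_mass - alpha]
    zero_row = [F(0)] * len(cost[0])        # edges (r, b) with cost 0
    cost_aug = [list(row) for row in cost] + [zero_row]
    return mu_aug, cost_aug


# Hypothetical toy instance with U = S = 1 and alpha = 1/4.
mu = [F(1, 2), F(1, 2)]
nu = [F(1, 2), F(1, 2)]
cost = [[F(1, 10), F(2, 10)],
        [F(3, 10), F(9, 10)]]
mu_aug, cost_aug = add_catchment_node(mu, cost, sum(nu), F(1, 4))
print(mu_aug, sum(mu_aug))  # total demand is U + S - alpha
```

Transporting all of $\nu$ into the augmented demand set lets a mass of at most $S - \alpha$ go to $r$ at no cost, so at least $\alpha$ must reach $A$. Because costs are non-negative, the optimal cost of the augmented instance equals the $\alpha$-optimal partial transport cost.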

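To illustrate why the OT-profile of a discrete instance is piecewise-linear and convex, the sketch below computes its breakpoints on a tiny instance using the textbook successive-shortest-path method for minimum-cost flow. This is a stand-in technique chosen for brevity, not the primal-dual algorithm of this paper, and it makes no attempt to match the stated running times. The function name `ot_profile` and the toy data are assumptions. Each augmentation pushes mass along a cheapest residual path, so the profile grows linearly with slope equal to that path's cost, and the slopes never decrease.

```python
from fractions import Fraction as F


def ot_profile(mu, nu, cost):
    """Return (breakpoints, slopes) of the OT-profile; cost[i][j] = c(a_i, b_j).

    Vertices are indexed b_j -> j and a_i -> n + i in the residual graph.
    """
    m, n = len(mu), len(nu)
    flow = [[F(0)] * n for _ in range(m)]
    rem_mu, rem_nu = [F(x) for x in mu], [F(x) for x in nu]
    alpha, total = F(0), F(0)
    points, slopes = [(alpha, total)], []
    while True:
        # Bellman-Ford from every supply point that still has mass left.
        dist = [F(0) if j < n and rem_nu[j] > 0 else None for j in range(n + m)]
        pred = [None] * (n + m)
        for _ in range(n + m):
            changed = False
            for i in range(m):
                for j in range(n):
                    c = F(cost[i][j])
                    # Forward residual edge b_j -> a_i (uncapacitated).
                    if dist[j] is not None and (
                            dist[n + i] is None or dist[j] + c < dist[n + i]):
                        dist[n + i], pred[n + i], changed = dist[j] + c, j, True
                    # Backward residual edge a_i -> b_j needs sigma(a_i, b_j) > 0.
                    if flow[i][j] > 0 and dist[n + i] is not None and (
                            dist[j] is None or dist[n + i] - c < dist[j]):
                        dist[j], pred[j], changed = dist[n + i] - c, n + i, True
            if not changed:
                break
        targets = [i for i in range(m) if rem_mu[i] > 0 and dist[n + i] is not None]
        if not targets:
            break
        t = min(targets, key=lambda i: dist[n + i])
        slope = dist[n + t]
        # Walk predecessors back to a supply point, collecting residual edges.
        path, v = [], n + t
        while pred[v] is not None:
            path.append((pred[v], v))
            v = pred[v]
        push = min([rem_nu[v], rem_mu[t]] + [flow[u - n][w] for u, w in path if u >= n])
        for u, w in path:
            if u < n:
                flow[w - n][u] += push      # forward edge: send more mass
            else:
                flow[u - n][w] -= push      # backward edge: reroute mass
        rem_nu[v] -= push
        rem_mu[t] -= push
        alpha += push
        total += push * slope
        if slopes and slopes[-1] == slope:
            points[-1] = (alpha, total)     # same slope: extend the last segment
        else:
            slopes.append(slope)
            points.append((alpha, total))
    return points, slopes


# Hypothetical toy instance with U = S = 1.
mu = [F(1, 2), F(1, 2)]
nu = [F(1, 2), F(1, 2)]
cost = [[F(1, 10), F(2, 10)],
        [F(3, 10), F(9, 10)]]
points, slopes = ot_profile(mu, nu, cost)
print(points, slopes)
```

On this instance the profile has two segments, with breakpoints $(0, 0)$, $(1/2, 1/20)$, $(1, 1/4)$ and slopes $1/10$ and $2/5$. The list `slopes` holds the values of the step function $D\omega$, and `len(slopes)` plays the role of the combinatorial complexity $K$.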