A HIGHER PRECISION ALGORITHM FOR COMPUTING THE 1-WASSERSTEIN DISTANCE

Abstract

We consider the problem of computing the 1-Wasserstein distance W(µ, ν) between two d-dimensional discrete distributions µ and ν whose supports lie within the unit hypercube. There are several algorithms that estimate W(µ, ν) within an additive error of ε. However, when W(µ, ν) is small, the additive error ε dominates, leading to noisy results. Consider any additive-approximation algorithm with execution time T(n, ε). We propose an algorithm that runs in O(T(n, ε/d) log n) time and boosts the accuracy of estimating W(µ, ν) from an additive error of ε to an expected additive error of min{ε, (d log_{√d/ε} n) · W(µ, ν)}. For the special case where every point in the support of µ and ν has a mass of 1/n (also called the Euclidean Bipartite Matching problem), we describe an algorithm that boosts the accuracy of any additive-approximation algorithm from ε to an expected additive error of min{ε, (d log log n) · W(µ, ν)} in O(T(n, ε/d) log log n) time.

1. INTRODUCTION

Given two discrete probability distributions µ and ν whose supports A and B, respectively, lie inside the d-dimensional unit hypercube [0, 1]^d with max{|A|, |B|} = n, the 1-Wasserstein distance W(µ, ν) (also called the Earth Mover's distance) between them is the minimum cost required to transport mass from ν to µ under the Euclidean metric. The special case where |A| = |B| = n and the mass at each point of A ∪ B is 1/n is called the Euclidean Bipartite Matching (EBM) problem. In machine learning applications, one can improve a model µ by using its Earth Mover's distance from a distribution ν built on real data. Consequently, it has been extensively used in generative models (Deshpande et al. (2018); Genevay et al. (2018); Salimans et al. (2018)), robust learning (Esfahani & Kuhn (2018)), supervised learning (Luise et al. (2018); Janati et al. (2019)), and parameter estimation (Liu et al. (2018); Bernton et al. (2019)). Computing the 1-Wasserstein distance between µ and ν can be modeled as a linear program and solved in O(n^3 log n) time (Edmonds & Karp (1972); Orlin (1988)), which is computationally expensive. There has been substantial effort on designing ε-additive-approximation algorithms that estimate W(µ, ν) within an additive error of ε in n^2 · poly(d, log n, 1/ε) time (Cuturi (2013); Lin et al. (2019); Lahn et al. (2019)). When W(µ, ν) is significantly smaller than ε, however, the cost produced by such algorithms is unreliable, as it is dominated by the error parameter ε. To obtain higher accuracy in this case, for α > 0, one can compute an α-relative approximation of the 1-Wasserstein distance, which is a cost w that satisfies W(µ, ν) ≤ w ≤ αW(µ, ν). There has been considerable effort on designing relative-approximation algorithms; however, many such methods suffer from the curse of dimensionality, i.e., their execution time grows exponentially in d. Furthermore, they rely on fairly involved data structures that have good asymptotic execution times but are slow in practice and difficult to implement, making them impractical (Agarwal & Sharathkumar (2014; 2020); Fox & Lu (2020); Agarwal et al. (2022a;b)). The only exception to this is a classical greedy algorithm, based on a d-dimensional quadtree, that returns an O(d log n)-relative approximation of the 1-Wasserstein distance in O(nd) time. It has been used in various machine-learning and computer-vision applications (Gupta et al. (2010); Backurs et al.).
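As a concrete point of reference, the classical grid-based greedy estimate mentioned in this section admits a short implementation: impose a grid of side length 2^{-i} for each level i, and charge any mass whose supply and demand do not cancel within a cell the diameter of that cell, √d · 2^{-i}. The sketch below is our own simplified illustration of this idea; the function name and the choice of roughly log n levels are assumptions, not taken from any specific implementation:

```python
import math
from collections import defaultdict

def grid_greedy_estimate(A, mu, B, nu, levels=None):
    """Grid-based greedy estimate of the 1-Wasserstein distance.

    A, B   : lists of d-dimensional tuples in [0, 1]^d (demand / supply points)
    mu, nu : matching lists of probability masses (each sums to 1)
    """
    d = len(A[0])
    if levels is None:
        n = max(len(A), len(B))
        levels = max(1, math.ceil(math.log2(n)))  # roughly log n grid levels
    cost = 0.0
    for i in range(1, levels + 1):
        side = 2.0 ** (-i)            # cell side length at level i
        net = defaultdict(float)      # net (supply - demand) mass per cell
        for p, m in zip(B, nu):
            cell = tuple(min(int(x / side), 2 ** i - 1) for x in p)
            net[cell] += m
        for p, m in zip(A, mu):
            cell = tuple(min(int(x / side), 2 ** i - 1) for x in p)
            net[cell] -= m
        # Mass that does not cancel within a level-i cell must travel at
        # least a cell's width; charge it the cell diameter sqrt(d) * side.
        cost += math.sqrt(d) * side * sum(abs(v) for v in net.values())
    return cost
```

On two identical distributions every cell's net mass is zero, so the estimate is exactly 0; in general the sum over levels over-charges, which is where a relative-approximation factor depending on d and the number of levels comes from.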
In the case of the Euclidean Bipartite Matching, Agarwal & Varadarajan (2004) and Indyk (2007) generalized the algorithm in the hierarchical framework to achieve a relative approximation ratio of O(d^2 log(1/ε)) in Õ(n^{1+ε}) time¹. In this paper, we design an algorithm that combines any additive-approximation algorithm with the hierarchical quadtree-based framework. As a result, our algorithm achieves better guarantees for both additive and relative approximations. To our knowledge, this is the first result that combines the power of additive and relative approximation techniques, leading to improvements in both settings.

1.1. PROBLEM DEFINITION

We are given two discrete distributions µ and ν. Let A and B be the points in the support of µ and ν, respectively. For the distribution µ (resp. ν), suppose each point a ∈ A (resp. b ∈ B) has a probability µ_a (resp. ν_b) associated with it, where Σ_{a∈A} µ_a = Σ_{b∈B} ν_b = 1. Let G(A, B) denote the complete bipartite graph where, for any pair of points a ∈ A and b ∈ B, there is an edge from a to b of cost ∥a − b∥, i.e., the Euclidean distance between a and b.

For each point a ∈ A (resp. b ∈ B), we assign a weight η(a) = −µ_a (resp. η(b) = ν_b). We refer to any point v ∈ A ∪ B with a negative (resp. positive) weight as a demand point (resp. supply point) with a demand (resp. supply) of |η(v)|. Given any subset of points V ⊆ A ∪ B, the weight η(V) is simply the sum of the weights of its points, i.e., η(V) = Σ_{v∈V} η(v). For any edge (a, b) ∈ A × B, let the cost of transporting a supply of β from b to a be β∥a − b∥. In this problem, our goal is to transport all supplies from supply points to demand points at the minimum cost. More formally, a transport plan is a function σ : A × B → R_{≥0} that assigns a non-negative value to each edge of G(A, B), indicating the quantity of supply transported along the edge. The transport plan σ is such that the total supply transported into (resp. from) any demand (resp. supply) point a ∈ A (resp. b ∈ B) equals −η(a) (resp. η(b)). The cost of the transport plan σ, denoted by w(σ), is given by w(σ) = Σ_{(a,b)∈A×B} σ(a, b)∥a − b∥. The goal of this problem is to find a minimum-cost transport plan.

If two points a ∈ A and b ∈ B are co-located (i.e., they share the same coordinates) and η(b) = −η(a), then due to the metric property of the Euclidean distances, we can match the supplies to the demands at zero cost and remove the points from the input. Otherwise, if η(b) ≠ −η(a), we replace the two points with a single point of weight η(a) + η(b). By definition, if the weight of the newly created point is negative (resp. positive), we consider it a demand (resp. supply) point. In our presentation, we always consider A and B to be the point sets obtained after replacing all co-located points. Observe that, after removing the co-located points, the total supply U = η(B) may be less than 1. However, it is easy to see that η(B) = −η(A); i.e., the problem instance defined on A ∪ B is balanced. We say that a transport plan σ is an ε-close transport plan if w(σ) ≤ W(µ, ν) + εU.

In many applications, the distributions µ and ν are continuous or are large (possibly unknown) discrete distributions. In such cases, it might be computationally expensive or even impossible to compute W(µ, ν). Instead, one can draw two sets A and B of n samples each from µ and ν, respectively. Each point a ∈ A (resp. b ∈ B) is assigned a weight of η(a) = −1/n (resp. η(b) = 1/n). One can approximate the 1-Wasserstein distance between the distributions µ and ν by simply solving the 1-Wasserstein problem defined on G(A, B). This special case, where every point has the same demand or supply, is called the Euclidean Bipartite Matching (EBM) problem. A matching M is a set of vertex-disjoint edges in G(A, B) and has cost (1/n) Σ_{(a,b)∈M} ∥a − b∥. For the EBM problem, the optimal transport plan is simply a minimum-cost matching of cardinality n.

For any point set P in Euclidean space, let C_max(P) := max_{(a,b)∈P×P} ∥a − b∥ denote the distance of its farthest pair and C_min(P) := min_{(a,b)∈P×P, a≠b} ∥a − b∥ denote the distance of its closest pair. The spread of the point set, denoted by ∆(P), is the ratio ∆(P) = C_max(P)/C_min(P). When P is obvious from the context, we simply use C_min, C_max, and ∆ to denote the distance of its closest and farthest pair and its spread.
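The definitions above (co-located merging, the cost w(σ) of a transport plan, and the spread ∆) can be sketched in a few lines of code. This is a minimal illustration; the helper names (`merge_colocated`, `plan_cost`, `spread`) and the dictionary representation of σ are our own assumptions:

```python
import math
from collections import defaultdict

def merge_colocated(points, weights):
    """Merge co-located points by summing their weights (demand points carry
    negative weight, supply points positive); zero-weight points are dropped."""
    net = defaultdict(float)
    for p, w in zip(points, weights):
        net[tuple(p)] += w
    return [(p, w) for p, w in net.items() if w != 0.0]

def plan_cost(sigma):
    """w(sigma) = sum over edges (a, b) of sigma(a, b) * ||a - b||.
    Here sigma maps an edge, i.e. a pair of coordinate tuples, to the
    (non-negative) quantity transported along it."""
    return sum(q * math.dist(a, b) for (a, b), q in sigma.items())

def spread(points):
    """Delta(P) = C_max(P) / C_min(P): farthest-pair over closest-pair distance."""
    dists = [math.dist(p, q) for i, p in enumerate(points)
             for q in points[i + 1:]]
    return max(dists) / min(dists)
```

Note that `merge_colocated` preserves balance: since the merged weights are sums of the originals, the total weight of the returned points is still η(A) + η(B) = 0.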

1.2. RELATED WORK

Relative Approximations: In fixed-dimensional settings, i.e., d = O(1), there is extensive work on the design of near-linear-time Monte-Carlo (1 + ε)-relative approximation algorithms.



¹ Õ(·) hides poly(d, log n, 1/ε) factors in the execution time.






