A HIGHER PRECISION ALGORITHM FOR COMPUTING THE 1-WASSERSTEIN DISTANCE

Abstract

We consider the problem of computing the 1-Wasserstein distance W(µ, ν) between two d-dimensional discrete distributions µ and ν whose supports lie within the unit hypercube. Several algorithms estimate W(µ, ν) within an additive error of ε. When W(µ, ν) is small, however, the additive error ε dominates, leading to noisy results. Consider any additive-approximation algorithm with execution time T(n, ε). We propose an algorithm that runs in O(T(n, ε/d) log n) time and boosts the accuracy of estimating W(µ, ν) from an additive error of ε to an expected additive error of min{ε, (d log_{√d/ε} n) W(µ, ν)}. For the special case where every point in the support of µ and ν has a mass of 1/n (also called the Euclidean Bipartite Matching problem), we describe an algorithm that boosts the accuracy of any additive-approximation algorithm from an additive error of ε to an expected additive error of min{ε, (d log log n) W(µ, ν)} in O(T(n, ε/d) log log n) time.
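The stated running time O(T(n, ε/d) log n) corresponds to O(log n) invocations of the base additive solver at accuracy ε/d. The sketch below illustrates only that call structure; `additive_solver` and the final combination step are hypothetical placeholders, not the paper's actual subroutines.

```python
import math

def boosted_estimate(mu, nu, eps, d, additive_solver):
    """Illustrates the call pattern behind the O(T(n, eps/d) log n) bound:
    O(log n) invocations of an (eps/d)-additive solver.

    `additive_solver(mu, nu, eps)` is a hypothetical black box returning an
    estimate of W(mu, nu) within additive error eps; the combination step
    below is a placeholder, not the paper's procedure."""
    n = max(len(mu), len(nu))
    rounds = max(1, math.ceil(math.log2(n)))
    estimates = []
    for _ in range(rounds):  # total time O(T(n, eps/d) * log n)
        estimates.append(additive_solver(mu, nu, eps / d))
    # Placeholder combination; the paper derives an estimate with expected
    # additive error min{eps, (d log_{sqrt(d)/eps} n) W(mu, nu)}.
    return min(estimates)
```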

1. INTRODUCTION

Given two discrete probability distributions µ and ν whose supports A and B, respectively, lie inside the d-dimensional unit hypercube [0, 1]^d with max{|A|, |B|} = n, the 1-Wasserstein distance W(µ, ν) (also called the Earth Mover's distance) between them is the minimum cost required to transport mass from ν to µ under the Euclidean metric. The special case where |A| = |B| = n and the mass at each point of A ∪ B is 1/n is called the Euclidean Bipartite Matching (EBM) problem.

In machine learning applications, one can improve a model µ by using its Earth Mover's distance from a distribution ν built on real data. Consequently, it has been extensively used in generative models (Deshpande et al. (2018; 2019); Genevay et al. (2018); Salimans et al. (2018)), robust learning (Esfahani & Kuhn (2018)), supervised learning (Luise et al. (2018); Janati et al. (2019)), and parameter estimation (Liu et al. (2018); Bernton et al. (2019)).

Computing the 1-Wasserstein distance between µ and ν can be modeled as a linear program and solved in O(n^3 log n) time (Edmonds & Karp (1972); Orlin (1988)), which is computationally expensive. There has been substantial effort on designing ε-additive-approximation algorithms that estimate W(µ, ν) within an additive error of ε in n^2 poly(d, log n, 1/ε) time (Cuturi (2013); Lin et al. (2019); Lahn et al.; Fox & Lu (2020); Agarwal et al. (2022a;b)). The only exception to this is a classical greedy algorithm, based on a d-dimensional quadtree, that returns an O(d log n)-relative approximation of the 1-Wasserstein distance in O(nd) time. It has been used in various machine-learning and computer-vision applications (Gupta et al. (2010); Backurs et al. (2020)).

When W(µ, ν) is significantly smaller than ε, however, the cost produced by such algorithms will be unreliable, as it is dominated by the error parameter ε. To achieve higher accuracy in this case, for α > 0, one can compute an α-relative approximation of the 1-Wasserstein distance, i.e., a cost w that satisfies W(µ, ν) ≤ w ≤ αW(µ, ν). There has been considerable effort on designing relative-approximation algorithms; however, many such methods suffer from the curse of dimensionality, i.e., their execution time grows exponentially in d. Furthermore, they rely on fairly involved data structures that have good asymptotic execution times but are slow in practice and difficult to implement, making them impractical (Agarwal & Sharathkumar (2014)).

In the case of the Euclidean Bipartite
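The quadtree-based greedy estimate mentioned above can be sketched as follows: at each grid level, mass that does not cancel within a cell pays roughly one cell diameter. This is a simplified illustration of the general technique (the level count, normalization, and the usual random shift of the grid are omitted), not the algorithm analyzed in this paper.

```python
import math
from collections import defaultdict

def quadtree_emd_estimate(points_a, points_b, levels=None):
    """Rough quadtree-style estimate of the EBM cost between two equal-size
    point sets in [0, 1]^d. At each level, the grid side halves; uncancelled
    per-cell mass is charged about one cell diameter. Constants and the
    random grid shift of the classical analysis are omitted."""
    n = len(points_a)
    d = len(points_a[0])
    if levels is None:
        levels = max(1, math.ceil(math.log2(n)) + 1)
    total = 0.0
    for lvl in range(levels):
        side = 2.0 ** -(lvl + 1)  # cell side length at this level
        cells = defaultdict(int)  # signed point counts per grid cell
        for p in points_a:
            cells[tuple(int(c // side) for c in p)] += 1
        for q in points_b:
            cells[tuple(int(c // side) for c in q)] -= 1
        surplus = sum(abs(v) for v in cells.values())
        # uncancelled mass pays ~ one cell diameter (side * sqrt(d))
        total += side * math.sqrt(d) * surplus / n
    return total
```

Identical point sets cancel at every level and yield an estimate of zero, while well-separated sets accumulate a positive charge at the coarse levels.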

