THE UNBALANCED GROMOV WASSERSTEIN DISTANCE: CONIC FORMULATION AND RELAXATION

Abstract

Comparing metric measure spaces (i.e., metric spaces endowed with a probability distribution) is at the heart of many machine learning problems, for instance predicting properties of molecules in quantum chemistry or generating graphs with varying connectivity. The most popular distance between such metric measure spaces is the Gromov-Wasserstein (GW) distance, which is the solution of a quadratic assignment problem. This distance has been successfully applied to supervised learning and generative modeling, for applications as diverse as quantum chemistry or natural language processing. The GW distance is, however, limited to the comparison of metric measure spaces endowed with a probability distribution. This strong limitation is problematic for many applications in ML where there is no natural a priori normalization of the total mass of the data. Furthermore, imposing exact conservation of mass across spaces is not robust to outliers and often leads to irregular matchings. To alleviate these issues, we introduce two Unbalanced Gromov-Wasserstein formulations: a distance and a more tractable upper-bounding relaxation. Both allow the comparison of metric spaces equipped with arbitrary positive measures up to isometries. The first formulation is a positive and definite divergence based on a relaxation of the mass conservation constraint using a novel type of quadratically-homogeneous divergence. This divergence works hand in hand with the entropic regularization approach, which is popular for solving large-scale optimal transport problems. We show that the underlying non-convex optimization problem can be efficiently tackled using a highly parallelizable and GPU-friendly iterative scheme. The second formulation is a distance between mm-spaces up to isometries based on a conic lifting. Lastly, we provide numerical simulations to highlight the salient features of the unbalanced divergence and its potential applications in ML.

1. INTRODUCTION

Comparing data distributions on different metric spaces is a basic problem in machine learning. This class of problems is, for instance, at the heart of surface (Bronstein et al., 2006) or graph matching (Xu et al., 2019) (equipping the surface or graph with its associated geodesic distance), regression problems in quantum chemistry (Gilmer et al., 2017) (viewing the molecules as distributions of points in $\mathbb{R}^3$) and natural language processing (Grave et al., 2019; Alvarez-Melis & Jaakkola, 2018) (where texts in different languages are embedded as point distributions in different vector spaces).

Metric measure spaces. The mathematical way to formalize these problems is to model the data as metric measure spaces (mm-spaces). An mm-space is denoted as $\mathcal{X} = (X, d, \mu)$, where $X$ is a complete separable set endowed with a distance $d$ and a positive Borel measure $\mu \in \mathcal{M}_+(X)$. For instance, if $X = (x_i)_i$ is a finite set of points, then $\mu = \sum_i m_i \delta_{x_i}$ (here $\delta_{x_i}$ is the Dirac mass at $x_i$) is simply a set of positive weights $m_i = \mu(\{x_i\}) \geq 0$ associated to each point $x_i$, which accounts for its mass or importance. In particular, setting some $m_i$ to 0 is equivalent to removing the point $x_i$. We refer to Sturm (2012) for a mathematical account of the theory of mm-spaces. In all the applications highlighted above, it makes sense to perform the comparisons up to isometric transformations of the data. Two mm-spaces $\mathcal{X} = (X, d_X, \mu)$ and $\mathcal{Y} = (Y, d_Y, \nu)$ are

