SPHERICAL SLICED-WASSERSTEIN

Abstract

Many variants of the Wasserstein distance have been introduced to reduce its original computational burden. In particular, the Sliced-Wasserstein distance (SW), which leverages one-dimensional projections for which a closed-form solution of the Wasserstein distance is available, has received a lot of interest. Yet, it is restricted to data living in Euclidean spaces, while the Wasserstein distance has recently been studied and used on manifolds. We focus more specifically on the sphere, for which we define a novel SW discrepancy, which we call spherical Sliced-Wasserstein, making a first step towards defining SW discrepancies on manifolds. Our construction is notably based on closed-form solutions of the Wasserstein distance on the circle, together with a new spherical Radon transform. Along with efficient algorithms and the corresponding implementations, we illustrate its properties in several machine learning use cases where spherical representations of data are at stake: sampling on the sphere, density estimation on real Earth data, and hyperspherical autoencoders.

1. INTRODUCTION

Optimal transport (OT) (Villani, 2009) has received a lot of attention in machine learning in the past few years. As it provides metrics to compare probability distributions, it has been used for different tasks such as domain adaptation (Courty et al., 2016) or generative modeling (Arjovsky et al., 2017), to name a few. The most classical distance used in OT is the Wasserstein distance. However, computing it can be expensive. Hence, several variants were proposed to alleviate the computational burden, such as entropic regularization (Cuturi, 2013; Scetbon et al., 2021), minibatch OT (Fatras et al., 2020) or the Sliced-Wasserstein distance (SW) for distributions supported on Euclidean spaces (Rabin et al., 2011b). Although embedded in larger-dimensional Euclidean spaces, data in practice generally lie on manifolds (Fefferman et al., 2016). A simple manifold, but one with many practical applications, is the hypersphere $S^{d-1}$. Several types of data are by essence spherical: a good example is directional data (Mardia et al., 2000; Pewsey & García-Portugués, 2021), for which dedicated machine learning solutions are being developed (Sra, 2018), but other applications concern for instance geophysical data (Di Marzio et al., 2014), meteorology (Besombes et al., 2021), cosmology (Perraudin et al., 2019) or extreme value theory for the estimation of spectral measures (Guillou et al., 2015). Remarkably, in a more abstract setting, considering hyperspherical latent representations of data is becoming more and more common (e.g., Liu et al., 2017; Xu & Durrett, 2018; Davidson et al., 2018). For example, in the context of variational autoencoders (Kingma & Welling, 2013), using priors on the sphere has been demonstrated to be beneficial (Davidson et al., 2018).
Also, in the context of self-supervised learning (SSL), where one wants to learn discriminative representations in an unsupervised way, the hypersphere is usually considered for the latent representation (Wu et al., 2018; Chen et al., 2020a; Wang & Isola, 2020; Grill et al., 2020; Caron et al., 2020). It is thus of primary importance to develop machine learning tools that accommodate this specific geometry. The OT theory on manifolds is well developed (Villani, 2009; Figalli & Villani, 2011; McCann, 2001) and several works have started to use it in practice, with a focus mainly on the approximation of OT maps. For example, Cohen et al. (2021) and Rezende & Racanière (2021) approximate the OT map to define normalizing flows on Riemannian manifolds, Hamfeldt & Turnquist (2021a;b) and Cui et al. (2019) derive algorithms to approximate the OT map on the sphere, and Alvarez-Melis et al. (2020) and Hoyos-Idrobo (2020) learn the transport map on hyperbolic spaces. However, the computational bottleneck of computing the Wasserstein distance on such spaces remains, and, as underlined in the conclusion of Nadjahi (2021), defining SW distances on manifolds would be of much interest. Notably, Rustamov & Majumdar (2020) proposed a variant of SW, based on the spectral decomposition of the Laplace-Beltrami operator, which generalizes to manifolds for which the eigenvalues and eigenfunctions are available. However, it is not directly related to the original SW on Euclidean spaces.

Contributions. By leveraging properties of the Wasserstein distance on the circle (Rabin et al., 2011a), we define the first, to the best of our knowledge, natural generalization of the original SW discrepancy on a nontrivial manifold, namely the sphere $S^{d-1}$, and hence make a first step towards defining SW distances on Riemannian manifolds. We make connections with a new spherical Radon transform and analyze some of its properties. We discuss the underlying algorithmic procedure, and notably provide an efficient implementation when computing the discrepancy against a uniform distribution. Finally, we show that this discrepancy can be used for different tasks such as sampling, density estimation, and generative modeling.

2. BACKGROUND

The aim of this paper is to define a Sliced-Wasserstein discrepancy on the hypersphere $S^{d-1} = \{x \in \mathbb{R}^d : \|x\|_2 = 1\}$. Therefore, in this section, we introduce the Wasserstein distance on manifolds and the classical SW distance on $\mathbb{R}^d$.

2.1. WASSERSTEIN DISTANCE

Since we are interested in defining a SW discrepancy on the sphere, we start by introducing the Wasserstein distance on a Riemannian manifold $M$ endowed with the Riemannian distance $d$. We refer to (Villani, 2009; Figalli & Villani, 2011) for more details. Let $p \ge 1$ and $\mu, \nu \in \mathcal{P}_p(M) = \{\mu \in \mathcal{P}(M) : \int_M d^p(x, x_0)\, \mathrm{d}\mu(x) < \infty \text{ for some } x_0 \in M\}$. Then, the $p$-Wasserstein distance between $\mu$ and $\nu$ is defined as
$$W_p^p(\mu, \nu) = \inf_{\gamma \in \Pi(\mu, \nu)} \int_{M \times M} d^p(x, y)\, \mathrm{d}\gamma(x, y),$$
where $\Pi(\mu, \nu) = \{\gamma \in \mathcal{P}(M \times M) : \forall A \subset M,\ \gamma(M \times A) = \nu(A) \text{ and } \gamma(A \times M) = \mu(A)\}$ denotes the set of couplings between $\mu$ and $\nu$. For discrete probability measures, the Wasserstein distance can be computed using linear programming (Peyré et al., 2019). However, these algorithms have an $O(n^3 \log n)$ complexity w.r.t. the number of samples $n$, which is computationally intensive. Therefore, a whole line of work consists of defining alternative discrepancies which are cheaper to compute. On Euclidean spaces, one of them is the Sliced-Wasserstein distance.

2.2. SLICED-WASSERSTEIN DISTANCE

On $M = \mathbb{R}^d$ with $d(x, y) = \|x - y\|_2$, a more attractive distance is the Sliced-Wasserstein (SW) distance. This distance relies on the appealing fact that, for one-dimensional measures $\mu, \nu \in \mathcal{P}_p(\mathbb{R})$, we have the closed form (Peyré et al., 2019, Remark 2.30)
$$W_p^p(\mu, \nu) = \int_0^1 \big|F_\mu^{-1}(u) - F_\nu^{-1}(u)\big|^p\, \mathrm{d}u,$$
where $F_\mu^{-1}$ (resp. $F_\nu^{-1}$) is the quantile function of $\mu$ (resp. $\nu$). From this property, Rabin et al. (2011b) and Bonnotte (2013) defined the SW distance as
$$\forall \mu, \nu \in \mathcal{P}_p(\mathbb{R}^d),\quad SW_p^p(\mu, \nu) = \int_{S^{d-1}} W_p^p(P^\theta_\# \mu, P^\theta_\# \nu)\, \mathrm{d}\lambda(\theta),$$
where $P^\theta(x) = \langle x, \theta \rangle$, $\lambda$ is the uniform distribution on $S^{d-1}$, and for any Borel set $A \in \mathcal{B}(\mathbb{R})$, $P^\theta_\# \mu(A) = \mu((P^\theta)^{-1}(A))$.



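In practice, the outer integral over $S^{d-1}$ is estimated by Monte Carlo, and for empirical measures with $n$ uniform-weight samples each, the one-dimensional quantile functions reduce to sorting the projections. A minimal sketch of this standard estimator (our own code; names and defaults are illustrative):

```python
import numpy as np


def sliced_wasserstein(X, Y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of SW_p^p between two empirical measures on R^d,
    each given as an (n, d) array of samples with uniform weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Draw directions uniformly on S^{d-1} by normalizing Gaussian samples.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both sample sets onto every direction: shape (n_projections, n).
    X_proj = theta @ X.T
    Y_proj = theta @ Y.T
    # 1D closed form: sorted projections are the empirical quantiles.
    X_proj.sort(axis=1)
    Y_proj.sort(axis=1)
    # Average |F_mu^{-1} - F_nu^{-1}|^p over quantiles and directions.
    return np.mean(np.abs(X_proj - Y_proj) ** p)
```

The estimator is zero for identical sample sets, and its cost is $O(Ln\log n)$ for $L$ projections, which is what makes SW attractive compared with the cubic-cost linear program.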

