SPHERICAL SLICED-WASSERSTEIN

Abstract

Many variants of the Wasserstein distance have been introduced to reduce its original computational burden. In particular, the Sliced-Wasserstein distance (SW), which leverages one-dimensional projections for which a closed-form solution of the Wasserstein distance is available, has received a lot of interest. Yet, it is restricted to data living in Euclidean spaces, while the Wasserstein distance has been studied and used recently on manifolds. We focus more specifically on the sphere, for which we define a novel SW discrepancy, which we call spherical Sliced-Wasserstein, taking a first step towards defining SW discrepancies on manifolds. Our construction is notably based on closed-form solutions of the Wasserstein distance on the circle, together with a new spherical Radon transform. Along with efficient algorithms and the corresponding implementations, we illustrate its properties in several machine learning use cases where spherical representations of data are at stake: sampling on the sphere, density estimation on real Earth data, and hyperspherical auto-encoders.

1. INTRODUCTION

Optimal transport (OT) (Villani, 2009) has received a lot of attention in machine learning in the past few years. As it allows one to compare distributions with metrics, it has been used for different tasks such as domain adaptation (Courty et al., 2016) or generative modeling (Arjovsky et al., 2017), to name a few. The most classical distance used in OT is the Wasserstein distance. However, computing it can be expensive. Hence, several variants were proposed to alleviate the computational burden, such as entropic regularization (Cuturi, 2013; Scetbon et al., 2021), minibatch OT (Fatras et al., 2020) or the sliced-Wasserstein distance (SW) for distributions supported on Euclidean spaces (Rabin et al., 2011b). Although embedded in larger-dimensional Euclidean spaces, data generally lie in practice on manifolds (Fefferman et al., 2016). A simple manifold, but one with many practical applications, is the hypersphere S^{d-1}. Several types of data are inherently spherical: a good example is directional data (Mardia et al., 2000; Pewsey & García-Portugués, 2021), for which dedicated machine learning solutions are being developed (Sra, 2018), but other applications concern for instance geophysical data (Di Marzio et al., 2014), meteorology (Besombes et al., 2021), cosmology (Perraudin et al., 2019) or extreme value theory for the estimation of spectral measures (Guillou et al., 2015). Remarkably, in a more abstract setting, considering hyperspherical latent representations of data is becoming more and more common (e.g. Liu et al., 2017; Xu & Durrett, 2018; Davidson et al., 2018). For example, in the context of variational autoencoders (Kingma & Welling, 2013), using priors on the sphere has been demonstrated to be beneficial (Davidson et al., 2018).
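To make the Euclidean SW construction mentioned above concrete, the following is a minimal sketch of a Monte Carlo estimator of SW between two empirical distributions: random directions are drawn on the unit sphere, both point clouds are projected onto each direction, and the one-dimensional Wasserstein distance is computed in closed form by sorting the projections. Function and parameter names are illustrative, not from the paper's implementation.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=None):
    """Monte Carlo estimate of SW_p between two point clouds of equal size.

    X, Y: arrays of shape (n, d), viewed as uniform empirical measures.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Draw random directions uniformly on the unit sphere S^{d-1}.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both point clouds onto each direction: 1D distributions.
    X_proj = X @ theta.T  # shape (n, n_projections)
    Y_proj = Y @ theta.T
    # Closed form in 1D: the optimal coupling matches order statistics,
    # so sorting each projected sample suffices.
    X_sorted = np.sort(X_proj, axis=0)
    Y_sorted = np.sort(Y_proj, axis=0)
    # Average W_p^p over projections, then take the p-th root.
    return np.mean(np.abs(X_sorted - Y_sorted) ** p) ** (1.0 / p)
```

This sorting step is what makes SW cheap (O(n log n) per projection) compared to solving a full OT problem; the paper's contribution replaces these straight-line projections with projections adapted to the sphere.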
Also, in the context of self-supervised learning (SSL), where one wants to learn discriminative representations in an unsupervised way, the hypersphere is usually considered for the latent representation (Wu et al., 2018; Chen et al., 2020a; Wang & Isola, 2020; Grill et al., 2020; Caron et al., 2020). It is thus of primary importance to develop machine learning tools that accommodate this specific geometry well. The OT theory on manifolds is well developed (Villani, 2009; Figalli & Villani, 2011; McCann, 2001), and several works have started to use it in practice, with a focus mainly on the approximation of OT maps.

For example, Cohen et al. (2021); Rezende & Racanière (2021) approximate the OT map to define normalizing flows on Riemannian manifolds, Hamfeldt & Turnquist (2021a;b); Cui et al. (2019) derive algorithms to approximate the OT map on the sphere, Alvarez-Melis et al. (2020); Hoyos-

