SCALABLE AND EQUIVARIANT SPHERICAL CNNS BY DISCRETE-CONTINUOUS (DISCO) CONVOLUTIONS

Abstract

No existing spherical convolutional neural network (CNN) framework is both computationally scalable and rotationally equivariant. Continuous approaches capture rotational equivariance but are often prohibitively computationally demanding. Discrete approaches offer more favorable computational performance but at the cost of equivariance. We develop a hybrid discrete-continuous (DISCO) group convolution that is simultaneously equivariant and computationally scalable to high resolution. While our framework can be applied to any compact group, we specialize to the sphere. Our DISCO spherical convolutions exhibit SO(3) rotational equivariance, where SO(n) is the special orthogonal group representing rotations in n dimensions. When restricting rotations of the convolution to the quotient space SO(3)/SO(2) for further computational enhancements, we recover a form of asymptotic SO(3) rotational equivariance. Through a sparse tensor implementation we achieve linear scaling in the number of pixels on the sphere for both computational cost and memory usage. For 4k spherical images we realize a saving of $10^9$ in computational cost and $10^4$ in memory usage when compared to the most efficient alternative equivariant spherical convolution. We apply the DISCO spherical CNN framework to a number of benchmark dense-prediction problems on the sphere, such as semantic segmentation and depth estimation, on all of which we achieve state-of-the-art performance.

1. INTRODUCTION

Spherical data are prevalent across many fields, from the relic radiation of the Big Bang in cosmology, to 360° imagery in virtual reality and computer vision. High-resolution data on the sphere are increasingly common in these and many other fields. Existing deep learning techniques for spherical data, however, cannot scale to high resolution while also exhibiting equivariance, a powerful inductive bias responsible in part for the success of Euclidean convolutional neural networks (CNNs). Furthermore, many tasks involve dense predictions (e.g. semantic segmentation, depth estimation), necessitating pixel-wise outputs from deep learning models, exacerbating computational challenges.

Continuous spherical CNN approaches. Bronstein et al. (2021) present a categorization of geometric deep learning approaches. Deep learning on the sphere falls into the group category, since the sphere is a homogeneous space with global symmetries on which the group of 3D rotations SO(3) acts. In fact, the sphere is often considered the prototypical example of the group setting. A number of group-based spherical CNN constructions have been developed (Cohen et al., 2018; Kondor et al., 2018; Esteves et al., 2018; 2020; Cobb et al., 2021; McEwen et al., 2022; Mitchel et al., 2022), where Fourier representations of spherical signals (i.e. spherical harmonic representations), combined with sampling theorems on the sphere (Driscoll & Healy, 1994; McEwen & Wiaux, 2011), are considered to provide access to the underlying continuous spherical signals and symmetries. While such approaches live natively on the sphere and capture rotational equivariance, they are highly computationally costly due to the need to compute spherical harmonic transforms (while fast spherical harmonic transforms exist, they remain computationally demanding). McEwen et al. (2022) develop scattering networks on the sphere to alleviate these computational demands.
While such an approach helps to scale to high-resolution input data, some high-resolution information is inevitably lost and architectures providing dense predictions are not supported.

Discrete spherical CNN approaches. Other approaches to spherical CNNs generally fall under the grid, graph or geodesic geometric deep learning categories, yielding discrete approaches (Boomsma & Frellsen, 2017; Jiang et al., 2019; Zhang et al., 2019; Perraudin et al., 2019; Cohen et al., 2019). These approaches offer more favorable computational performance than the group-based frameworks but at the cost of equivariance. Since a completely regular point distribution on the sphere does not in general exist, these discrete approaches lose the connection to the underlying continuous symmetries of the sphere and thus cannot fully capture rotational equivariance (although in some cases limited rotational equivariance may be achieved; e.g. Cohen et al. 2019).

Our approach. Previous spherical CNN constructions can be categorized into two broad approaches: continuous approaches that capture rotational equivariance but are computationally demanding; and discrete approaches that do not fully capture rotational equivariance but offer improved computational scaling. In this article we develop a novel hybrid discrete-continuous (DISCO) approach that is simultaneously computationally scalable to high resolution, while also exhibiting excellent equivariance properties (see Figure 1). We define a DISCO group convolution, which we then specialize to the sphere. A transposed DISCO convolution can be constructed to support dense-prediction tasks; hence, both high-resolution inputs and outputs are supported by our framework. DISCO convolutions afford a computationally scalable implementation through sparse tensor representations. We build DISCO spherical CNNs that achieve state-of-the-art performance on a number of high-resolution dense-prediction tasks, including semantic segmentation and depth estimation.
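The linear scaling afforded by a sparse tensor representation can be seen in a minimal sketch (not the paper's implementation; an illustration using SciPy with a toy 1D neighborhood standing in for the kernel's compact spherical support): once the kernel weights are gathered into a sparse matrix whose rows index output positions and whose columns index input pixels, evaluating the convolution is a sparse matrix-vector product whose cost and memory are proportional to the number of nonzeros, i.e. linear in the number of pixels for a compactly supported kernel.

```python
import numpy as np
from scipy import sparse

def make_sparse_kernel(npix, support=3, seed=0):
    """Toy sparse kernel matrix with `support` nonzeros per row.

    Each output position only receives contributions from pixels inside
    the kernel's compact support, so the matrix has O(npix) nonzeros.
    (The neighborhood here is a placeholder, not a spherical kernel.)
    """
    rng = np.random.default_rng(seed)
    rows, cols, vals = [], [], []
    for i in range(npix):
        for k in range(support):
            rows.append(i)
            cols.append((i + k) % npix)  # toy local neighborhood
            vals.append(rng.standard_normal())
    return sparse.csr_matrix((vals, (rows, cols)), shape=(npix, npix))

npix = 1000
K = make_sparse_kernel(npix)
f = np.random.default_rng(1).standard_normal(npix)
out = K @ f  # cost and memory O(nnz) = O(npix), not O(npix^2)
print(K.nnz, out.shape)  # → 3000 (1000,)
```

Doubling the resolution doubles the number of nonzeros and hence the cost, in contrast to dense Fourier-space approaches whose cost grows superlinearly with the number of pixels.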

2. BACKGROUND

2.1 CONTINUOUS GROUP CONVOLUTION

Definition. To generalize CNNs to group geometric deep learning settings the group convolution can be considered (see, e.g., Cohen & Welling, 2016; Esteves, 2020), where the convolution (sometimes called correlation) of two functions $f, \psi : G \to \mathbb{R}$ on the group $G$ is given by $(f \star \psi)(g) = \int_{u \in G} f(u) \, \psi(g^{-1} u) \, \mathrm{d}\mu(u)$, where $g, u \in G$ and $\mathrm{d}\mu(u)$ is the invariant Haar measure. In many cases signals are not defined on a space with a group structure. Often signals are defined on a quotient space $G/H$, where $H$ is a subgroup of $G$, i.e. $u \in G/H$ and $f, \psi : G/H \to \mathbb{R}$. For Lie groups, if $H$ is non-normal then $G/H$ is not a group but simply a differentiable manifold on which $G$ acts, i.e. a homogeneous space. In either setting the group convolution exhibits equivariance to group actions, that is $(Qf \star \psi)(g) = (Q(f \star \psi))(g)$, where $Q$ denotes the group action corresponding to $q \in G$, i.e. $(Qf)(g) = f(q^{-1} g)$.

Fourier representation and sampling theory. The group convolution may be written in Fourier space as the product $\widehat{(f \star \psi)}(k) = \hat{f}(k) \, \hat{\psi}^*(k)$ (see, e.g., Esteves, 2020), where $k$ is the Fourier conjugate variable of $u$, $\hat{\cdot}$ denotes the Fourier transform, and $\cdot^*$ denotes complex conjugation. For compact manifolds (groups) the Fourier space is discrete (the canonical example being the Fourier series of functions defined on the circle $S^1$, i.e. periodic functions). Furthermore, for bandlimited signals on compact homogeneous manifolds, exact quadrature formulae exist (Pesenson & Geller, 2011), from which the existence of a sampling theorem follows, whereby all information content of a continuous bandlimited signal can be captured in a finite number of spatial samples.

Computation. These results provide a strategy to compute the continuous group convolution exactly for bandlimited functions defined on compact homogeneous manifolds: (i) compute the finite
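The definition and its two key properties can be checked concretely on a finite group, where the Haar integral becomes a sum. The sketch below (a hedged NumPy illustration, not part of the paper's framework) takes the cyclic group $\mathbb{Z}_n$, on which $g^{-1}u = u - g \pmod n$, and verifies both the equivariance relation $(Qf \star \psi) = Q(f \star \psi)$ and the Fourier product form $\widehat{(f \star \psi)}(k) = \hat{f}(k)\,\hat{\psi}^*(k)$ (the latter holds in this form for real-valued $\psi$).

```python
import numpy as np

def group_conv(f, psi):
    """Group convolution on Z_n: (f * psi)(g) = sum_u f(u) psi(g^{-1} u)."""
    n = len(f)
    return np.array([sum(f[u] * psi[(u - g) % n] for u in range(n))
                     for g in range(n)])

def act(q, f):
    """Group action (Qf)(g) = f(q^{-1} g) = f(g - q mod n), i.e. a shift."""
    return np.roll(f, q)

rng = np.random.default_rng(0)
n = 8
f, psi = rng.standard_normal(n), rng.standard_normal(n)
q = 3

# Equivariance: convolving the shifted signal equals shifting the output.
lhs = group_conv(act(q, f), psi)
rhs = act(q, group_conv(f, psi))
assert np.allclose(lhs, rhs)

# Fourier product form (psi real): hat(f * psi) = hat(f) * conj(hat(psi)).
F = np.fft.fft
assert np.allclose(F(group_conv(f, psi)), F(f) * np.conj(F(psi)))
```

The same two properties carry over to the continuous setting of the sphere, with the FFT replaced by spherical harmonic transforms and the sum by the Haar integral.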



The quotient space G/H can be considered (in a non-rigorous way) as formed by collapsing all elements of H to the identity in G, in a sense factoring out the elements of H from G.



Figure 1: Spherical CNN categorization.

