CONCENTRIC SPHERICAL GNN FOR 3D REPRESENTATION LEARNING

Abstract

Learning 3D representations that generalize well to arbitrarily oriented inputs is a challenge of practical importance in applications ranging from computer vision to physics and chemistry. We propose a novel multi-resolution convolutional architecture for learning over concentric spherical feature maps, of which the single-sphere representation is a special case. Our hierarchical architecture alternates between learning to incorporate intra-sphere and inter-sphere information. We show the applicability of our method to two different types of 3D inputs: mesh objects, which can be regularly sampled, and point clouds, which are irregularly distributed. We also propose an efficient mapping of point clouds to concentric spherical images using radial basis functions, thereby bridging spherical convolutions on grids with general point clouds. We demonstrate the effectiveness of our approach by achieving state-of-the-art performance on 3D classification tasks with rotated data.

1. INTRODUCTION

While convolutional neural networks have been applied with great success to 2D images, extending the same success to geometries in 3D has proven more challenging. A desirable property, and a challenge, in this setting is to learn descriptive representations that are also equivariant to any 3D rotation. Cohen et al. (2018) and Esteves et al. (2018) showed that the spherical domain permits learning such rotationally equivariant representations by defining convolutions with respect to spherical harmonics. In practice, spherical convolutions are implemented via a discretization of the sphere. Earlier spherical Convolutional Neural Networks (CNNs) used spherical coordinate grids, but these discretizations result in non-uniform samplings of the sphere, which is undesirable. Furthermore, spherical convolutions defined on these grids scale with O(N^1.5) complexity, where N is the number of grid points. Subsequent works (Jiang et al. (2019), Cohen et al. (2019), Defferrard et al. (2020)) designed more scalable O(N) convolutions over more uniform spherical discretizations. Existing spherical CNNs operate over a spherical image, resulting from projection of data onto a bounding sphere. We show that it is more expressive and general to instead operate over a concentric, multi-spherical discretization for representing 3D data. Our main innovation is a new two-phase convolutional scheme for learning over a concentric-spheres representation, alternating between inter-sphere and intra-sphere convolutional blocks. We use graph convolutions to incorporate intra-sphere information, and 1D convolutions to incorporate radial information. Similar to Jiang et al. (2019) and Cohen et al. (2019), we focus on the icosahedral spherical discretization, which produces a mostly regular sampling over the sphere. Our proposed architecture is hierarchical, following the recursive coarsening hierarchy of the icosahedron.
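As a concrete illustration of the recursive hierarchy, the icosahedral discretization can be generated by repeatedly subdividing each triangular face of an icosahedron at its edge midpoints and projecting the new vertices onto the unit sphere; level ℓ of the hierarchy has 10·4^ℓ + 2 vertices. The sketch below is our own illustrative NumPy code (not the authors' implementation):

```python
import numpy as np
from itertools import combinations

def icosahedron_vertices():
    """The 12 unit-sphere vertices of an icosahedron: cyclic
    permutations of (0, +-1, +-phi), normalized."""
    phi = (1 + 5 ** 0.5) / 2  # golden ratio
    v = []
    for s1 in (-1, 1):
        for s2 in (-1, 1):
            v += [(0, s1, s2 * phi), (s1, s2 * phi, 0), (s2 * phi, 0, s1)]
    verts = np.array(sorted(set(v)), dtype=float)
    return verts / np.linalg.norm(verts, axis=1, keepdims=True)

def icosa_faces(verts):
    """Recover the 20 triangular faces as mutually adjacent vertex
    triples; adjacency = pairs at the minimum nonzero distance."""
    d = np.linalg.norm(verts[:, None] - verts[None], axis=-1)
    edge = d[d > 1e-9].min()
    adj = np.abs(d - edge) < 1e-6
    return [(i, j, k) for i, j, k in combinations(range(len(verts)), 3)
            if adj[i, j] and adj[j, k] and adj[i, k]]

def subdivide(verts, faces):
    """One refinement level: split each face into 4 by edge midpoints,
    projecting new vertices back onto the unit sphere."""
    verts = list(map(tuple, verts))
    index = {v: i for i, v in enumerate(verts)}

    def midpoint(i, j):
        m = (np.array(verts[i]) + np.array(verts[j])) / 2
        key = tuple(np.round(m / np.linalg.norm(m), 10))  # dedupe shared edges
        if key not in index:
            index[key] = len(verts)
            verts.append(key)
        return index[key]

    new_faces = []
    for i, j, k in faces:
        a, b, c = midpoint(i, j), midpoint(j, k), midpoint(i, k)
        new_faces += [(i, a, c), (j, b, a), (k, c, b), (a, b, c)]
    return np.array(verts), new_faces
```

Each level quadruples the face count while vertex counts follow 12, 42, 162, 642, ..., which is the coarsening hierarchy the architecture pools along.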
Combining intra-sphere and inter-sphere convolutions is conceptually analogous to gradually incorporating information over volumetric sectors. At the same time, the choice of convolutions allows our model to retain a high degree of rotational equivariance. We demonstrate the effectiveness and generality of our approach through two 3D classification experiments with different types of input data: mesh objects and general point clouds. The latter poses an additional challenge for discretization-based methods, as native point clouds are non-uniformly distributed in 3D space. To summarize our contributions:

1. We propose a new multi-sphere icosahedral discretization for representing 3D data, and show that incorporating the radial dimension can greatly enhance representational ability over single-sphere representations.

2. We introduce a novel convolutional architecture for the multi-sphere discretization built on two different types of convolutions, conceptually separated as intra-sphere and inter-sphere. Combining graph convolutions (intra-sphere) with 1D radial convolutions (inter-sphere) leads to an expressive architecture that is also rotationally equivariant. Our proposed convolutions are also scalable, with cost linear in the total grid size.

3. We design mappings of both 3D mesh objects and general point clouds to the proposed representation. We achieve state-of-the-art performance on ModelNet40 point cloud classification using the proposed model and a data mapping based on radial basis functions. We also improve on existing spherical CNN performance on SHREC17 3D mesh classification by utilizing multi-radius information.
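To make the two-phase scheme concrete, the following sketch (our own illustrative NumPy code; the function names and the dense-adjacency formulation are assumptions, not the paper's implementation) applies a shared graph convolution on each sphere, then a 1D convolution along the radial axis, over a feature map of shape (R spheres, N vertices, C channels):

```python
import numpy as np

def intra_sphere_conv(x, adj, w):
    """Graph convolution applied independently on each sphere.
    x: (R, N, C_in); adj: (N, N) normalized adjacency; w: (C_in, C_out).
    Aggregates each vertex's neighbors on the sphere, then mixes channels."""
    return np.einsum('nm,rmc,cd->rnd', adj, x, w)

def inter_sphere_conv(x, k):
    """1D convolution along the radial (sphere) axis, shared across vertices.
    x: (R, N, C_in); k: (K, C_in, C_out) radial kernel; 'same' zero padding."""
    K = k.shape[0]
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))
    R = x.shape[0]
    out = np.zeros((R, x.shape[1], k.shape[2]))
    for t in range(K):
        # Shifted slice of shells, mixed by the t-th kernel tap.
        out += np.einsum('rnc,cd->rnd', xp[t:t + R], k[t])
    return out
```

Because the intra-sphere step only uses the (rotation-independent) grid adjacency and the inter-sphere step only mixes along radii, stacking the two blocks preserves the rotational symmetry properties of the underlying spherical convolution.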



2. RELATED WORK

Spherical CNNs. The goal of learning rotationally invariant representations of 3D geometries has led to several ideas for rotationally equivariant convolutions in the spherical domain. Cohen et al. (2018) defined rotationally equivariant spherical convolutions with respect to spherical harmonics. Jiang et al. (2019) proposed using parameterized differential operators to form convolutional kernels over the icosahedron, where equivariance is restricted to rotations about the z-axis. Cohen et al. (2019) proposed gauge-equivariant convolutions on manifolds, operating on feature fields corresponding to underlying geometric entities; this was applied to achieve rotationally equivariant convolutions over the icosahedral discretization. Defferrard et al. (2020) proposed a graph-convolution-based spherical CNN using spectral filters, along with a distance-weighted nearest-neighbors graph construction scheme that allows balancing rotational equivariance against efficiency across different types of grids. Other spherical CNNs have been designed to handle arbitrary point cloud data, which typically requires first mapping the data to a discretization. Rao et al. (2019) uses graph-convolution-inspired message-passing operators for learning over the icosahedral discretization. Our work is similar to Rao et al. (2019) and Defferrard et al. (2020) in its use of graph-based spherical convolutions, but we generalize to multi-sphere convolutions. You et al. (2020) is the most closely related work in terms of multi-sphere representation learning: the authors propose a spherical voxel grid and extend the SO(3) convolutions of Cohen et al. (2018) to incorporate the radial dimension. Our work treats spherical and radial convolutions as distinct, which yields much better results in practice, and uses more scalable spherical convolutions defined on the uniform icosahedral grid.

Pointwise Convolution Networks. There is a significant body of work on learning point cloud representations using pointwise convolutions, beginning with Qi et al. (2017), which proposed learning permutation-invariant functions that operate directly on point coordinates. Only more recently have such methods been developed towards learning rotationally invariant representations. Thomas et al. (2018) and Poulenard et al. (2019) both propose pointwise convolutional filters based on spherical harmonic functions to achieve rotational equivariance (or invariance); distance information is captured through learned functions in the former, and through radial sampling in the latter. While these filters are defined with respect to all-to-all convolution between points, in practice convolutions are limited to k-nearest neighbors (Poulenard et al. (2019)) for scalability. Chen et al. (2019), Sun et al. (2019), and Zhang et al. (2019) all extract rotationally invariant features (i.e., low-level geometric features such as angles and distances) from the point cloud as input to their respective convolutional architectures. These features are hand-engineered based on carefully chosen local frames of reference, or global ones in the case of Sun et al. (2019).
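Mapping irregular point clouds onto a grid discretization can be sketched as follows. This is a minimal illustrative variant using Gaussian radial basis functions (the function name, the shell-binning rule, and the Gaussian kernel width are our assumptions; the paper's exact RBF construction may differ):

```python
import numpy as np

def rbf_spherical_map(points, grid_dirs, n_shells, r_max, sigma=0.2):
    """Map a point cloud to a concentric spherical density image.
    points: (P, 3) raw coordinates; grid_dirs: (N, 3) unit directions of
    the spherical grid vertices. Returns an (n_shells, N) map where each
    point deposits unit mass, spread over grid directions by a Gaussian
    RBF on its unit direction and binned into a radial shell."""
    img = np.zeros((n_shells, len(grid_dirs)))
    r = np.clip(np.linalg.norm(points, axis=1), 1e-9, None)
    u = points / r[:, None]                       # unit directions
    shell = np.minimum((r / r_max * n_shells).astype(int), n_shells - 1)
    # Gaussian RBF weight from each point direction to every grid direction.
    w = np.exp(-np.linalg.norm(u[:, None] - grid_dirs[None], axis=-1) ** 2
               / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)             # unit mass per point
    np.add.at(img, shell, w)                      # scatter into shells
    return img
```

The resulting (shells x vertices) image is a fixed-size input for the concentric spherical convolutions, and the smooth RBF weighting avoids the aliasing of hard nearest-vertex assignment.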

