EFFICIENT LONG-RANGE CONVOLUTIONS FOR POINT CLOUDS

Abstract

The efficient treatment of long-range interactions for point clouds is a challenging problem in many scientific machine learning applications. To extract global information, one usually needs a large window size, a large number of layers, and/or a large number of channels, which can significantly increase the computational cost. In this work, we present a novel neural network layer that directly incorporates long-range information for a point cloud. This layer, dubbed the long-range convolutional (LRC)-layer, leverages the convolution theorem coupled with the non-uniform Fourier transform. In a nutshell, the LRC-layer mollifies the point cloud onto an adequately sized regular grid, computes its Fourier transform, multiplies the result by a set of trainable Fourier multipliers, computes the inverse Fourier transform, and finally interpolates the result back to the point cloud. The resulting global all-to-all convolution operation can be performed in nearly-linear time with respect to the number of input points. The LRC-layer is a particularly powerful tool when combined with local convolution, as together they offer efficient and seamless treatment of both short- and long-range interactions. We showcase this framework by introducing a neural network architecture that combines LRC-layers with short-range convolutional layers to accurately learn the energy and force associated with an N-body potential. We also exploit the induced two-level decomposition and propose an efficient strategy to train the combined architecture with a reduced number of samples.
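The five steps of the LRC-layer described above (mollify, FFT, multiply, inverse FFT, interpolate) can be sketched as follows. This is a minimal 1-D NumPy illustration, not the authors' implementation: the Gaussian mollifier width, the grid size, and the identity Fourier multipliers are illustrative assumptions, and the dense kernel matrix used here would be replaced by a non-uniform FFT to attain the nearly-linear cost claimed in the abstract.

```python
import numpy as np

def lrc_layer(points, values, multipliers, grid_size=32, sigma=0.05):
    """Sketch of an LRC-layer pass on a 1-D periodic point cloud.

    points      : (N,) point locations in [0, 1)
    values      : (N,) scalar feature per point
    multipliers : (grid_size,) Fourier multipliers (trainable in the paper)
    """
    # 1. Mollify: deposit each point's value onto a regular grid
    #    using a periodic Gaussian kernel.
    grid = np.arange(grid_size) / grid_size
    diff = grid[None, :] - points[:, None]   # (N, grid_size) pairwise offsets
    diff = diff - np.round(diff)             # periodic wrap to [-0.5, 0.5)
    kernel = np.exp(-0.5 * (diff / sigma) ** 2)
    density = kernel.T @ values              # (grid_size,) mollified density

    # 2. Forward FFT of the mollified density.
    density_hat = np.fft.fft(density)

    # 3. Multiply by the Fourier multipliers (the trainable part).
    filtered_hat = multipliers * density_hat

    # 4. Inverse FFT back to the regular grid.
    filtered = np.fft.ifft(filtered_hat).real

    # 5. Interpolate the grid values back to the point locations,
    #    normalizing by the total kernel weight per point.
    return kernel @ filtered / kernel.sum(axis=1)

rng = np.random.default_rng(0)
pts = rng.random(64)
vals = rng.standard_normal(64)
mult = np.ones(32)                 # identity multipliers as a placeholder
out = lrc_layer(pts, vals, mult)
print(out.shape)                   # (64,)
```

Because the filtering happens in Fourier space, every output point depends on every input point, i.e. the operation is a global all-to-all convolution, yet steps 2–4 cost only O(grid_size log grid_size).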

1. INTRODUCTION

Point-cloud representations provide detailed information about objects and environments. The development of novel acquisition techniques, such as laser scanning, digital photogrammetry, light detection and ranging (LIDAR), 3D scanners, and structure-from-motion (SFM), has increased interest in using point-cloud representations in various applications such as digital preservation, surveying, autonomous driving (Chen et al., 2017), 3D gaming, robotics (Oh & Watanabe, 2002), and virtual reality (Park et al., 2008). In turn, this interest has fueled the development of machine learning frameworks that take point clouds as input. Early methods used a preprocessing stage that extracted meticulously hand-crafted features from the point cloud, which were subsequently fed to a neural network (Chen et al., 2003; Rusu et al., 2008; Rusu et al., 2009; Aubry et al., 2011), or relied on voxelization of the geometry (Savva et al., 2016; Wu et al., 2015; Riegler et al., 2017; Maturana & Scherer, 2015). The PointNet architecture (Qi et al., 2017) was the first to handle raw point cloud data directly and learn features on the fly. This work has spawned several related approaches aiming either to attenuate drawbacks of the original methodology, such as PointNet++ (Qi et al., 2017), or to increase its accuracy and range of application (Wang et al., 2019; Zhai et al., 2020; Li et al., 2018; Liu et al., 2019). Even though such methods have been quite successful for machine learning problems, they rely on an assumption of locality, which may produce large errors when the task at hand exhibits long-range interactions (LRIs). To capture such interactions using standard convolutional layers, one can use wider window sizes, deeper networks, and/or a larger number of features, all of which may increase the computational cost significantly.
Several approaches have been proposed to efficiently capture such interactions in tasks such as semantic segmentation, whose main ideas we briefly summarize below. In multi-scale approaches, features are progressively processed and merged. Within this family, there exist several variants, where the underlying neural networks can

