DEEP CONVOLUTION FOR IRREGULARLY SAMPLED TEMPORAL POINT CLOUDS

Anonymous

Abstract

We consider the problem of modeling the dynamics of continuous spatial-temporal processes represented by samples that are irregular in both space and time. Such processes occur in sensor networks, citizen science, multi-robot systems, and many other domains. We propose a new deep model that learns and predicts directly over this irregularly sampled data, without voxelization, by leveraging a recent convolutional architecture for static point clouds. The model also easily incorporates the notion of multiple entities in the process. In particular, the model can flexibly answer prediction queries about arbitrary space-time points for different entities, regardless of the distribution of the training or test-time data. We present experiments on real-world weather station data and battles between large armies in StarCraft II. The results demonstrate the model's flexibility in answering a variety of query types and show improved performance and efficiency compared to state-of-the-art baselines.

1. INTRODUCTION

Many real-world problems feature observations that are sparse and irregularly sampled in both space and time: weather stations scattered across the landscape report at variable rates without synchronization; citizen-science applications produce observations at the whim of individuals; and search-and-rescue or military operations receive opportunistic reports of unit positions. These sparse and irregular observations naturally map to a set of discrete space-time points, forming a spatio-temporal point cloud that represents the underlying process. Critically, the dynamics of these points are often highly related to the other points in their spatio-temporal neighborhood.

Modeling spatio-temporal point clouds is difficult with standard deep networks, which assume observations are dense and regular: at every grid location for CNNs, at every time step for RNNs, or both for spatio-temporal models like Convolutional LSTMs (Xingjian et al., 2015). While there has been work examining irregularly sampled data through time (Rubanova et al., 2019; Shukla & Marlin, 2018) and in space (Wu et al., 2019), modeling both simultaneously has received little attention (Choy et al., 2019). This is due in part to the difficulty of scaling prior solutions across both space and time. For instance, voxelization followed by sparse convolution (Choy et al., 2019) or dense imputation (Shukla & Marlin, 2018) face a multiplicative increase in the number of cells. Rather than forcing irregular data into dense representations, an emerging line of research treats spatial point clouds as first-class citizens (Qi et al., 2017a;b; Su et al., 2018; Xu et al., 2018). Several works directly extend 2D convolutions to point clouds (Simonovsky & Komodakis, 2017; Wang et al., 2019; Hermosilla et al., 2018), with Wu et al. (2019) being the first to allow efficient exact computation of convolutions with dozens of layers.
In this work, we build on this line of research to model spatio-temporal point clouds. Specifically, we extend the work of Wu et al. (2019) with an additional module to reason about point representations through time. Our new model, TemporalPointConv (TPC), is a simple but powerful extension that can learn from an arbitrary number of space-time points. Each layer in TemporalPointConv updates the representation of each point by applying two operators in sequence: one that considers the spatial neighborhood in a narrow temporal window, and another that models how this spatial representation changes over time. By factorizing the representation update into separate spatial and temporal operators, we gain significant modeling flexibility. Further, by operating directly on point clouds, we can predict observations at arbitrary space-time locations, regardless of the distribution of the observations.

We demonstrate TemporalPointConv on two distinct problems: 1) predicting future states of a custom StarCraft II environment involving battles between variable-sized groups, and 2) predicting the weather at stations distributed throughout the state of Oklahoma. Further, we show the utility of these networks in identifying damaged or anomalous weather sensors after being trained exclusively on the associated prediction problem. The results show that TemporalPointConv outperforms both state-of-the-art set functions and a discrete sparse convolution algorithm in terms of raw performance, ability to detect anomalies, and generalization to previously unseen input and query distributions.

2. RELATED WORK

Xingjian et al. (2015) give an early approach to spatio-temporal modeling via convolution by incorporating a standard convolutional structure into the latent memory of an LSTM. This approach is appropriate for situations where the data is regularly sampled in both space and time, which differs from our setting.

Interaction networks (Battaglia et al., 2016) and related approaches model sets of interacting objects or points over time, originally motivated by modeling physical processes. These models are more flexible in their treatment of spatial relationships among points; however, they assume uniform temporal sampling, which is violated in our setting.

A significant amount of work on spatio-temporal modeling for non-uniform spatial sampling uses Graph Convolutional Networks (GCNs) to model spatial interactions. For example, Li et al. (2018b) used a GCN followed by an RNN, and Yu et al. (2018) used GCNs for spatial correlations and temporal convolutions for temporal correlations. These methods require sampling at fixed temporal intervals and do not generalize outside the given fixed graph; in contrast, our approach generalizes to any spatio-temporal point outside of the training data. Yao et al. (2019) introduce an attention model to handle dynamic spatial relationships; however, this is only possible for the dense CNN version in their paper, whereas their version with irregular spatial sampling relies on a GCN and shares the issues of the GCN approaches above.

PointNet (Qi et al., 2017a) sparked significant interest in networks for 3D point cloud processing. A number of networks have since been proposed (Qi et al., 2017a;b; Su et al., 2018; Xu et al., 2018), with the highest-performing using either sparse convolutional networks (Graham & van der Maaten, 2018; Choy et al., 2019) or point convolutional networks (Wu et al., 2019; Thomas et al., 2019). Set networks, such as DeepSets (Zaheer et al., 2017b), are similar to PointNet (Qi et al., 2017a); neither explicitly considers neighborhood information of elements/points, making them less powerful than convolutional methods. Recently, Horn et al. (2020) proposed a set network approach for non-uniform time-series prediction, which encodes time into the feature vector of each point. Our experiments show that this approach is outperformed by our convolutional method. Sparse convolutional networks resemble dense volumetric convolutional networks that discretize space-time on a regular grid, but convolutions are computed only at locations occupied by points. Minkowski networks (Choy et al., 2019) are sparse convolutional networks that model spatio-temporal data with 4D sparse convolutions.
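To make the contrast between set functions and convolutional methods concrete, the following is a minimal DeepSets-style set function sketched in NumPy. This is a hypothetical simplification with placeholder linear maps for the per-element and post-pooling networks: each element is embedded independently and the embeddings are pooled by a sum, so the representation is permutation-invariant but no element's encoding depends on its neighbors.

```python
import numpy as np

def deepsets_encode(feats, W_phi, W_rho):
    """Minimal DeepSets-style set function (illustrative sketch).

    feats: (N, F) per-element features; W_phi: (F, H); W_rho: (H, C).
    Each element is embedded independently (no neighborhood information),
    then sum-pooled into a single permutation-invariant set representation.
    """
    phi = np.maximum(feats @ W_phi, 0.0)  # per-element embedding, shape (N, H)
    pooled = phi.sum(axis=0)              # permutation-invariant pooling, shape (H,)
    return pooled @ W_rho                 # set-level output, shape (C,)
```

Because the pooling is a plain sum, shuffling the elements leaves the output unchanged; the flip side, as noted above, is that no per-element computation can exploit spatial neighborhood structure the way a point convolution does.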



Figure 1: TemporalPointConv operates on unsynchronized sets of spatio-temporal samples by applying two point-based convolutional operators in sequence, each of which exploits separate notions of either spatial or temporal locality.
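The factorized update in Figure 1 can be sketched in NumPy as follows. This is an illustrative simplification, not the paper's implementation: PointConv generates neighbor weights with a learned MLP over relative offsets, which is replaced here by a fixed nonlinearity, and all weight matrices (`W_off_*`, `W_out_*`) are hypothetical placeholders. The spatial step aggregates over k-nearest spatial neighbors restricted to a narrow temporal window; the temporal step then convolves each point's representation over the time axis alone.

```python
import numpy as np

def knn_aggregate(query, support, feats, W_off, W_out, k, mask=None):
    """Simplified point-convolution step (sketch, not exact PointConv).

    query: (Q, D) and support: (S, D) coordinates; feats: (S, F) features.
    Per-neighbor weights come from a fixed nonlinearity of the relative
    offset instead of a learned MLP; mask (Q, S) marks allowed neighbors.
    """
    d2 = ((query[:, None, :] - support[None, :, :]) ** 2).sum(-1)  # (Q, S)
    if mask is not None:
        d2 = np.where(mask, d2, np.inf)        # exclude points outside window
    idx = np.argsort(d2, axis=1)[:, :k]        # indices of k nearest, (Q, k)
    rel = support[idx] - query[:, None, :]     # relative offsets, (Q, k, D)
    w = np.tanh(rel @ W_off)                   # offset-dependent weights, (Q, k, H)
    agg = np.einsum('qkh,qkf->qkhf', w, feats[idx]).sum(1)  # (Q, H, F)
    return agg.reshape(len(query), -1) @ W_out               # (Q, C_out)

def tpc_layer(xy, t, feats, params, k=4, t_window=1.0):
    """One TemporalPointConv-style layer (hypothetical sketch):
    a spatial convolution within a narrow temporal window, followed by
    a temporal convolution over each point's history."""
    W_off_s, W_out_s, W_off_t, W_out_t = params
    in_window = np.abs(t[:, None] - t[None, :]) <= t_window
    h = knn_aggregate(xy, xy, feats, W_off_s, W_out_s, k, mask=in_window)
    # Temporal step: convolve over the 1-D time axis, ignoring space.
    return knn_aggregate(t[:, None], t[:, None], h, W_off_t, W_out_t, k)
```

Because the query coordinates need not coincide with the observed points, the same machinery supports prediction at arbitrary space-time locations, which is the flexibility the figure's two-operator factorization is meant to convey.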

