SPATIO-TEMPORAL GRAPH SCATTERING TRANSFORM

Abstract

Although spatio-temporal graph neural networks have achieved great empirical success in handling multiple correlated time series, they may be impractical in some real-world scenarios because sufficient high-quality training data are lacking. Furthermore, spatio-temporal graph neural networks lack theoretical interpretation. To address these issues, we put forth a novel, mathematically designed framework to analyze spatio-temporal data. Our proposed spatio-temporal graph scattering transform (ST-GST) extends traditional scattering transforms to the spatio-temporal domain. It iteratively applies spatio-temporal graph wavelets and nonlinear activation functions, which can be viewed as a forward pass of a spatio-temporal graph convolutional network without training. Since all the filter coefficients in ST-GST are mathematically designed, it is promising for real-world scenarios with limited training data. The design also allows for a theoretical analysis, which shows that the proposed ST-GST is stable to small perturbations of input signals and structures. Finally, our experiments show that i) ST-GST outperforms spatio-temporal graph convolutional networks by an increase of 35% in accuracy on the MSR Action3D dataset; ii) it is better and computationally more efficient to design the transform based on separable spatio-temporal graphs than on joint ones; and iii) the nonlinearity in ST-GST is critical to empirical performance.

1. INTRODUCTION

Processing and learning from spatio-temporal data have received increasing attention recently. Examples include: i) skeleton-based human action recognition based on a sequence of human poses (Liu et al., 2019), which is critical to human behavior understanding (Borges et al., 2013), and ii) multi-agent trajectory prediction (Hu et al., 2020), which is critical to robotics and autonomous driving (Shalev-Shwartz et al., 2016). A common pattern across these applications is that data evolve in both the spatial and temporal domains. This paper aims to analyze this type of data by developing novel spatio-temporal graph-based data modeling and operations.

Spatio-temporal graph-based data modeling. Graphs are often used to model data where irregularly spaced samples are observed over time. Good spatio-temporal graphs can provide informative priors that reflect the internal relationships within data. For example, in skeleton-based human action recognition, we can model a sequence of 3D joint locations as data supported on skeleton graphs across time, which reflects both human physical constraints and temporal consistency (Yan et al., 2018). Recent studies on modeling spatio-temporal graphs have followed either joint or separable processing frameworks. Joint processing constructs a single spatio-temporal graph and processes (e.g., filters) data via operations on this graph (Kao et al., 2019; Liu et al., 2020). In contrast, a separable processing approach works separately, and possibly with different operators, across the space and time dimensions. In this case, independent graphs are used for space and

